Java: Why does Apache Commons consider '१२३' numeric?

Question

Asked by Hannes

According to Apache Commons Lang's documentation for StringUtils.isNumeric(), the String '१२३' is numeric.

Since I believed this might be a mistake in the documentation, I ran tests to verify the statement. I found that, according to Apache Commons, it is numeric.

Why is this String numeric? What do those characters represent?

Answer 1

Answered by Andy Turner

Because that "CharSequence contains only Unicode digits" (quoting your linked documentation).

All of the characters return true for Character.isDigit:

Some Unicode character ranges that contain digits:
'\u0030' through '\u0039', ISO-LATIN-1 digits ('0' through '9')
'\u0660' through '\u0669', Arabic-Indic digits
'\u06F0' through '\u06F9', Extended Arabic-Indic digits
'\u0966' through '\u096F', Devanagari digits
'\uFF10' through '\uFF19', Fullwidth digits
Many other character ranges contain digits as well.

'१', '२' and '३' are Devanagari digits.
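The check described in this answer can be sketched in plain Java. This is not the Commons Lang source, only an illustration of the documented behaviour ("contains only Unicode digits"). The class and method names are hypothetical, and treating an empty or null input as non-numeric is an assumption. An ASCII-only variant is included for callers who want to reject digits from other scripts.

```java
// Hypothetical helper, not part of Apache Commons Lang.
final class DigitCheck {

    // True if the input is non-empty and every char is a digit in any script.
    static boolean unicodeDigitsOnly(CharSequence cs) {
        if (cs == null || cs.length() == 0) {
            return false; // assumption: empty input is not numeric
        }
        for (int i = 0; i < cs.length(); i++) {
            if (!Character.isDigit(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Stricter variant: accepts only '0' through '9'.
    static boolean asciiDigitsOnly(CharSequence cs) {
        if (cs == null || cs.length() == 0) {
            return false;
        }
        for (int i = 0; i < cs.length(); i++) {
            char c = cs.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Devanagari digits one, two, three, written as escapes so the
        // example does not depend on the source file encoding.
        String devanagari = "\u0967\u0968\u0969";
        for (char c : devanagari.toCharArray()) {
            System.out.println("U+0" + Integer.toHexString(c).toUpperCase()
                    + " isDigit=" + Character.isDigit(c)
                    + " value=" + Character.digit(c, 10));
        }
        System.out.println(unicodeDigitsOnly(devanagari)); // true
        System.out.println(asciiDigitsOnly(devanagari));   // false
        System.out.println(asciiDigitsOnly("123"));        // true
    }
}
```

If only '0' through '9' should count as numeric in your application, a check like asciiDigitsOnly is the safer choice.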

Answer 2

Answered by ΦXoc? ? Пepeúpa ツ

The symbol १२३ is the same as 123 in Nepali or any other language that uses the Devanagari script, such as Hindi or Marathi, and is therefore a number for Apache Commons.

Answer 3

Answered by Maroun

You can use Character#getType to check the character's general category:

System.out.println(Character.DECIMAL_DIGIT_NUMBER == Character.getType('१'));

This prints true, which is evidence that '१' is a decimal digit.

Now let's examine the Unicode value of the '१' character:

System.out.println(Integer.toHexString('१'));
// 967

This value is in the range of Devanagari digits, which is \u0966 through \u096F.

Also try:

Character.UnicodeBlock block = Character.UnicodeBlock.of('१');
System.out.println(block.toString());
// DEVANAGARI

Devanagari is an abugida (alphasyllabary) alphabet of India and Nepal.

"१२३" is the Devanagari equivalent of "123" in Basic Latin.
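To make the correspondence with "123" concrete, here is a minimal sketch that maps digits from any script to their Basic Latin counterparts using Character.digit. The class and method names are hypothetical. The last line of main relies on Integer.parseInt accepting any character for which Character.digit returns a value, as its Javadoc describes.

```java
// Hypothetical helper for illustration only.
final class DigitNormalizer {

    // Replaces every Unicode decimal digit with the matching ASCII digit
    // and leaves all other code points unchanged.
    static String toAsciiDigits(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            int value = Character.digit(cp, 10);
            if (Character.isDigit(cp) && value >= 0) {
                sb.append((char) ('0' + value));
            } else {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String devanagari = "\u0967\u0968\u0969"; // Devanagari one, two, three
        System.out.println(toAsciiDigits(devanagari));    // 123
        System.out.println(Integer.parseInt(devanagari)); // 123
    }
}
```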

Answer 4

Answered by Solomon Rutzky

If you ever want to know what properties a particular "character" has (and there are quite a few), go directly to the source: Unicode.org. They have research tools that can show you most anything you would care to know.

If you want to see all of the properties of a specific character, try the following:
http://unicode.org/cldr/utility/character.jsp?a=१
or:
http://unicode.org/cldr/utility/character.jsp?a=%E0%A5%A7

If you want to see all characters classified as "decimal digits" (i.e. with numeric values of 0 through 9), try the following:
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:Numeric_Type=Decimal:]
(550 code points as of Unicode 9.0)

If you want to see all characters classified as "non-decimal digit numbers" (i.e. fractions, circled numbers, etc.), try the following:
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:Numeric_Type=Numeric:]
(836 code points as of Unicode 9.0)

If you want to see all characters classified as "decimal digits", but only up through Unicode 6.0 (which .NET uses), try the following:
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:Numeric_Type=Decimal:]%26[:Age=6.0:]
(420 code points, and this shouldn't change)

If you want to see all characters classified as "decimal digits", but only up through Unicode 6.0 (which .NET uses) and only in the Basic Multilingual Plane, with no supplementary characters (i.e. nothing above code point 65535 / U+FFFF), try the following:
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:Numeric_Type=Decimal:]%26[:Age=6.0:]%26[:bmp=Yes:]
(350 code points, and this shouldn't change)
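A similar count can be approximated locally in Java by scanning code points for the general category DECIMAL_DIGIT_NUMBER. This is a rough sketch with hypothetical names. It uses the general category rather than the Numeric_Type property queried above, and the result depends on the Unicode version bundled with your JDK, so the totals may not match the figures listed here.

```java
// Hypothetical helper for illustration only.
final class DecimalDigitCount {

    // Counts code points in [0, maxCodePoint] whose general category is Nd.
    static int countDecimalDigits(int maxCodePoint) {
        int count = 0;
        for (int cp = 0; cp <= maxCodePoint; cp++) {
            if (Character.getType(cp) == Character.DECIMAL_DIGIT_NUMBER) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Basic Multilingual Plane only, then the full Unicode range.
        System.out.println("BMP: " + countDecimalDigits(0xFFFF));
        System.out.println("All: " + countDecimalDigits(Character.MAX_CODE_POINT));
    }
}
```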

KEEP IN MIND: The Unicode Consortium produces a specification, not software. This means that it is up to each software vendor to implement the specification as accurately as they can. So just like HTML, JavaScript, CSS, SQL, etc., there is variation between different platforms, languages, and so on. For example, I found a bug in Microsoft's .NET Framework whereby the circled Latin letters A-Z and a-z (code points 0x24B6 through 0x24E9) are not reported as letters by char.IsLetter (I filed a bug report). That leads to unexpected behavior in related functionality, such as when calling the TextInfo.ToTitleCase() method (also reported).

Answer 5

Answered by Nayan Katkani

The symbols '१२३' come from Hindi (originally from Sanskrit, i.e. the Devanagari script) and represent numeric values:

१ represents 1

२ represents 2

and so on.
