Warning: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverFlow. Original question: http://stackoverflow.com/questions/40148683/

Time: 2020-11-03 04:57:46  Source: igfitidea

Why does Apache Commons consider '१२३' numeric?

java unicode number-systems apache-commons-lang3

Question by Hannes

According to Apache Commons Lang's documentation for StringUtils.isNumeric(), the String '१२३' is numeric.

Since I believed this might be a mistake in the documentation, I ran tests to verify the statement. I found that, according to Apache Commons, it is numeric.

Why is this String numeric? What do those characters represent?

Answer by Andy Turner

Because that "CharSequence contains only Unicode digits" (quoting your linked documentation).
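
A minimal sketch of that documented rule, assuming StringUtils.isNumeric in Commons Lang 3 amounts to "non-empty, and every char passes Character.isDigit". The names isNumericSketch and isAsciiNumeric are hypothetical helpers for illustration, not Commons Lang API; the second one shows the stricter ASCII-only check people often actually want.

```java
class NumericCheck {
    // Hypothetical re-implementation of the documented rule:
    // non-empty, and every char is a Unicode digit per Character.isDigit.
    static boolean isNumericSketch(CharSequence cs) {
        if (cs == null || cs.length() == 0) {
            return false;
        }
        for (int i = 0; i < cs.length(); i++) {
            if (!Character.isDigit(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical stricter variant that accepts only ASCII '0'..'9'.
    static boolean isAsciiNumeric(CharSequence cs) {
        if (cs == null || cs.length() == 0) {
            return false;
        }
        for (int i = 0; i < cs.length(); i++) {
            char c = cs.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String devanagari = "\u0967\u0968\u0969"; // Devanagari digits one, two, three
        System.out.println(isNumericSketch(devanagari)); // true
        System.out.println(isAsciiNumeric(devanagari));  // false
        System.out.println(isNumericSketch("123"));      // true
    }
}
```

The explicit range check in the second method is the usual way to reject non-ASCII digits, because Character.isDigit is deliberately Unicode-aware.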

All of the characters return true for Character.isDigit:
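
A small Java sketch of that check: it prints each character of "१२३" with its Character.isDigit result and its official Unicode name (Character.getName, available since Java 7). The class and method names are illustrative only.

```java
class DigitCheck {
    // Format one char as: code point, Character.isDigit result, official Unicode name.
    static String describe(char c) {
        return String.format("U+%04X isDigit=%b name=%s",
                (int) c, Character.isDigit(c), Character.getName(c));
    }

    public static void main(String[] args) {
        String s = "\u0967\u0968\u0969"; // the three characters from the question
        for (char c : s.toCharArray()) {
            System.out.println(describe(c));
        }
        // U+0967 isDigit=true name=DEVANAGARI DIGIT ONE
        // U+0968 isDigit=true name=DEVANAGARI DIGIT TWO
        // U+0969 isDigit=true name=DEVANAGARI DIGIT THREE
    }
}
```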

Some Unicode character ranges that contain digits:

  • '\u0030' through '\u0039', ISO-LATIN-1 digits ('0' through '9')
  • '\u0660' through '\u0669', Arabic-Indic digits
  • '\u06F0' through '\u06F9', Extended Arabic-Indic digits
  • '\u0966' through '\u096F', Devanagari digits
  • '\uFF10' through '\uFF19', Fullwidth digits

Many other character ranges contain digits as well.
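
To see that these ranges all behave the same way, here is a sketch that walks the ten code points starting at each listed "zero" and maps them through Character.digit(c, 10). It assumes, as the list above implies, that each range holds ten contiguous digits.

```java
class DigitRanges {
    // The digit ZERO of each range listed above; each range holds ten contiguous digits.
    static final char[] ZEROS = {'\u0030', '\u0660', '\u06F0', '\u0966', '\uFF10'};

    // Concatenate Character.digit(c, 10) for the ten chars starting at zero.
    static String values(char zero) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10; i++) {
            sb.append(Character.digit((char) (zero + i), 10));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (char zero : ZEROS) {
            System.out.printf("U+%04X..U+%04X -> %s%n", (int) zero, zero + 9, values(zero));
        }
        // Every line ends in "-> 0123456789".
    }
}
```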

१२३ are Devanagari digits:

Answer by ΦXoc? ? Пepeúpa ツ

The symbol १२३ means the same as 123 in Nepali or any other language that uses the Devanagari script, such as Hindi and Marathi, and is therefore a number for Apache Commons.
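
Java itself agrees that १२३ "is" 123: Integer.parseInt is documented to accept any char for which Character.digit(c, 10) returns a non-negative value, and that includes Devanagari digits. A minimal sketch:

```java
class ParseDevanagari {
    public static void main(String[] args) {
        String devanagari = "\u0967\u0968\u0969"; // Devanagari ONE, TWO, THREE
        // Integer.parseInt accepts any char with Character.digit(c, 10) >= 0,
        // so non-ASCII decimal digits are converted to their values.
        int n = Integer.parseInt(devanagari);
        System.out.println(n);        // 123
        System.out.println(n == 123); // true
    }
}
```

Whether that leniency is desirable when validating user input is a separate design decision.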

Answer by Maroun

You can use Character#getType to check the character's general category:

System.out.println(Character.DECIMAL_DIGIT_NUMBER == Character.getType('१'));

This prints true, which is "evidence" that '१' is a decimal digit.

Now let's examine the Unicode value of the '१' character:

System.out.println(Integer.toHexString('१'));
// 967

This value is within the range of Devanagari digits, which is \u0966 through \u096F.
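
That range test can be written out directly. A sketch with hypothetical helper names (isDevanagariDigit, devanagariValue); the value arithmetic relies on the ten digits being contiguous from U+0966 (DEVANAGARI DIGIT ZERO).

```java
class DevanagariRange {
    static final char ZERO = '\u0966'; // DEVANAGARI DIGIT ZERO
    static final char NINE = '\u096F'; // DEVANAGARI DIGIT NINE

    static boolean isDevanagariDigit(char c) {
        return c >= ZERO && c <= NINE;
    }

    // The digits are contiguous, so the value is the offset from ZERO.
    static int devanagariValue(char c) {
        if (!isDevanagariDigit(c)) {
            throw new IllegalArgumentException("not a Devanagari digit: " + c);
        }
        return c - ZERO;
    }

    public static void main(String[] args) {
        char one = '\u0967';
        System.out.println(Integer.toHexString(one)); // 967
        System.out.println(isDevanagariDigit(one));   // true
        System.out.println(devanagariValue(one));     // 1
        System.out.println(isDevanagariDigit('1'));   // false
    }
}
```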

Also try:

Character.UnicodeBlock block = Character.UnicodeBlock.of('१');
System.out.println(block.toString());
// DEVANAGARI

Devanagari is:

is an abugida (alphasyllabary) alphabet of India and Nepal

"१२३" is "123" (in Basic Latin Unicode).

Answer by Solomon Rutzky

If you ever want to know what properties a particular "character" has (and there are quite a few), go directly to the source: Unicode.org. They have research tools that can show you almost anything you would care to know.
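
From Java you can also query a subset of those properties through java.lang.Character. This is a sketch, not a Unicode.org tool: the values reflect whatever Unicode version your JDK ships, and UnicodeScript.of needs Java 7 or later.

```java
class CharProperties {
    // Collect a few Unicode properties that java.lang.Character exposes for one code point.
    static String describe(int cp) {
        return String.format(
                "U+%04X %s | decimal digit: %b | numeric value: %d | block: %s | script: %s",
                cp,
                Character.getName(cp),
                Character.getType(cp) == Character.DECIMAL_DIGIT_NUMBER,
                Character.getNumericValue(cp),
                Character.UnicodeBlock.of(cp),
                Character.UnicodeScript.of(cp));
    }

    public static void main(String[] args) {
        // U+0967 DEVANAGARI DIGIT ONE | decimal digit: true | numeric value: 1 | block: DEVANAGARI | script: DEVANAGARI
        System.out.println(describe(0x0967));
        System.out.println(describe(0xFF11)); // FULLWIDTH DIGIT ONE, for comparison
    }
}
```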

KEEP IN MIND: The Unicode Consortium produces a specification, not software. This means that it is up to each software vendor to implement the specification as accurately as they can. So, just like HTML, JavaScript, CSS, SQL, etc., there is variation between different platforms, languages, and so on. For example, I found a bug in Microsoft's .NET Framework whereby the circled Latin letters A-Z and a-z (code points 0x24B6 through 0x24E9) do not properly register as char.IsLetter = true (bug report here). That leads to unexpected behavior in related functionality, such as when calling the TextInfo.ToTitleCase() method (bug report here).
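
The same circled letters show how different APIs on one platform can disagree. This Java sketch is not the .NET bug itself: Java's Character.isLetter follows the general category, which for U+24B6 is So (Other Symbol), so it returns false, while Character.isAlphabetic (Java 7+) also honors the Other_Alphabetic property and returns true.

```java
class CircledLetters {
    public static void main(String[] args) {
        char circledA = '\u24B6'; // CIRCLED LATIN CAPITAL LETTER A
        System.out.println(Character.getType(circledA) == Character.OTHER_SYMBOL); // true
        System.out.println(Character.isLetter(circledA));     // false
        System.out.println(Character.isAlphabetic(circledA)); // true
        // Case mapping still works: the lower-case circled a is U+24D0.
        System.out.println(Integer.toHexString(Character.toLowerCase(circledA))); // 24d0
    }
}
```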

Answer by Nayan Katkani

The symbols '१२३' actually come from Hindi (more fundamentally from Sanskrit, i.e. the Devanagari script) and represent numeric values, for example:

१ represents 1

२ represents 2

and so on.
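
The full mapping can be generated rather than listed by hand. A sketch that walks U+0966..U+096F (DEVANAGARI DIGIT ZERO through NINE) and asks Character.getNumericValue for each value; the class and method names are illustrative.

```java
class DevanagariDigits {
    // Build "digit represents value" lines for U+0966..U+096F.
    static String table() {
        StringBuilder sb = new StringBuilder();
        for (char c = '\u0966'; c <= '\u096F'; c++) {
            sb.append(c).append(" represents ").append(Character.getNumericValue(c)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(table()); // ten lines, from 0 up to 9
    }
}
```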