Original question: http://stackoverflow.com/questions/11243204/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
How to match a long with Java regex?
Asked by Sebastien Lorber
I know I can match numbers with Pattern.compile("\\d*");
But it doesn't handle the long min/max values.
Because of performance issues related to exceptions, I do not want to try to parse the long unless it really is a long.
if (LONG_PATTERN.matcher(timestampStr).matches()) {
    long timeStamp = Long.parseLong(timestampStr);
    return new Date(timeStamp);
} else {
    LOGGER.error("Can't convert " + timestampStr + " to a Date because it is not a timestamp! -> ");
    return null;
}
I mean I do not want any try/catch block, and I do not want exceptions raised for a long like "564654954654464654654567879865132154778", which is outside the range of a regular Java long.
Does anyone have a pattern to handle this kind of need for the primitive Java types? Does the JDK provide something to handle it automatically? Is there a fail-safe parsing mechanism in Java?
Thanks
Edit: Please assume that the "bad long string" is not an exceptional case. I'm not asking for a benchmark; I'm here for a regex representing a long and nothing more. I'm aware of the additional time required by the regex check, but at least my long parsing cost will stay constant and never depend on the percentage of "bad long strings".
I can't find the link again, but there is a nice parsing benchmark on StackOverflow which clearly shows that reusing the same compiled regex is really fast, a LOT faster than throwing an exception, so even a small share of exceptions would make the system slower than it is with the additional regex check.
Answered by T.J. Crowder
The minimum value of a long is -9,223,372,036,854,775,808, and the maximum value is 9,223,372,036,854,775,807. So, a maximum of 19 digits. So, \d{1,19} should get you there, perhaps with an optional -, and with ^ and $ to match the ends of the string.
So roughly:
Pattern LONG_PATTERN = Pattern.compile("^-?\\d{1,19}$");
...or something along those lines, and assuming you don't allow commas (or have already removed them).
As gexicide points out in the comments, the above allows a small (in comparison) range of invalid values, such as 9,999,999,999,999,999,999. You can get more complex with your regex, or just accept that the above will weed out the vast majority of invalid numbers and so you reduce the number of parsing exceptions you get.
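One way to combine the two ideas in this answer is sketched below: the 19-digit pattern rejects most bad input cheaply, and a try/catch remains only for the few 19-digit strings that still overflow. The class name LongPrefilter, the method tryParse and the use of OptionalLong are illustrative choices, not part of the original answer.

```java
import java.util.OptionalLong;
import java.util.regex.Pattern;

// Hypothetical helper: the names here are assumptions made for illustration.
final class LongPrefilter {
    // Optional minus sign, then 1 to 19 digits. matches() tests the whole
    // input, so explicit ^ and $ anchors are not needed.
    private static final Pattern LONG_SHAPED = Pattern.compile("-?\\d{1,19}");

    /** Returns the parsed value, or empty if the text is not a valid long. */
    static OptionalLong tryParse(String text) {
        if (text == null || !LONG_SHAPED.matcher(text).matches()) {
            return OptionalLong.empty(); // rejected without raising an exception
        }
        try {
            return OptionalLong.of(Long.parseLong(text));
        } catch (NumberFormatException e) {
            // Only reachable for 19-digit values outside Long.MIN_VALUE..Long.MAX_VALUE.
            return OptionalLong.empty();
        }
    }
}
```

With this shape, a string such as "564654954654464654654567879865132154778" never reaches Long.parseLong, while 9999999999999999999 still costs one exception.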
Answered by Aliaksei Mychko
This regular expression should do what you need:
^(-9223372036854775808|0)$|^((-?)((?!0)\d{1,18}|[1-8]\d{18}|9[0-1]\d{17}|92[0-1]\d{16}|922[0-2]\d{15}|9223[0-2]\d{14}|92233[0-6]\d{13}|922337[0-1]\d{12}|92233720[0-2]\d{10}|922337203[0-5]\d{9}|9223372036[0-7]\d{8}|92233720368[0-4]\d{7}|922337203685[0-3]\d{6}|9223372036854[0-6]\d{5}|92233720368547[0-6]\d{4}|922337203685477[0-4]\d{3}|9223372036854775[0-7]\d{2}|922337203685477580[0-7]))$
But this regexp doesn't accept additional symbols such as +, L, _, etc. If you need to validate every possible way of writing a Long value, you will need to extend this regexp.
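As a rough check of the expression above, the sketch below compiles it as a Java string literal (each backslash doubled) and tries it on a few boundary values. The class name ExactLongPattern and the method isLong are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the regular expression from this answer.
final class ExactLongPattern {
    private static final Pattern EXACT_LONG = Pattern.compile(
        "^(-9223372036854775808|0)$|^((-?)((?!0)\\d{1,18}|[1-8]\\d{18}"
        + "|9[0-1]\\d{17}|92[0-1]\\d{16}|922[0-2]\\d{15}|9223[0-2]\\d{14}"
        + "|92233[0-6]\\d{13}|922337[0-1]\\d{12}|92233720[0-2]\\d{10}"
        + "|922337203[0-5]\\d{9}|9223372036[0-7]\\d{8}|92233720368[0-4]\\d{7}"
        + "|922337203685[0-3]\\d{6}|9223372036854[0-6]\\d{5}"
        + "|92233720368547[0-6]\\d{4}|922337203685477[0-4]\\d{3}"
        + "|9223372036854775[0-7]\\d{2}|922337203685477580[0-7]))$");

    /** True if the text is a long written without a plus sign or leading zeros. */
    static boolean isLong(String text) {
        return text != null && EXACT_LONG.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "9223372036854775807", "9223372036854775808",
            "-9223372036854775808", "-9223372036854775809", "007", "+5"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isLong(s));
        }
    }
}
```

Note that this pattern is stricter than Long.parseLong, which accepts a leading + and leading zeros; whether that matters depends on the input you expect.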
Answered by gexicide
Simply catch the NumberFormatException, unless this case happens very often.
Another way would be to use a pattern which only allows long literals. Such a pattern might be quite complex.
A third way would be to parse the number as a BigInteger first. Then you can compare it to Long.MAX_VALUE and Long.MIN_VALUE to check whether it is within the bounds of long. However, this might be costly as well.
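A minimal sketch of this third approach follows. One caveat it assumes away: the BigInteger constructor itself throws NumberFormatException on non-numeric text, so the sketch first checks the shape with a simple digits-only pattern. The class name BigIntegerRangeCheck and the method tryParse are hypothetical.

```java
import java.math.BigInteger;
import java.util.OptionalLong;
import java.util.regex.Pattern;

// Hypothetical helper: names are assumptions made for illustration.
final class BigIntegerRangeCheck {
    // Optional minus sign followed by at least one digit, with no length limit.
    private static final Pattern INTEGER_SHAPED = Pattern.compile("-?\\d+");
    private static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    private static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    /** Returns the value if the text is an integer within the range of long. */
    static OptionalLong tryParse(String text) {
        if (text == null || !INTEGER_SHAPED.matcher(text).matches()) {
            return OptionalLong.empty(); // not an integer at all
        }
        BigInteger value = new BigInteger(text); // cannot throw after the shape check
        if (value.compareTo(MIN) < 0 || value.compareTo(MAX) > 0) {
            return OptionalLong.empty(); // an integer, but too large or too small
        }
        return OptionalLong.of(value.longValue());
    }
}
```

This never raises an exception for digit strings of any length, at the price of allocating a BigInteger for every candidate.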
Also note: parsing the long is quite fast; it is a highly optimized method (which, for example, tries to parse two digits in one step). Applying pattern matching might be even more costly than performing the parsing. The only slow part of the parsing is throwing the NumberFormatException. Thus, simply catching the exception is the best way to go if the exceptional case does not happen too often.
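For comparison, the plain catch-the-exception version of the asker's timestamp conversion could look like the sketch below. The class name TimestampParser and the method toDate are hypothetical, and returning null simply mirrors the snippet in the question.

```java
import java.util.Date;

// Hypothetical helper: names are assumptions made for illustration.
final class TimestampParser {
    /** Returns the Date for an epoch-millisecond string, or null if it is not a valid long. */
    static Date toDate(String timestampStr) {
        try {
            return new Date(Long.parseLong(timestampStr));
        } catch (NumberFormatException e) {
            // Raised for non-numeric text, out-of-range values and null input.
            return null;
        }
    }
}
```

Whether this beats a regex pre-check depends on how often bad input occurs, which is the trade-off discussed in the question.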