Understanding regex in Java: split("\t") vs split("\\t") - when do they both work, and when should they be used
Notice: this page reproduces a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/3762347/
Asked by posdef
I have recently figured out that I haven't been using regex properly in my code. Given the example of a tab-delimited string str, I have been using str.split("\t"). Now I realize that this is wrong and to match the tabs properly I should use str.split("\\t").
However, I stumbled upon this fact by pure chance while looking for regex patterns for something else. You see, the faulty code split("\t") has been working quite fine in my case, and now I am confused as to why it works if it's the wrong way to declare a regex for matching the tab character. Hence the question, for the sake of actually understanding how regex is handled in Java, instead of just copying the code into Eclipse and not really caring why it works...
In a similar fashion, I have come upon a piece of text which is not only tab-delimited but also comma-delimited. Put more clearly, the tab-delimited lists I am parsing sometimes include "compound" items which look like: item1,item2,item3 and I would like to parse them as separate elements, for the sake of simplicity. In that case the appropriate regular expression should be: line.split("[\\t,]"), or am I mistaken here as well?
Thanks in advance,
Accepted answer by Gumbo
When using "\t", the escape sequence \t is replaced by Java with the character U+0009. When using "\\t", the escape sequence \\ in \\t is replaced by Java with \, resulting in \t, which is then interpreted by the regular expression parser as the character U+0009.
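To make the two stages concrete, here is a minimal sketch (the class and method names are made up for illustration, not taken from the question). It prints the length of each string literal to show what the compiler produced, then splits the same tab-delimited line with both patterns.

```java
import java.util.Arrays;

class TabSplitDemo {
    // "\t": the Java compiler turns the escape into a single tab character (U+0009),
    // so the regex engine receives a one-character pattern containing a literal tab.
    static String[] splitWithLiteralTab(String line) {
        return line.split("\t");
    }

    // "\\t": the compiler produces the two characters '\' and 't';
    // the regex engine then reads \t as "match a tab character".
    static String[] splitWithRegexEscape(String line) {
        return line.split("\\t");
    }

    public static void main(String[] args) {
        String line = "a\tb\tc";
        System.out.println("\t".length());   // 1
        System.out.println("\\t".length());  // 2
        System.out.println(Arrays.toString(splitWithLiteralTab(line)));  // [a, b, c]
        System.out.println(Arrays.toString(splitWithRegexEscape(line))); // [a, b, c]
    }
}
```

Both calls should give the same array; only the string handed to the regex engine differs.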
So both notations will be interpreted correctly. The only difference is when the escape sequence is replaced with the corresponding character: by the Java compiler for "\t", or by the regular expression parser for "\\t".
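The same reasoning appears to carry over to the character class from the question. The sketch below (hypothetical class name and sample data) splits a line on tabs and commas with "[\\t,]", and also with "[\t,]" for comparison, on the assumption that a literal tab inside a character class matches itself.

```java
import java.util.Arrays;

class TabCommaSplitDemo {
    // Character class with the regex escape \t: split on a tab or a comma.
    static String[] splitOnTabOrComma(String line) {
        return line.split("[\\t,]");
    }

    // Same class, but the tab is already a literal character when the regex engine sees it.
    static String[] splitOnLiteralTabOrComma(String line) {
        return line.split("[\t,]");
    }

    public static void main(String[] args) {
        String line = "x\titem1,item2,item3\ty";
        System.out.println(Arrays.toString(splitOnTabOrComma(line)));        // [x, item1, item2, item3, y]
        System.out.println(Arrays.toString(splitOnLiteralTabOrComma(line))); // [x, item1, item2, item3, y]
    }
}
```

Note that split drops trailing empty strings by default, so a line ending in a tab or comma yields no empty last element unless a negative limit is passed, as in line.split("[\\t,]", -1).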