Understanding regex in Java: split("\t") vs split("\\t") - when do they both work, and when should they be used

Question

Asked by posdef

I have recently figured out that I haven't been using regex properly in my code. Given the example of a tab delimited string str, I have been using str.split("\t"). Now I realize that this is wrong and to match the tabs properly I should use str.split("\\t").

However, I stumbled upon this fact by pure chance, while looking for regex patterns for something else. You see, the faulty code split("\t") has been working quite fine in my case, and now I am confused as to why it works if it's the wrong way to declare a regex for matching the tab character. Hence the question, for the sake of actually understanding how regex is handled in Java, instead of just copying the code into Eclipse and not really caring why it works...

In a similar fashion, I have come upon a piece of text which is not only tab-delimited but also comma-delimited. More clearly put, the tab-delimited lists I am parsing sometimes include "compound" items which look like item1,item2,item3 and I would like to parse them as separate elements, for the sake of simplicity. In that case the appropriate regex should be line.split("[\\t,]"), or am I mistaken here as well?

Thanks in advance,

Answer 1

Accepted answer by Gumbo

When using "\t", the escape sequence \t is replaced by Java with the character U+0009. When using "\\t", the escape sequence \\ in \\t is replaced by Java with \, resulting in \t, which is then interpreted by the regular expression parser as the character U+0009.

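So both notations will be interpreted correctly. It is only a question of when the sequence is replaced with the corresponding character: by the Java compiler when it processes the string literal, or by the regex parser when it processes the pattern.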
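A minimal sketch of the point above, using made-up sample data; the class and method names (TabSplitDemo, splitOnLiteralTab, splitOnRegexEscape) are hypothetical and only serve the illustration. It shows that the two string literals differ in length, yet both patterns split a tab-delimited line the same way.

```java
import java.util.Arrays;

public class TabSplitDemo {
    // "\t" is a one-character string: the compiler has already produced U+0009.
    static String[] splitOnLiteralTab(String line) {
        return line.split("\t");
    }

    // "\\t" is a two-character string (backslash, 't'); the regex parser
    // turns that pair into U+0009 when it reads the pattern.
    static String[] splitOnRegexEscape(String line) {
        return line.split("\\t");
    }

    public static void main(String[] args) {
        String line = "a\tb\tc"; // hypothetical sample input

        System.out.println("\t".length());  // 1
        System.out.println("\\t".length()); // 2

        String[] first = splitOnLiteralTab(line);
        String[] second = splitOnRegexEscape(line);
        System.out.println(Arrays.equals(first, second)); // true
        System.out.println(Arrays.toString(second));      // [a, b, c]
    }
}
```

The double backslash only becomes mandatory for escapes that Java string literals do not define themselves, such as \d or \s, where the backslash has to survive compilation so that the regex parser can see it.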

Answer 2

Answer by Jaydeep Patel

\ is considered an escape character in Java, so to get the correct regex you need to escape \ with another \, followed by t to indicate a tab.

This tutorial will help more.
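As a rough illustration of the same escaping rule applied to the character class from the question, here is a small sketch. The class name TabCommaSplitDemo, the helper splitFields, and the sample line are assumptions made up for this example, not taken from the asker's code.

```java
import java.util.Arrays;

public class TabCommaSplitDemo {
    // Splits on a single tab or a single comma. In the Java source the
    // pattern is written "[\\t,]", so the regex parser receives [\t,].
    static String[] splitFields(String line) {
        return line.split("[\\t,]");
    }

    public static void main(String[] args) {
        // Hypothetical tab-delimited line with one "compound" item.
        String line = "first\titem1,item2,item3\tlast";
        System.out.println(Arrays.toString(splitFields(line)));
        // prints [first, item1, item2, item3, last]
    }
}
```

Note that this pattern treats every delimiter separately, so a tab directly followed by a comma produces an empty string between them. If runs of delimiters should count as one separator, "[\\t,]+" is one option.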
