Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me), citing the original StackOverflow address: http://stackoverflow.com/questions/18677762/

Handling delimiter with escape characters in Java String.split() method

java regex

Asked by user2757740

I have searched the web for my question but did not find an answer that fits my requirement exactly. I have a string like the one below:

A|B|C|The Steading\|Keir Allan\|Braco|E

My output should look like this:

A
B
C
The Steading|Keir Allan|Braco
E

My requirement is to skip the delimiter when it is preceded by the escape sequence. I tried the following regex, which uses a negative lookbehind, in String.split():

(?<!\\)\|

My problem is that the delimiter will be defined dynamically by the end user and need not always be |. It can be any character on the keyboard (no restrictions). So my concern is that the regex above might fail for characters that have a special meaning in regex.

I just want to know whether this is the right way to do it.

Accepted answer by arshajii

You can use Pattern.quote():

String regex = "(?<!\\\\)" + Pattern.quote(delim);
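
To see why the quoting matters, here is a small sketch that compares the quoted and unquoted pattern for a delimiter that happens to be a regex metacharacter. The class name QuoteDemo, the helper splitRegex, and the sample input are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuoteDemo {
    // Hypothetical helper: regex for "delim not preceded by a backslash".
    static String splitRegex(String delim) {
        return "(?<!\\\\)" + Pattern.quote(delim);
    }

    public static void main(String[] args) {
        String input = "a.b\\.c.d";   // runtime value: a.b\.c.d
        String delim = ".";

        // Unquoted: "." is a metacharacter that matches almost any character,
        // so the result consists mostly of empty strings.
        String[] unquoted = input.split("(?<!\\\\)" + delim);
        System.out.println(Arrays.toString(unquoted));

        // Quoted: only a literal "." not preceded by a backslash splits.
        String[] quoted = input.split(splitRegex(delim));
        System.out.println(Arrays.toString(quoted));   // prints [a, b\.c, d]
    }
}
```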


Using your example:

String delim = "|";
String regex = "(?<!\\\\)" + Pattern.quote(delim);

for (String s : "A|B|C|The Steading\\|Keir Allan\\|Braco|E".split(regex))
    System.out.println(s);

Output:

A
B
C
The Steading\|Keir Allan\|Braco
E
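
The pieces above still contain the escape character, whereas the output asked for in the question does not. One possible follow-up step, sketched here with a hypothetical helper splitAndUnescape, is to replace each escaped delimiter with the bare delimiter after splitting. This sketch does not handle an escaped escape character.

```java
import java.util.regex.Pattern;

public class SplitAndUnescape {
    // Hypothetical helper: splits on delimiters not preceded by the escape
    // string, then turns each "escape + delimiter" pair into the delimiter.
    static String[] splitAndUnescape(String input, String delim, String esc) {
        String regex = "(?<!" + Pattern.quote(esc) + ")" + Pattern.quote(delim);
        String[] parts = input.split(regex);
        for (int i = 0; i < parts.length; i++) {
            // String.replace works on literal text, so no quoting is needed.
            parts[i] = parts[i].replace(esc + delim, delim);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Runtime value: A|B|C|The Steading\|Keir Allan\|Braco|E
        String input = "A|B|C|The Steading\\|Keir Allan\\|Braco|E";
        for (String s : splitAndUnescape(input, "|", "\\")) {
            System.out.println(s);   // fourth line: The Steading|Keir Allan|Braco
        }
    }
}
```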


You can extend this to use a custom escape sequence as well:

String delim = "|";
String esc = "+";
String regex = "(?<!" + Pattern.quote(esc) + ")" + Pattern.quote(delim);

for (String s : "A|B|C|The Steading+|Keir Allan+|Braco|E".split(regex))
    System.out.println(s);

Output:

A
B
C
The Steading+|Keir Allan+|Braco
E

Answer by Jan Cetkovsky

I know this is an old thread, but the lookbehind solution has an issue: it does not allow escaping the escape character itself. For example, in A|B|C|The Steading\\|Keir Allan\|Braco|E no split occurs after The Steading\\, even though the backslash there is itself escaped and the | should act as a delimiter.

The positive matching solution in the thread "Regex and escaped and unescaped delimiter" works better (modified to use Pattern.quote() if the delimiter is dynamic).
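
As a rough illustration of the positive-matching idea, the sketch below walks the input once and collects tokens instead of searching for split points. It is a hand-written scanner, not the regex from the linked thread, and it assumes a single-character delimiter and a single-character escape; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapeAwareSplitter {
    // Splits on delim; esc makes the next character literal (including esc itself).
    static List<String> split(String input, char delim, char esc) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == esc && i + 1 < input.length()) {
                // Keep the escaped character and drop the escape itself.
                current.append(input.charAt(++i));
            } else if (c == delim) {
                tokens.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);   // also covers a dangling escape at the end
            }
        }
        tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        // Runtime value: A|B|C|The Steading\\|Keir Allan\|Braco|E
        String input = "A|B|C|The Steading\\\\|Keir Allan\\|Braco|E";
        System.out.println(split(input, '|', '\\'));
        // prints [A, B, C, The Steading\, Keir Allan|Braco, E]
    }
}
```

Unlike String.split() with its default limit, this sketch keeps trailing empty tokens, and it removes the escape characters from the returned tokens.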