Java: How to split a comma separated String while ignoring escaped commas?
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/820172/
Question by arturh
I need to write an extended version of the StringUtils.commaDelimitedListToStringArray function that takes an additional parameter: the escape char.
So calling my version:
commaDelimitedListToStringArray("test,test\\,test\\,test,test", "\\")
should return:
["test", "test,test,test", "test"]
My current attempt is to use String.split() to split the String using regular expressions:
String[] array = str.split("[^\\\\],");
But the returned array is:
["tes", "test\,test\,tes", "test"]
Any ideas?
Accepted answer by matt b
The regular expression [^\\], means "match a character that is not a backslash, followed by a comma". This is why substrings such as t, match: t is a character that is not a backslash.
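A minimal runnable sketch of this behaviour (the class and method names are made up for illustration). Because the character class consumes the letter in front of each unescaped comma, split() removes that letter together with the comma:

```java
import java.util.Arrays;

class CharClassSplitDemo {
    // "[^\\\\]," in Java source is the regex [^\\], : ONE non-backslash character
    // followed by a comma. split() removes the whole two-character match,
    // so the letter in front of each unescaped comma is lost.
    static String[] splitWithCharClass(String s) {
        return s.split("[^\\\\],");
    }

    public static void main(String[] args) {
        // Java literal for the raw text: test,test\,test\,test,test
        String input = "test,test\\,test\\,test,test";
        System.out.println(Arrays.toString(splitWithCharClass(input)));
        // prints [tes, test\,test\,tes, test]
    }
}
```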
I think you need some sort of negative lookbehind, to match a comma that is not preceded by a backslash without capturing the preceding character. Something like:
(?<!\\),
(BTW, note that I have purposefully not doubly-escaped the backslashes to make this more readable)
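As a quick check, here is a small sketch (hypothetical class and method names) that applies this lookbehind with String.split(). In Java source each regex backslash has to be doubled again, and the \, escape sequences are still present in the result; removing them is a separate step.

```java
import java.util.Arrays;

class LookbehindSplitDemo {
    // Regex (?<!\\), : a comma that is NOT preceded by a backslash.
    // Written as a Java string literal, every backslash is doubled again.
    static String[] splitOnUnescapedCommas(String s) {
        return s.split("(?<!\\\\),");
    }

    public static void main(String[] args) {
        String input = "test,test\\,test\\,test,test"; // raw: test,test\,test\,test,test
        System.out.println(Arrays.toString(splitOnUnescapedCommas(input)));
        // prints [test, test\,test\,test, test] (the \, escapes are still there)
    }
}
```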
Answer by cletus
Try:
String[] array = str.split("(?<!\\\\),");
Basically this says: split on a comma, except where that comma is preceded by a backslash (written as four backslashes in the Java string literal, because the backslash must be escaped once for the regex and once more for the string). This is called a zero-width negative lookbehind assertion.
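To see what "zero-width" means here, the following sketch (hypothetical names) prints the start and end index of every match. Each match spans exactly one character, the comma, because the lookbehind only inspects the preceding character and does not consume it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroWidthDemo {
    // Same pattern as above: a comma not preceded by a backslash.
    private static final Pattern UNESCAPED_COMMA = Pattern.compile("(?<!\\\\),");

    // Returns "start-end" for every match. The lookbehind is zero-width, so each
    // match covers only the comma; the preceding character is checked, not consumed.
    static List<String> matchRanges(String s) {
        List<String> ranges = new ArrayList<>();
        Matcher m = UNESCAPED_COMMA.matcher(s);
        while (m.find()) {
            ranges.add(m.start() + "-" + m.end());
        }
        return ranges;
    }

    public static void main(String[] args) {
        // raw text: ab,c\,d,e  (the comma at index 5 is escaped)
        System.out.println(matchRanges("ab,c\\,d,e")); // prints [2-3, 7-8]
    }
}
```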
Answer by arturh
For future reference, here is the complete method I ended up with:
public static String[] commaDelimitedListToStringArray(String str, String escapeChar) {
    // these characters need to be escaped in a regular expression
    String regularExpressionSpecialChars = "/.*+?|()[]{}\\^$";
    String escapedEscapeChar = escapeChar;
    // if the escape char for our comma separated list needs to be escaped
    // for the regular expression, escape it using the \ char
    if (regularExpressionSpecialChars.indexOf(escapeChar) != -1)
        escapedEscapeChar = "\\" + escapeChar;
    // see http://stackoverflow.com/questions/820172/how-to-split-a-comma-separated-string-while-ignoring-escaped-commas
    // split on commas not preceded by the escape char; limit -1 keeps trailing empty elements
    String[] temp = str.split("(?<!" + escapedEscapeChar + "),", -1);
    // remove the escapeChar for the end result
    String[] result = new String[temp.length];
    for (int i = 0; i < temp.length; i++) {
        result[i] = temp[i].replaceAll(escapedEscapeChar + ",", ",");
    }
    return result;
}
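A possible variant, sketched under the assumption that the escape character is a single character: java.util.regex.Pattern.quote() can escape it for the regex instead of a hand-maintained list of special characters, and String.replace() removes the escapes literally without a second regex. Like the method above, it does not support escaping the escape character itself. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class EscapedCommaSplitter {
    // Assumption: escapeChar is a single character. An escaped escape
    // character (e.g. "\\," meaning a literal backslash) is NOT handled.
    static String[] split(String str, String escapeChar) {
        // Pattern.quote makes any escape char safe inside the lookbehind.
        String quoted = Pattern.quote(escapeChar);
        // limit -1 keeps trailing empty elements, e.g. "a," -> [a, ""]
        String[] parts = str.split("(?<!" + quoted + "),", -1);
        for (int i = 0; i < parts.length; i++) {
            // Literal (non-regex) replacement of "<escapeChar>," with ",".
            parts[i] = parts[i].replace(escapeChar + ",", ",");
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("test,test\\,test\\,test,test", "\\")));
        // prints [test, test,test,test, test]
        System.out.println(Arrays.toString(split("a$,b,c", "$")));
        // prints [a,b, c]
    }
}
```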
Answer by boumbh
As matt b said, [^\\], will interpret the character preceding the comma as part of the delimiter:
"test\\\,test\\,test\,test,test"
-(split)->
["test\\\,test\\,test\,tes" , "test"]
As drvdijk said, (?<!\\), will misinterpret escaped backslashes:
"test\\\,test\\,test\,test,test"
-(split)->
["test\\\,test\\,test\,test" , "test"]
-(unescape commas)->
["test\\,test\,test,test" , "test"]
I would expect to be able to escape backslashes as well:
"test\\\,test\\,test\,test,test"
-(split)->
["test\\\,test\\" , "test\,test" , "test"]
-(unescape commas and backslashes)->
["test\,test\" , "test,test" , "test"]
drvdijk suggested (?<=(?<!\\\\)(\\\\\\\\){0,100}), (written as a Java string literal), which works well for lists whose elements end with up to 100 backslashes. That is enough in practice... but why have a limit at all? Is there a more efficient way (isn't lookbehind greedy)? And what about invalid strings?
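For reference, a minimal sketch that applies drvdijk's pattern to the example above (class and method names are hypothetical). The lookbehind accepts a comma only when it is preceded by an even number of backslashes, so escaped backslashes are handled, up to the fixed bound.

```java
import java.util.Arrays;

class EvenBackslashSplitDemo {
    // drvdijk's delimiter as a Java string literal. As a regex it reads
    // (?<=(?<!\\)(\\\\){0,100}), : a comma preceded by an even number of
    // backslashes (at most 200 raw characters, i.e. up to 100 escaped
    // backslashes), which means the comma itself is not escaped.
    private static final String DELIMITER = "(?<=(?<!\\\\)(\\\\\\\\){0,100}),";

    static String[] split(String s) {
        return s.split(DELIMITER, -1);
    }

    public static void main(String[] args) {
        // raw text: test\\\,test\\,test\,test,test
        String input = "test\\\\\\,test\\\\,test\\,test,test";
        System.out.println(Arrays.toString(split(input)));
        // prints [test\\\,test\\, test\,test, test]
    }
}
```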
I searched for a generic solution for a while, then wrote one myself. The idea is to match the list elements with a pattern, instead of matching the delimiter.
My answer does not take the escape character as a parameter.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static List<String> commaDelimitedListStringToStringList(String list) {
    // Check the validity of the list
    // ex: "te\st" is not valid, backslash should be escaped
    if (!list.matches("^(([^\\\\,]|\\\\,|\\\\\\\\)*(,|$))+")) {
        // Could also raise an exception
        return null;
    }
    // Matcher for the list elements
    Matcher matcher = Pattern
            .compile("(?<=(^|,))([^\\\\,]|\\\\,|\\\\\\\\)*(?=(,|$))")
            .matcher(list);
    List<String> result = new ArrayList<String>();
    while (matcher.find()) {
        // Unescape the list element: "\," -> "," and "\\" -> "\"
        result.add(matcher.group().replaceAll("\\\\([\\\\,])", "$1"));
    }
    return result;
}
Description of the pattern (unescaped):
(?<=(^|,)) : preceded by the start of the string or by a comma
([^\\,]|\\,|\\\\)* : the element itself, made of \, or \\ or characters that are neither \ nor ,
(?=(,|$)) : followed by a comma or the end of the string
The pattern may be simplified.
Even with three parsing steps (matches + find + replaceAll), this method seems faster than the one suggested by drvdijk. It could still be optimized by writing a dedicated parser.
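One way such a dedicated parser could look, sketched as a single left-to-right pass. The class name and the choice to throw IllegalArgumentException instead of returning null are assumptions, not part of the answer above. It accepts the same format: , separates elements, \, is a literal comma, \\ is a literal backslash, and any other backslash is rejected.

```java
import java.util.ArrayList;
import java.util.List;

class EscapedListParser {
    // Single pass: "," ends an element, "\," and "\\" are unescaped,
    // any other backslash (e.g. "te\st" or a trailing "\") is an error.
    static List<String> parse(String list) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < list.length(); i++) {
            char c = list.charAt(i);
            if (c == '\\') {
                if (i + 1 >= list.length()) {
                    throw new IllegalArgumentException("Dangling escape at index " + i);
                }
                char next = list.charAt(++i);
                if (next != ',' && next != '\\') {
                    throw new IllegalArgumentException("Invalid escape at index " + (i - 1));
                }
                current.append(next); // keep the escaped char, drop the backslash
            } else if (c == ',') {
                result.add(current.toString()); // unescaped comma ends the element
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        result.add(current.toString()); // last element (may be empty, e.g. "a,")
        return result;
    }

    public static void main(String[] args) {
        // raw text: test\\\,test\\,test\,test,test
        System.out.println(parse("test\\\\\\,test\\\\,test\\,test,test"));
        // prints [test\,test\, test,test, test]
    }
}
```

Throwing makes invalid input impossible to overlook. Returning null, as the regex version does, would work just as well if callers check for it.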
Also, why have an escape character at all if only one character is special? It could simply be doubled:
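public static List<String> commaDelimitedListStringToStringList2(String list) {
    // Elements are runs of non-comma characters and doubled commas (",,")
    if (!list.matches("^(([^,]|,,)*(,|$))+")) {
        return null;
    }
    // Each element is preceded by the start or a comma, and followed by a comma or the end
    Matcher matcher = Pattern.compile("(?<=(^|,))([^,]|,,)*(?=(,|$))")
            .matcher(list);
    List<String> result = new ArrayList<String>();
    while (matcher.find()) {
        // Collapse each doubled comma back to a single comma
        result.add(matcher.group().replaceAll(",,", ","));
    }
    return result;
}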