Regular expression to match escaped characters (quotes) in Java
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address and authors, and attribute it to the original authors (not me): StackOverFlow
Original question: http://stackoverflow.com/questions/6525556/
Asked by PNS
I want to build a simple regex that covers quoted strings, including any escaped quotes within them. For instance,
"This is valid"
"This is \" also \" valid"
Obviously, something like
"([^"]*)"
does not work, because it matches up to the first escaped quote.
What is the correct version?
I suppose the answer would be the same for other escaped characters (by just replacing the respective character).
By the way, I am aware of the "catch-all" regex
"(.*?)"
but I try to avoid it whenever possible, because, not surprisingly, it runs somewhat slower than a more specific one.
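To make the failure concrete, here is a minimal sketch of the naive pattern applied to the second example string. The class and method names (NaiveQuoteDemo, firstMatch) are made up for illustration and are not part of the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NaiveQuoteDemo {
    // The naive pattern from the question: a quote, any run of non-quotes, a quote.
    static final Pattern NAIVE = Pattern.compile("\"([^\"]*)\"");

    /** Returns the first match of the naive pattern, or null if there is none. */
    static String firstMatch(String input) {
        Matcher m = NAIVE.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The input text is: "This is \" also \" valid"
        String input = "\"This is \\\" also \\\" valid\"";
        // The match stops at the first escaped quote instead of the closing one.
        System.out.println(firstMatch(input));
    }
}
```

The match ends at the first escaped quote because the character class [^"] has no notion of a preceding backslash.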
Accepted answer by maksymiuk
The problem with all the other answers is that they only pass the initial, obvious tests and fall short under further scrutiny. For example, all of the answers expect that the very first quote will not be escaped. More importantly, escaping is a more complex process than just a single backslash, because that backslash itself can be escaped. Imagine trying to actually match a string which ends with a backslash. How would that be possible?
This would be the pattern you are looking for. It doesn't assume that the first quote is the working one, and it will allow for backslashes to be escaped.
(?<!\\)(?:\\{2})*"(?:(?<!\\)(?:\\{2})*\\"|[^"])+(?<!\\)(?:\\{2})*"
Answer by arcain
Here is one that I've used in the past:
("[^"\\]*(?:\\.[^"\\]*)*")
This will capture quoted strings, along with any escaped quote characters, and exclude anything that doesn't appear in enclosing quotes.
For example, the pattern will capture "This is valid" and "This is \" also \" valid" from this string:
"This is valid" this won't be captured "This is \" also \" valid"
This pattern will not match the string "I don't \"have\" a closing quote, and will allow for additional escape codes in the string (e.g., it will match "hello world!\n").
Of course, you'll have to escape the pattern to use it in your code, like so:
"(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")"
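As a rough sketch of how the escaped pattern might be used, the following collects every quoted string from the example input with Matcher.find(). The class and method names (QuotedStringFinder, findAll) are illustrative assumptions, and the outer capturing group is dropped here because the whole match is what gets collected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedStringFinder {
    // Regex: "[^"\\]*(?:\\.[^"\\]*)*"  (each regex backslash is doubled in Java source)
    static final Pattern QUOTED = Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"");

    /** Returns every quoted string found in the input, quotes included. */
    static List<String> findAll(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Text: "This is valid" this won't be captured "This is \" also \" valid"
        String input = "\"This is valid\" this won't be captured \"This is \\\" also \\\" valid\"";
        for (String s : findAll(input)) {
            System.out.println(s);
        }
    }
}
```

Note that find() may still begin a match at a quote that is itself escaped in the surrounding text, which is the limitation the accepted answer points out.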
Answer by agent-j
Try this one. It prefers the \" alternative: if that matches, it will pick it; otherwise it will pick ".
"((?:\\"|[^"])*)"
Once you have matched the string, you'll need to take the first captured group's value and replace \" with ".
Edit: Fixed grouping logic.
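The match-then-unescape step described above could look roughly like this in Java. This is only a sketch: the names QuotedStringUnescaper and extractAndUnescape are hypothetical, and only the \" escape is handled, as in the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedStringUnescaper {
    // Regex: "((?:\\"|[^"])*)"  - prefer an escaped quote, else any non-quote character.
    static final Pattern QUOTED = Pattern.compile("\"((?:\\\\\"|[^\"])*)\"");

    /** Returns the content of the first quoted string with \" turned into ", or null. */
    static String extractAndUnescape(String input) {
        Matcher m = QUOTED.matcher(input);
        if (!m.find()) {
            return null;
        }
        // Group 1 holds the text between the outer quotes, escapes still in place.
        return m.group(1).replace("\\\"", "\"");
    }

    public static void main(String[] args) {
        // Text: "This is \" also \" valid"
        System.out.println(extractAndUnescape("\"This is \\\" also \\\" valid\""));
    }
}
```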
Answer by Dinesh Lomte
The code below evaluates expressions for comma-separated lists of String, Number and Decimal values.
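public class CommaSeparatedValidators {

    /** Checks a comma-separated list of single-quoted strings that may contain \' escapes. */
    public static void commaSeparatedStrings() {
        String value = "'It\\'s my world', 'Hello World', 'What\\'s up', 'It\\'s just what I expected.'";
        if (value.matches("'([^\'\\\\]*(?:\\\\.[^\'\\\\])*)[\\w\\s,\\.]+'(((,)|(,\\s))'([^\'\\\\]*(?:\\\\.[^\'\\\\])*)[\\w\\s,\\.]+')*")) {
            System.out.println("Valid...");
        } else {
            System.out.println("Invalid...");
        }
    }

    /** Checks a comma-separated list of (optionally negative) decimal numbers. */
    public static void commaSeparatedDecimals() {
        String value = "-111.00, 22111.00, -1.00";
        // Earlier attempt: "\\d+([,]|[,\\s]\\d+)*"
        if (value.matches(
                "^([-]?)\\d+\\.\\d{1,10}?(((,)|(,\\s))([-]?)\\d+\\.\\d{1,10}?)*")) {
            System.out.println("Valid...");
        } else {
            System.out.println("Invalid...");
        }
    }

    /** Checks a comma-separated list of (optionally negative) integers. */
    public static void commaSeparatedNumbers() {
        String value = "-11, 22, -31";
        if (value.matches("^([-]?)\\d+(((,)|(,\\s))([-]?)\\d+)*")) {
            System.out.println("Valid...");
        } else {
            System.out.println("Invalid...");
        }
    }

    public static void main(String[] args) {
        commaSeparatedStrings();
        commaSeparatedDecimals();
        commaSeparatedNumbers();
    }
}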
Answer by awwsmm
This
("((?:[^"\\])*(?:\\\")*(?:\\\\)*)*")
will capture all strings (within double quotes), including \" and \\ escape sequences. (Note that this answer assumes that the only escape sequences in your string are \" or \\ sequences; no other backslash characters or escape sequences will be captured.)
("(?:              # begin with a quote and capture...
  (?:[^"\\])*      # any non-\, non-" characters
  (?:\\\")*        # any combined \" sequences
  (?:\\\\)*        # and any combined \\ sequences
)*                 # any number of times
")                 # then, close the string with a quote
Also, note that maksymiuk's accepted answer contains an "edge case" ("Imagine trying to actually match a string which ends with a backslash") which is actually just a malformed string. Something like
"this\"
...is not a "string ending on a backslash", but an unclosed string ending on an escaped quotation mark. A string which truly ends on a backslash would look like
"this\\"
...and the above solution handles this case.
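A small sketch can check the two cases discussed above against this answer's pattern. It assumes the reconstructed form of the regex with doubled backslashes, and the names BackslashEndingDemo and isCompleteString are invented for the example. It uses matches(), so the whole input has to be one quoted string.

```java
import java.util.regex.Pattern;

class BackslashEndingDemo {
    // Regex: ("((?:[^"\\])*(?:\\\")*(?:\\\\)*)*")
    static final Pattern STRING =
        Pattern.compile("(\"((?:[^\"\\\\])*(?:\\\\\\\")*(?:\\\\\\\\)*)*\")");

    /** True if the entire input is a single, properly closed quoted string. */
    static boolean isCompleteString(String input) {
        return STRING.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Text "this\" : the final quote is escaped, so the string is never closed.
        System.out.println(isCompleteString("\"this\\\""));
        // Text "this\\" : an escaped backslash followed by the real closing quote.
        System.out.println(isCompleteString("\"this\\\\\""));
    }
}
```

Because the pattern nests one repetition inside another, it may backtrack heavily on long inputs that fail to match, so treat it as a demonstration rather than something tuned for speed.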
If you want to expand a bit, this...
(\\(?:b|t|n|f|r|\"|\\)|\\(?:(?:[0-2][0-9]{1,2}|3[0-6][0-9]|37[0-7]|[0-9]{1,2}))|\\(?:u(?:[0-9a-fA-F]{4})))
...captures all common escape sequences (including escaped quotes):
(\\                        # get the preceding slash (for each section)
  (?:b|t|n|f|r|\"|\\)      # capture common sequences like \n and \t
|\\                        # OR (get the preceding slash and)...
  # capture variable-width octal escape sequences (one to three digits, up to 377)
  (?:(?:[0-2][0-9]{1,2}|3[0-6][0-9]|37[0-7]|[0-9]{1,2}))
|\\                        # OR (get the preceding slash and)...
  (?:u(?:[0-9a-fA-F]{4}))  # capture fixed-width Unicode sequences like \u0242 or \uFFAD
)
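For illustration, the escape-sequence pattern can be compiled in Java and used to list the escapes found in a piece of text. This is a sketch under the assumption that the regex above is the intended one; EscapeSequenceFinder and findAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeSequenceFinder {
    // The pattern above as a Java literal; every regex backslash is doubled.
    static final Pattern ESCAPE = Pattern.compile(
        "(\\\\(?:b|t|n|f|r|\\\"|\\\\)"                                   // simple escapes
        + "|\\\\(?:(?:[0-2][0-9]{1,2}|3[0-6][0-9]|37[0-7]|[0-9]{1,2}))" // octal escapes
        + "|\\\\(?:u(?:[0-9a-fA-F]{4})))");                             // four-hex-digit Unicode escapes

    /** Returns every escape sequence found in the input, in order of appearance. */
    static List<String> findAll(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = ESCAPE.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // The text contains a tab escape, a Unicode escape, and an escaped backslash.
        for (String s : findAll("a\\tb\\u0242c\\\\d")) {
            System.out.println(s);
        }
    }
}
```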
See this Gist for more information on the second point.
Answer by Igor
This works for me, and it is simpler than the current answer:
(?<!\\+)"(\\"|[^"])*(?<!\\+)"
(?<!\\+) - the " must not be preceded by \; this check is applied on both the left (opening quote) and the right (closing quote).
(\\"|[^"])* - what is inside the quotes: either escaped quotes \\" or anything except quotes [^"]
The regexp works correctly for the following strings:
234 - false or null
"234" - true or ["234"]
"" - true or [""]
"234 + 321 \\"24\\"" - true or ["234 + 321 \\"24\\""]
"234 + 321 \\"24\\"" + 123 + "\\"test(\\"235\\")\\"" - true or ["234 + 321 \\"24\\"", "\\"test(\\"235\\")\\""]
"234 + 321 \\"24\\"" + 123 + "\\"test(\\"235\\")\\"\\" - true or ["234 + 321 \\"24\\""]