Original source: http://stackoverflow.com/questions/168639/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
Escaping a String from getting regex parsed in Java
Asked by doodaddy
In Java, suppose I have a String variable S, and I want to search for it inside of another String T, like so:
if (T.matches(S)) ...
(note: the above line was T.contains() until a few posts pointed out that that method does not use regexes. My bad.)
But now suppose S may have unsavory characters in it. For instance, let S = "[hi". The left square bracket is going to cause the regex to fail. Is there a function I can call to escape S so that this doesn't happen? In this particular case, I would like it to be transformed to "\[hi".
Answered by Tom Hawtin - tackline
String.contains does not use regex, so there isn't a problem in this case.
Where a regex is required, rather than rejecting strings with regex special characters, use java.util.regex.Pattern.quote to escape them.
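As a minimal sketch of that advice: Pattern.quote wraps the input so the regex engine treats every character as a literal. The class and the helper name matchesLiterally are made up for illustration.

```java
import java.util.regex.Pattern;

class QuoteExample {
    // Hypothetical helper: true when t is exactly s, checked through the regex engine.
    static boolean matchesLiterally(String t, String s) {
        return t.matches(Pattern.quote(s));
    }

    public static void main(String[] args) {
        String s = "[hi";
        // Unquoted, "[hi" is an unclosed character class and fails to compile as a regex.
        System.out.println(Pattern.quote(s));            // prints \Q[hi\E
        System.out.println(matchesLiterally("[hi", s));  // prints true
        System.out.println(matchesLiterally("hi", s));   // prints false
    }
}
```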
Answered by Michael Myers
As Tom Hawtin said, you need to quote the pattern. You can do this in two ways (edit: actually three ways, as pointed out by @diastrophism):
1. Surround the string with "\Q" and "\E", like:
if (T.matches("\\Q" + S + "\\E"))
2. Use Pattern instead. The code would be something like this:
Pattern sPattern = Pattern.compile(S, Pattern.LITERAL);
if (sPattern.matcher(T).matches()) { /* do something */ }
This way, you can cache the compiled Pattern and reuse it. If you are using the same regex more than once, you almost certainly want to do it this way.
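The caching idea can be sketched roughly as follows, assuming the search string is known up front. The class name, the constant S_PATTERN and the helper isExactly are illustrative only.

```java
import java.util.regex.Pattern;

class LiteralPatternExample {
    // Compiled once and reused; Pattern.LITERAL makes every character match itself.
    private static final Pattern S_PATTERN = Pattern.compile("[hi", Pattern.LITERAL);

    // matches() requires the whole input to equal the literal.
    static boolean isExactly(String t) {
        return S_PATTERN.matcher(t).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"[hi", "hi", "[hi there"};
        for (String t : inputs) {
            System.out.println(t + " -> " + isExactly(t));
        }
        // prints:
        // [hi -> true
        // hi -> false
        // [hi there -> false
    }
}
```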
Note that if you are using regular expressions to test whether a string is inside a larger string, you should put .* at the start and end of the expression. But this will not work if you are quoting the pattern, since it will then be looking for actual dots. So, are you absolutely certain you want to be using regular expressions?
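If a regex is still wanted for a containment test, one way around the quoting problem is to quote only S and keep the surrounding .* outside the quoted part, or to use Matcher.find(), which needs no wrapping at all. A small sketch with made-up helper names:

```java
import java.util.regex.Pattern;

class ContainsExample {
    // Quote only s; the surrounding .* stays outside the quoted region.
    // By default . does not match line terminators, so this returns false
    // if t contains a line break outside s.
    static boolean containsViaMatches(String t, String s) {
        return t.matches(".*" + Pattern.quote(s) + ".*");
    }

    // find() looks for the quoted literal anywhere in t.
    static boolean containsViaFind(String t, String s) {
        return Pattern.compile(Pattern.quote(s)).matcher(t).find();
    }

    public static void main(String[] args) {
        System.out.println(containsViaMatches("say [hi now", "[hi")); // prints true
        System.out.println(containsViaFind("say [hi now", "[hi"));    // prints true
        System.out.println(containsViaFind("say hi", "[hi"));         // prints false
    }
}
```

For a plain substring test, String.contains gives the same answer without involving the regex engine.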
Answered by Diastrophism
Try Pattern.quote(String). It will fix up anything that has special meaning in the string.
Answered by Jay
Any particular reason not to use String.indexOf() instead? That way it will always be interpreted as a regular string rather than a regex.
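A short sketch of the indexOf approach, with an illustrative helper name (containsLiteral); no escaping is needed because no regex is involved:

```java
class IndexOfExample {
    // indexOf treats s as plain text, so "[" has no special meaning here.
    static boolean containsLiteral(String t, String s) {
        return t.indexOf(s) != -1;
    }

    public static void main(String[] args) {
        String t = "say [hi now";
        System.out.println(t.indexOf("[hi"));          // prints 4
        System.out.println(containsLiteral(t, "[hi")); // prints true
        System.out.println(containsLiteral(t, "bye")); // prints false
    }
}
```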
Answered by Aaron
Regex uses the backslash character '\' to escape a literal. Given that Java also uses the backslash character as an escape in string literals, you would need to use a double backslash like:
String S = "\\[hi";
That will become the String:
\[hi
which will be passed to the regex.
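To make the two levels of escaping concrete, here is a small sketch (the class name is arbitrary): the Java compiler turns the doubled backslash into a single one, and the regex engine then reads that single backslash as an escape for the bracket.

```java
class BackslashExample {
    // Source text "\\[hi" is the four-character string \[hi at runtime.
    static final String S = "\\[hi";

    public static void main(String[] args) {
        System.out.println(S);                 // prints \[hi
        System.out.println(S.length());        // prints 4
        System.out.println("[hi".matches(S));  // prints true
    }
}
```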
Or if you only care about a literal String and don't need a regex you could do the following:
if (T.indexOf("[hi") != -1) { /* found */ }
Answered by anjanb
T.contains() (according to the Javadoc: http://java.sun.com/javase/6/docs/api/java/lang/String.html) does not use regexes. contains() delegates to indexOf() only.
So, there are NO regexes used here. Were you thinking of some other String method?

