How to escape text for a regular expression in Java

Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/60160/

Time: 2020-08-11 07:46:51  Source: igfitidea

java regex escaping

Asked by Matt

Does Java have a built-in way to escape arbitrary text so that it can be included in a regular expression? For example, if my users enter "$5", I'd like to match that exactly rather than a "5" after the end of input.

Accepted answer by Mike Stone

Since Java 1.5, yes:

Pattern.quote("");
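
As a rough illustration of that call, the sketch below quotes the "$5" input from the question before compiling it. The class and method names (QuoteDemo, containsLiteral) are invented for this example; only Pattern.quote, Pattern.compile and Matcher.find come from java.util.regex.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Hypothetical helper: true if haystack contains userInput as literal text.
    static boolean containsLiteral(String haystack, String userInput) {
        Pattern p = Pattern.compile(Pattern.quote(userInput));
        return p.matcher(haystack).find();
    }

    public static void main(String[] args) {
        String text = "The price is $5 today";
        // Quoted: "$5" is matched character for character.
        System.out.println(containsLiteral(text, "$5"));
        // Unquoted: "$" is the end-of-input anchor, so "$5" does not match here.
        System.out.println(Pattern.compile("$5").matcher(text).find());
    }
}
```

The first line printed should be true and the second false, which is the difference the question is asking about.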

Answer by Rob Oxspring

I think what you're after is \Q$5\E. Also see Pattern.quote(s), introduced in Java 5.

See the Pattern javadoc for details.
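
To make the \Q...\E form concrete, here is a minimal sketch that builds the quoted section by hand and compares it with Pattern.quote. The helper naiveQuote is hypothetical and exists only for this comparison; unlike Pattern.quote, it would break on text that itself contains "\E".

```java
import java.util.regex.Pattern;

class QuotedSectionDemo {
    // Hypothetical helper: wraps text in \Q...\E by hand.
    // Not safe for text that contains "\E"; Pattern.quote handles that case.
    static String naiveQuote(String text) {
        return "\\Q" + text + "\\E";
    }

    public static void main(String[] args) {
        System.out.println(naiveQuote("$5"));     // prints \Q$5\E
        System.out.println(Pattern.quote("$5"));  // prints \Q$5\E
        System.out.println("$5".matches(naiveQuote("$5")));
    }
}
```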

Answer by Pavel Feldman

The difference between Pattern.quote and Matcher.quoteReplacement was not clear to me before I saw the following example:

s.replaceFirst(Pattern.quote("text to replace"), 
               Matcher.quoteReplacement("replacement text"));
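
A small runnable sketch of the same idea, assuming you want both sides treated as plain text: Pattern.quote protects the search side and Matcher.quoteReplacement protects the replacement side. The class and method names (ReplaceLiteralDemo, replaceFirstLiteral) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceLiteralDemo {
    // Hypothetical helper: replaces the first literal occurrence of target
    // with the literal text replacement; neither side is read as regex syntax.
    static String replaceFirstLiteral(String s, String target, String replacement) {
        return s.replaceFirst(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // "(x)" would be a capturing group and "$5" a group reference if left unquoted.
        System.out.println(replaceFirstLiteral("Total: (x)", "(x)", "$5")); // prints Total: $5
    }
}
```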

Answer by Meower68

First off, if

  • you use replaceAll()
  • you DON'T use Matcher.quoteReplacement()
  • the text to be substituted in includes a $1

it won't put a 1 at the end. It will look at the search regex for the first matching group and sub THAT in. That's what $1, $2 or $3 means in the replacement text: matching groups from the search pattern.
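
As a quick illustration of what $1 and $2 do in replacement text, the sketch below swaps the two sides of "key=value" pairs. The class and method names are invented for this example.

```java
class GroupReferenceDemo {
    // Hypothetical helper: turns "key=value" into "value=key".
    // $1 and $2 refer to the first and second capturing groups of the search pattern.
    static String swap(String s) {
        return s.replaceAll("(\\w+)=(\\w+)", "$2=$1");
    }

    public static void main(String[] args) {
        System.out.println(swap("a=1, b=2")); // prints 1=a, 2=b
    }
}
```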

I frequently plug long strings of text into .properties files, then generate email subjects and bodies from those. Indeed, this appears to be the default way to do i18n in Spring Framework. I put XML tags, as placeholders, into the strings and I use replaceAll() to replace the XML tags with the values at runtime.

I ran into an issue where a user input a dollars-and-cents figure, with a dollar sign. replaceAll() choked on it, with the following showing up in a stack trace:

java.lang.IndexOutOfBoundsException: No group 3
at java.util.regex.Matcher.start(Matcher.java:374)
at java.util.regex.Matcher.appendReplacement(Matcher.java:748)
at java.util.regex.Matcher.replaceAll(Matcher.java:823)
at java.lang.String.replaceAll(String.java:2201)

In this case, the user had entered "$3" somewhere in their input and replaceAll() went looking in the search regex for the third matching group, didn't find one, and puked.

Given:

// "msg" is a string from a .properties file, containing "<userInput />" among other tags
// "userInput" is a String containing the user's input

replacing

msg = msg.replaceAll("<userInput \\/>", userInput);

with

msg = msg.replaceAll("<userInput \\/>", Matcher.quoteReplacement(userInput));

solved the problem. The user could put in any kind of characters, including dollar signs, without issue. It behaved exactly the way you would expect.
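
Here is a minimal sketch of that failure and the fix side by side. The message text and the "$3.50" input are made-up sample values, and the class and method names (PlaceholderDemo, fill) are hypothetical; the replaceAll and Matcher.quoteReplacement calls are the ones described above.

```java
import java.util.regex.Matcher;

class PlaceholderDemo {
    // Hypothetical helper: fills the <userInput /> tag with the user's text,
    // treating that text literally.
    static String fill(String msg, String userInput) {
        return msg.replaceAll("<userInput \\/>", Matcher.quoteReplacement(userInput));
    }

    public static void main(String[] args) {
        String msg = "You entered: <userInput />";
        System.out.println(fill(msg, "$3.50")); // prints You entered: $3.50
        try {
            // Without quoteReplacement, "$3" is read as a reference to group 3.
            msg.replaceAll("<userInput \\/>", "$3.50");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("Unquoted replacement failed: " + e.getMessage());
        }
    }
}
```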

Answer by Moscow Boy

To get a protected pattern, you can prefix every symbol except digits and letters with a backslash (the replacement string "\\\\$1" in Java source). After that, you can insert your own special symbols into the protected pattern, so that it works not like plain quoted text but like a real pattern: your own, without the user's special symbols.

public class Test {
    public static void main(String[] args) {
        String str = "y z (111)";
        String p1 = "x x (111)";
        String p2 = ".* .* \\(111\\)";

        p1 = escapeRE(p1);

        p1 = p1.replace("x", ".*");

        System.out.println( p1 + "-->" + str.matches(p1) ); 
            //.*\ .*\ \(111\)-->true
        System.out.println( p2 + "-->" + str.matches(p2) ); 
            //.* .* \(111\)-->true
    }

    public static String escapeRE(String str) {
        //Pattern escaper = Pattern.compile("([^a-zA-Z0-9])");
        //return escaper.matcher(str).replaceAll("\\\\$1");
        // Prefix every character that is not a letter or digit with a backslash.
        return str.replaceAll("([^a-zA-Z0-9])", "\\\\$1");
    }
}

Answer by Androidme

It may be too late to respond, but you can also use Pattern.LITERAL, which tells Pattern.compile to treat all special characters in the text as plain literals:

Pattern.compile(textToFormat, Pattern.LITERAL);
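
A short sketch of the flag in use, assuming you want to count literal occurrences of some text. The class and method names (LiteralFlagDemo, countLiteral) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralFlagDemo {
    // Hypothetical helper: counts non-overlapping literal occurrences of needle.
    static int countLiteral(String haystack, String needle) {
        Matcher m = Pattern.compile(needle, Pattern.LITERAL).matcher(haystack);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // With Pattern.LITERAL the "." matches only a real dot, not any character.
        System.out.println(countLiteral("a.b.c", ".")); // prints 2
        System.out.println(countLiteral("abc", "."));   // prints 0
    }
}
```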

Answer by Adam111p

Pattern.quote("blabla") works nicely.

It encloses the text in "\Q" and "\E", and it copes with text that already contains "\Q" or "\E". However, if you need real regular-expression escaping (or custom escaping), you can use this code:

String someText = "Some/s/wText*/,**";
System.out.println(someText.replaceAll("[-\\[\\]{}()*+?.,\\\\^$|#\\s]", "\\\\$0"));

This method returns: Some/s/wText\*/\,\*\*

Code for example and tests:

String someText = "Some\\E/s/wText*/,**";
System.out.println("Pattern.quote: "+ Pattern.quote(someText));
System.out.println("Full escape: "+someText.replaceAll("[-\\[\\]{}()*+?.,\\\\^$|#\\s]", "\\\\$0"));

Answer by Akhil Kathi

The ^ (negation) symbol, placed at the start of a character group such as [^a-z], is used to match anything that is not in that group.
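
For instance, a negated group can be used to strip everything except letters and digits. This is only a sketch; the class and method names (NegationDemo, keepLettersAndDigits) are made up.

```java
class NegationDemo {
    // Hypothetical helper: removes every character that is not a letter or digit.
    // Inside [...], a leading ^ negates the group; outside it, ^ is the start anchor.
    static String keepLettersAndDigits(String s) {
        return s.replaceAll("[^a-zA-Z0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(keepLettersAndDigits("y z (111)")); // prints yz111
    }
}
```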

This is the link to Regular Expressions
