Java, escaping (using) quotes in a regex

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/6398365/

Time: 2020-08-16 05:35:26  Source: igfitidea

Tags: java, regex, escaping

Asked by Spectraljump

I'm trying to use the following regex in Java, which is supposed to match any lang="2-char-lang-name":

String lang = "lang=\"" + L.detectLang(inputText) +"\"";
shovel.replaceFirst("lang=\"[..]\"", lang);

I know that a single slash would be interpreted by regex as a slash and not an escape character (so my code doesn't work), but if I escape the slash, the " won't be escaped any more and I'd get a syntax error.

In other words, how can I include a " in the regex? "lang=\\"[..]\\"" won't work. I've also tried three slashes, and that didn't produce any matches either.

I am also aware of the general rule that you don't use regex to parse XML/HTML (and shovel is an XML document). However, all I'm doing is looking for a lang attribute that is within the first 30 characters of the XML, and I want to replace it. Is it really a bad idea to use regex in this case? I don't think using DOM would be any better or more efficient.

Accepted answer by Dan Tao

Three backslashes would be correct (\\ + \" becomes \ + " = \"). (Update: actually, it turns out that isn't even necessary; a single backslash also works.) The problem is your use of [..]: the [] symbols mean "any one of the characters in here", and a dot inside brackets is a literal dot rather than a wildcard, so [..] matches only a single literal "." character.

Drop the [] and you should get what you want:

String ab = "foo=\"bar\" lang=\"AB\"";
String regex = "lang=\\\"..\\\"";
String cd = ab.replaceFirst(regex, "lang=\"CD\"");
System.out.println(cd);

Output:

foo="bar" lang="CD"
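
As a rough check of the points above, the sketch below runs the three patterns discussed against the same sample string: the quote written with one backslash, with three backslashes, and the original [..] character class. The class name QuoteInRegexDemo and the helper replaceLang are made up for illustration.

```java
public class QuoteInRegexDemo {
    // Hypothetical helper: replaces the first match of regex with lang="CD".
    static String replaceLang(String input, String regex) {
        return input.replaceFirst(regex, "lang=\"CD\"");
    }

    public static void main(String[] args) {
        String ab = "foo=\"bar\" lang=\"AB\"";

        // One backslash: the regex engine receives lang=".." (a plain quote).
        System.out.println(replaceLang(ab, "lang=\"..\""));

        // Three backslashes: the regex engine receives lang=\"..\" (an escaped quote).
        System.out.println(replaceLang(ab, "lang=\\\"..\\\""));

        // [..] is a character class holding only a literal dot, so "AB" does not match
        // and the input comes back unchanged.
        System.out.println(replaceLang(ab, "lang=\"[..]\""));
    }
}
```

The first two calls should both print foo="bar" lang="CD", while the third should leave lang="AB" in place.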

Answer by OpenSauce

Have you tried it with a single backslash? The output of

public static void main(String[] args) {
  String inputString = "<xml lang=\"the Queen's English\">";
  System.out.println(inputString.replaceFirst("lang=\"[^\"]*\"", "lang=\"American\"" ));
}

is

<xml lang="American">

which, if I'm reading you correctly, is what you want.

EDIT to add: the reason a single backslash works is that it's not actually part of the string; it's just part of the syntax for expressing the string. The length of the string "\"" is 1, not 2, and the method replaceFirst just sees a string containing a " (with no backslash). This is why e.g. \s (the whitespace character class in a regex) has to be written \\s in a Java string literal.
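
The distinction between what the Java compiler sees and what the regex engine sees can be checked directly. The following is a small sketch; the class name LiteralVsRegexDemo and the helper firstWhitespaceToUnderscore are invented for this example.

```java
public class LiteralVsRegexDemo {
    // Hypothetical helper: replaces the first whitespace character with an underscore.
    static String firstWhitespaceToUnderscore(String s) {
        // "\\s" is the two-character string \s, which the regex engine reads as "whitespace".
        return s.replaceFirst("\\s", "_");
    }

    public static void main(String[] args) {
        // The backslash in "\"" is literal syntax only; the string holds one character.
        System.out.println("\"".length());

        // "\\s" holds two characters: a backslash and an s.
        System.out.println("\\s".length());

        System.out.println(firstWhitespaceToUnderscore("a b c"));
    }
}
```

This should print 1, then 2, then a_b c.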

On the wisdom of using regex: this should be fine if you're sure about the format of the files you're processing. If the files might include a commented-out header, complete with a lang spec, above the real header, you could be in trouble!
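
To make that caveat concrete, here is a minimal sketch of the failure mode, assuming a made-up input in which a commented-out header precedes the real one. The class name CommentedHeaderDemo, the helper setLang, and the sample XML are all hypothetical.

```java
import java.util.regex.Matcher;

public class CommentedHeaderDemo {
    // Hypothetical helper: replaces the value of the first lang="..." it finds.
    static String setLang(String xml, String lang) {
        // quoteReplacement guards against $ or \ in the new value being
        // interpreted as replacement syntax.
        String replacement = "lang=\"" + Matcher.quoteReplacement(lang) + "\"";
        return xml.replaceFirst("lang=\"[^\"]*\"", replacement);
    }

    public static void main(String[] args) {
        // Made-up input: the first lang attribute sits inside a comment.
        String xml = "<!-- <doc lang=\"en\"> -->\n<doc lang=\"fr\">";

        // replaceFirst knows nothing about comments, so it rewrites the
        // commented-out attribute and leaves the real header untouched.
        System.out.println(setLang(xml, "de"));
    }
}
```

With this input, the commented-out header ends up with lang="de" and the real header keeps lang="fr", which is the kind of trouble described above.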