Why does String.replaceAll() in Java require four backslashes "\\\\" in the regex to actually replace "\"?
Notice: this page reproduces a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/18875852/
Asked by Bharath
I recently noticed that String.replaceAll(regex, replacement) behaves very strangely when it comes to the escape character "\" (backslash).
For example, suppose we have a string holding a file path, String text = "E:\\dummypath", and we want to replace the "\\" with "/".
text.replace("\\","/") gives the output "E:/dummypath", whereas text.replaceAll("\\","/") throws java.util.regex.PatternSyntaxException.
To get the same result with replaceAll(), we need to write it as text.replaceAll("\\\\","/").
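The three calls described above can be reproduced with a small sketch. The class and method names here are made up for illustration; only String.replace, String.replaceAll and PatternSyntaxException come from the JDK.

```java
import java.util.regex.PatternSyntaxException;

public class BackslashReplaceDemo {
    // Literal replacement: the target is a plain character sequence.
    static String viaReplace(String text) {
        return text.replace("\\", "/");
    }

    // Regex replacement: the pattern string must contain two backslashes,
    // which takes four backslashes in a Java string literal.
    static String viaReplaceAll(String text) {
        return text.replaceAll("\\\\", "/");
    }

    // Returns true if a pattern made of one lone backslash is rejected.
    static boolean singleBackslashPatternFails(String text) {
        try {
            text.replaceAll("\\", "/");
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String text = "E:\\dummypath"; // the characters E:\dummypath
        System.out.println(viaReplace(text));
        System.out.println(viaReplaceAll(text));
        System.out.println(singleBackslashPatternFails(text));
    }
}
```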
One notable difference is that replaceAll() takes a regex as its first argument, whereas replace() takes a plain character sequence.
But text.replaceAll("\n","/") works exactly the same as its char-sequence equivalent, text.replace("\n","/").
Digging deeper: even stranger behavior can be observed when we try some other inputs.
Let's assign text = "Hello\nWorld\n"
Now, text.replaceAll("\n","/"), text.replaceAll("\\n","/") and text.replaceAll("\\\n","/") all give the same output: Hello/World/
I feel Java has really messed up regexes in the best possible way! No other language seems to have these playful regex behaviors. Is there any specific reason why Java messed up like this?
Accepted answer by Stephen C
@Peter Lawrey's answer describes the mechanics. The "problem" is that backslash is an escape character both in Java string literals and in the mini-language of regexes. So when you use a string literal to represent a regex, there are two sets of escaping to consider, depending on what you want the regex to mean.
But why is it like that?
It is a historical thing. Java originally didn't have regexes at all. The syntax rules for Java String literals were borrowed from C / C++, which also didn't have built-in regex support. The awkwardness of double escaping didn't become apparent in Java until regex support was added in the form of the Pattern class in Java 1.4.
So how do other languages manage to avoid this?
They do it by providing direct or indirect syntactic support for regexes in the programming language itself. For instance, Perl, Ruby, JavaScript and many other languages have a syntax for patterns / regexes (e.g. /pattern/) where string literal escaping rules do not apply. C# and Python provide an alternative "raw" string literal syntax in which backslashes are not escapes. (But note that if you use the normal C# / Python string syntax, you have the same double-escaping problem as in Java.)
Why do text.replaceAll("\n","/"), text.replaceAll("\\n","/") and text.replaceAll("\\\n","/") all give the same output?
The first case is a newline character at the String level. The Java regex language treats all non-special characters as matching themselves.
The second case is a backslash followed by an "n" at the String level. The Java regex language interprets a backslash followed by an "n" as a newline.
The final case is a backslash followed by a newline character at the String level. The Java regex language doesn't recognize this as a specific (regex) escape sequence. However in the regex language, a backslash followed by any non-alphabetic character means the latter character. So, a backslash followed by a newline character ... means the same thing as a newline.
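A minimal sketch of the three cases, assuming the text "Hello\nWorld\n" from the question. The class name and the replaceWith helper are made up for illustration. Printing the pattern length shows how many characters the regex engine actually receives in each case.

```java
public class NewlinePatternDemo {
    // Applies the given regex to the text, replacing every match with "/".
    static String replaceWith(String pattern, String text) {
        return text.replaceAll(pattern, "/");
    }

    public static void main(String[] args) {
        String text = "Hello\nWorld\n";
        String[] patterns = {
            "\n",   // regex is one character: an actual newline
            "\\n",  // regex is two characters: backslash, then n
            "\\\n"  // regex is two characters: backslash, then an actual newline
        };
        for (String p : patterns) {
            System.out.println(p.length() + " char(s) -> " + replaceWith(p, text));
        }
    }
}
```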
Answer by Peter Lawrey
You need to escape twice: once for Java and once for the regex.
The Java code
"\\\\"
produces a regex string of
\\ (two characters)
but the regex needs an escape too, so it matches
\ (one character)
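The two levels can be checked directly. This is only an illustrative sketch: the class and method names are invented, while String.length and Pattern.matches are standard JDK calls.

```java
import java.util.regex.Pattern;

public class EscapeLevelsDemo {
    // Level 1 (Java compiler): four backslashes in source become two characters.
    static int patternLength() {
        return "\\\\".length();
    }

    // Level 2 (regex engine): the two-character regex matches one backslash.
    static boolean matchesSingleBackslash() {
        String oneBackslash = "\\"; // a string holding a single backslash
        return Pattern.matches("\\\\", oneBackslash);
    }

    public static void main(String[] args) {
        System.out.println(patternLength());
        System.out.println(matchesSingleBackslash());
    }
}
```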
Answer by Rajagopal
I think Java really messed up regular expressions in String.replaceAll().
Other than Java, I have never seen a language parse regular expressions this way. It will confuse you if you have used regexes in other languages.
If you need to use "\\" in the replacement string, you can use java.util.regex.Matcher.quoteReplacement(String):
text.replaceAll("/", Matcher.quoteReplacement("\\"));
Using this method of the Matcher class gives the expected result.
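As a rough illustration, the sketch below turns forward slashes into backslashes, once with Matcher.quoteReplacement and once with a hand-escaped replacement string. The class and method names are hypothetical; the sample path is the one from the question with its separator swapped.

```java
import java.util.regex.Matcher;

public class QuoteReplacementDemo {
    // Lets quoteReplacement escape the backslash for the replacement syntax.
    static String withQuoteReplacement(String path) {
        return path.replaceAll("/", Matcher.quoteReplacement("\\"));
    }

    // Same result by hand: the replacement string must hold two backslashes.
    static String withManualEscape(String path) {
        return path.replaceAll("/", "\\\\");
    }

    // What quoteReplacement returns for a single backslash.
    static String quotedBackslash() {
        return Matcher.quoteReplacement("\\");
    }

    public static void main(String[] args) {
        System.out.println(withQuoteReplacement("E:/dummypath"));
        System.out.println(withManualEscape("E:/dummypath"));
        System.out.println(quotedBackslash().length());
    }
}
```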
Answer by coder91
This is because Java gives \ a special meaning in the replacement string, so that \$ stands for a literal $ sign, but in the process they seem to have removed the actual special meaning of \.
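To make the replacement-string rules concrete, here is a small sketch with made-up inputs and method names. In a replaceAll replacement, $n refers to a captured group, and a backslash makes the following character literal.

```java
public class ReplacementEscapesDemo {
    // $1 and $2 in the replacement refer to the captured groups.
    static String swapPair(String s) {
        return s.replaceAll("(\\w+)=(\\w+)", "$2=$1");
    }

    // The replacement holds backslash + $, which yields a literal $ sign.
    static String toDollarSign(String s) {
        return s.replaceAll("USD", "\\$");
    }

    public static void main(String[] args) {
        System.out.println(swapPair("key=value"));
        System.out.println(toDollarSign("USD 5"));
    }
}
```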
While text.replaceAll("\\\\","/") can at least be considered okay in some sense (though it is not absolutely right in itself), the fact that all three calls, text.replaceAll("\n","/"), text.replaceAll("\\n","/") and text.replaceAll("\\\n","/"), give the same output seems even funnier. It contradicts whatever reasoning led them to reject text.replaceAll("\\","/").
Java didn't mess up regular expressions. Rather, Java likes to mess with coders by trying to do something unique and different when it is not required at all.
Answer by MTaylorEx
One way around this problem is to replace backslash with another character, use that stand-in character for intermediate replacements, then convert it back into backslash at the end. For example, to convert "\r\n" to "\n" (both written out as literal backslash sequences in the input text):
String out = in.replace('\\','@').replaceAll("@r@n","@n").replace('@','\\');
Of course, that won't work very well if the stand-in character you choose can also occur in the input string.
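For comparison, here is a sketch that places the stand-in approach next to a plain String.replace call. String.replace treats its arguments literally, so it does not need the stand-in. The second method is an alternative added for illustration, not part of the answer above, and all names are hypothetical.

```java
public class LiteralSequenceDemo {
    // The stand-in approach: swap backslash for '@', edit, then swap back.
    static String viaStandIn(String in) {
        return in.replace('\\', '@').replaceAll("@r@n", "@n").replace('@', '\\');
    }

    // Alternative: a literal replace with no regex and no stand-in character.
    static String viaLiteralReplace(String in) {
        return in.replace("\\r\\n", "\\n");
    }

    public static void main(String[] args) {
        String in = "a\\r\\nb"; // the six characters a \ r \ n b
        System.out.println(viaStandIn(in));
        System.out.println(viaLiteralReplace(in));
    }
}
```

The literal version also leaves any '@' characters in the input untouched, which avoids the caveat about the stand-in character.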
Answer by sp00m
1) Let's say you want to replace a single \ using Java's replaceAll method:
\
└--- 1) the final backslash
2) Java's replaceAll method takes a regex as its first argument. In a regex literal, \ has a special meaning, e.g. in \d, which is a shortcut for [0-9] (any digit). The way to escape a metachar in a regex literal is to precede it with a \, which leads to:
\\
|└--- 1) the final backslash
|
└----- 2) the backslash needed to escape 1) in a regex literal
3) In Java, there is no regex literal: you write a regex in a string literal (unlike JavaScript, for example, where you can write /\d+/). But in a string literal, \ also has a special meaning, e.g. in \n (a new line) or \t (a tab). The way to escape a metachar in a string literal is to precede it with a \, which leads to:
\\\\
|||└--- 1) the final backslash
||└---- 3) the backslash needed to escape 1) in a string literal
|└----- 2) the backslash needed to escape 1) in a regex literal
└------ 3) the backslash needed to escape 2) in a string literal
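If the regex-level escaping in step 2) is the confusing part, one option is to let Pattern.quote build the literal pattern, so that only the string-literal escaping remains. The sketch below shows this alternative next to the hand-escaped form. Pattern.quote is not mentioned in the steps above, and the class and method names are made up.

```java
import java.util.regex.Pattern;

public class PatternQuoteDemo {
    // Hand-escaped: four backslashes in source, two in the regex, one matched.
    static String handEscaped(String text) {
        return text.replaceAll("\\\\", "/");
    }

    // Pattern.quote wraps the single backslash so the regex treats it literally.
    static String quoted(String text) {
        return text.replaceAll(Pattern.quote("\\"), "/");
    }

    public static void main(String[] args) {
        String text = "E:\\dummypath"; // the characters E:\dummypath
        System.out.println(handEscaped(text));
        System.out.println(quoted(text));
    }
}
```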