Double forward slash period in a Java regular expression
Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute the content to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/10771703/
Asked by jahmezz
I've recently started using regular expressions in Java and I ran into a strange expression.
The problem asks to find "words" consisting of only letters and at most one concluding period. So for example, if I input the string:
one two. wr7ng not1 three. nope..
The engine will find one, two and three as the matching words. The given solution for the problem is this Pattern:
for (String tok : s.split(" ")) {
    if (tok.matches("[a-zA-Z]+//.?")) {
        // code done to record successful match
    }
}
What do the two forward slashes mean? I compared this expression with this one:
[a-zA-Z]+.?
and found that only the latter incorrectly accepted a digit in the final slot (where the period belongs). Is this the only difference?
Answer by Ira Baxter
Are you sure it wasn't backslashes?
"[a-zA-Z]+\\.?"
Two backslashes in a string literal are interpreted to mean "insert a single backslash in the string". (By convention, in many languages, a backslash followed by anychar means "insert anychar".)
When the string is interpreted as a regular expression, the actual text
\.
means "match the period as a literal character".
Without the backslash escape character, the . in most regexp engines means "match any character".
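The two levels of escaping described above can be checked directly. The following is a minimal sketch; the class name EscapeDemo and the constant ESCAPED_DOT are made up for illustration and do not come from the original question.

```java
public class EscapeDemo {
    // Hypothetical constant for illustration: the Java literal "\\."
    // holds exactly two characters, a backslash and a period.
    static final String ESCAPED_DOT = "\\.";

    public static void main(String[] args) {
        System.out.println(ESCAPED_DOT.length());      // prints 2
        System.out.println(ESCAPED_DOT);               // prints \.
        // As a regex, \. matches only a literal period.
        System.out.println(".".matches(ESCAPED_DOT));  // prints true
        System.out.println("7".matches(ESCAPED_DOT));  // prints false
        // An unescaped . matches any single character, including a digit.
        System.out.println("7".matches("."));          // prints true
    }
}
```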
Answer by Mark Reed
Looks like you have a typo there. It should be "[a-zA-Z]+\\."
The regular expression that string produces is [a-zA-Z]+\. and the backslash in it indicates that the . should be treated as a literal period. Without it, . is a special regular-expression metacharacter that matches any single character (including digits).
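To see the difference the backslash makes, here is a small sketch that runs the escaped and unescaped patterns from the question over a few tokens. The class and method names are assumptions chosen for this example.

```java
import java.util.regex.Pattern;

public class DotComparison {
    // Escaped: letters followed by an optional literal period.
    static final Pattern ESCAPED = Pattern.compile("[a-zA-Z]+\\.?");
    // Unescaped: letters followed by an optional arbitrary character.
    static final Pattern UNESCAPED = Pattern.compile("[a-zA-Z]+.?");

    static boolean matchesEscaped(String tok) {
        return ESCAPED.matcher(tok).matches();
    }

    static boolean matchesUnescaped(String tok) {
        return UNESCAPED.matcher(tok).matches();
    }

    public static void main(String[] args) {
        for (String tok : new String[] {"two.", "not1", "wr7ng"}) {
            System.out.println(tok + " escaped=" + matchesEscaped(tok)
                    + " unescaped=" + matchesUnescaped(tok));
        }
    }
}
```

Only "not1" is treated differently: the unescaped pattern lets the trailing digit fill the slot meant for the period, which is the behaviour the asker observed.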
Answer by jaselg
The exact RE is:
[a-zA-Z]+\.?
To write it as a Java string literal, you need one more backslash, because \ is itself the escape character in a Java string:
"[a-zA-Z]+\\.?"
Answer by Surender Thakran
Using a . (dot) will be interpreted as a regex metacharacter which means "any character".
Using \. in a Java string literal will give a compiler error, viz. "illegal escape character".
Using \\. will be interpreted as a simple . (dot) character, which is what you need to use.
So for a word that contains only letters you use [a-zA-Z]+ where the + (plus) is a quantifier which means "one or more".
For a single . (dot) character you use \\.
Now for the "at most once" part of your . (dot) character you will use the ? quantifier, which means "zero or one". Your expression for the . part becomes \\.?
Hence your regex will be [a-zA-Z]+\\.?
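Putting the pieces together, the loop from the question can be run against its sample input with the corrected pattern. This is a sketch only; the WordFilter class and the findWords method are hypothetical names, and collecting matches into a list stands in for the question's "record successful match" step.

```java
import java.util.ArrayList;
import java.util.List;

public class WordFilter {
    // Returns the space-separated tokens made of letters
    // plus at most one concluding period.
    static List<String> findWords(String s) {
        List<String> words = new ArrayList<>();
        for (String tok : s.split(" ")) {
            if (tok.matches("[a-zA-Z]+\\.?")) {
                words.add(tok);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [one, two., three.]
        System.out.println(findWords("one two. wr7ng not1 three. nope.."));
    }
}
```

Note that "nope.." is rejected because ? allows at most one period, and matches() requires the whole token to fit the pattern.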
Answer by Stephen C
A forward slash has no special meaning in a regex, so "//" means match two forward slashes.
If that doesn't make sense, this is either a typo, or you've misread or mis-transcribed the regex. The obvious "correction" of replacing forward slashes with backslashes gives this:
tok.matches("[a-zA-Z]+\\.?")
which means "match roman letters followed by an optional '.'". In context, that could mean an English word followed by a full stop / period.
For the record, "[a-zA-Z]+.?" matches 1 or more roman letters followed (optionally) by one more character. The "eagerness" (greediness) of the + operator means that the optional character will be a non-letter ... if anything.
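As a quick check of the claim that forward slashes are ordinary characters, the sketch below applies the pattern exactly as typed in the question. SlashDemo and matchesSlashPattern are illustrative names, not part of the original code.

```java
public class SlashDemo {
    // The pattern as written in the question: letters, then two
    // literal forward slashes, then an optional arbitrary character.
    static boolean matchesSlashPattern(String tok) {
        return tok.matches("[a-zA-Z]+//.?");
    }

    public static void main(String[] args) {
        System.out.println(matchesSlashPattern("two."));   // prints false
        System.out.println(matchesSlashPattern("two//"));  // prints true
        System.out.println(matchesSlashPattern("two//7")); // prints true
    }
}
```

With forward slashes the pattern rejects ordinary words such as "two.", which is a strong hint that backslashes were intended.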