Double forward slash period in a Java regular expression

Notice: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, include the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/10771703/

Date: 2020-10-31 02:28:59  Source: igfitidea

Tags: java, regex

Asked by jahmezz

I've recently started using regular expressions in Java and I ran into a strange expression.

The problem asks to find "words" consisting of only letters and at most one concluding period. So for example, if I input the string:

one two. wr7ng not1 three. nope..

The engine will find one, two and three as the matching words. The given solution for the problem is this Pattern:

for (String tok : s.split(" ")) {
  if (tok.matches("[a-zA-Z]+//.?")) {
    // code done to record successful match
  }
}

What do the two forward slashes mean? I compared this expression with this one:

[a-zA-Z]+.?

and found that only the latter incorrectly accepted digits in the final slot (where the period should be). Is this the only difference?

Answer by Ira Baxter

Are you sure it wasn't backslashes?

  "[a-zA-Z]+\\.?"

Two backslashes in a string literal are interpreted to mean "insert a single backslash into the string". (As a convention in many languages, a backslash followed by certain characters, such as \ or ", means "insert that character".)
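
To see the string-literal step on its own, a minimal sketch can inspect the string before any regex engine is involved. The class name LiteralEscapeDemo is illustrative only:

```java
public class LiteralEscapeDemo {
    // Typed in source as two backslashes followed by a dot
    static final String ESCAPED_DOT = "\\.";

    public static void main(String[] args) {
        // The String object holds only two characters: '\' and '.'
        System.out.println(ESCAPED_DOT.length());          // 2
        System.out.println(ESCAPED_DOT.charAt(0) == '\\'); // true
        System.out.println(ESCAPED_DOT);                   // \.
    }
}
```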

When the literal string is interpreted as a regular expression, the actual text

         \.

means, "match the 'period' as a literal character".

If you don't have the backslash "escape character", the . in most regexp engines means "match any character".
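
The difference can be checked directly with Pattern.matches from java.util.regex. This sketch uses single-character inputs just to isolate the dot; DotDemo and its method names are hypothetical:

```java
import java.util.regex.Pattern;

public class DotDemo {
    // Unescaped dot: regex metacharacter, matches any single character
    static boolean anyChar(String s) {
        return Pattern.matches(".", s);
    }

    // Escaped dot: the regex \. (written "\\." in Java) matches only a period
    static boolean literalPeriod(String s) {
        return Pattern.matches("\\.", s);
    }

    public static void main(String[] args) {
        System.out.println(anyChar("7"));       // true
        System.out.println(anyChar("."));       // true
        System.out.println(literalPeriod("7")); // false
        System.out.println(literalPeriod(".")); // true
    }
}
```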

Answer by Mark Reed

Looks like you have a typo there. It should be "[a-zA-Z]+\\.?".

That string value becomes the regular expression value [a-zA-Z]+\.?, in which the backslash indicates that the . should be treated as a literal period. Without it, . is a special regular-expression metacharacter that matches any single character (including digits).
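
Using two tokens from the question's sample input, a small sketch shows the digit slipping through the unescaped version. The class and constant names are illustrative:

```java
public class DigitSlotDemo {
    static final String UNESCAPED = "[a-zA-Z]+.?";  // '.' matches any character
    static final String ESCAPED = "[a-zA-Z]+\\.?";  // '\.' matches only a period

    public static void main(String[] args) {
        for (String tok : new String[] {"not1", "three."}) {
            System.out.println(tok + " unescaped=" + tok.matches(UNESCAPED)
                    + " escaped=" + tok.matches(ESCAPED));
        }
        // not1 unescaped=true escaped=false
        // three. unescaped=true escaped=true
    }
}
```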

Answer by jaselg

The exact RE is:

[a-zA-Z]+\.?

and to compile it in Java, you need one more backslash (\), because the backslash is the escape character in a Java string literal:

"[a-zA-Z]+\\.?"

Answer by Surender Thakran

An unescaped . (dot) is interpreted as a regex metacharacter meaning "any character".

Writing "\." in a Java string literal gives a compiler error: illegal escape character.

Writing "\\." gives the regex \., which is interpreted as a plain . (dot) character; this is what you need to use.
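
One way to confirm that the doubled backslash reaches the regex engine as a single one is Pattern.pattern(), which returns the regex a Pattern was compiled from. A minimal sketch; the class and constant names are illustrative:

```java
import java.util.regex.Pattern;

public class CompileDemo {
    // Source literal uses \\ so that the regex engine receives \.
    static final Pattern WORD = Pattern.compile("[a-zA-Z]+\\.?");

    public static void main(String[] args) {
        // pattern() returns the regex source the Pattern was compiled from
        System.out.println(WORD.pattern());                   // [a-zA-Z]+\.?
        System.out.println(WORD.matcher("three.").matches()); // true
        System.out.println(WORD.matcher("not1").matches());   // false
    }
}
```

Compiling once and reusing the Pattern is also a reasonable choice for a token loop: String.matches(regex) is documented as equivalent to Pattern.matches(regex, str), and the Pattern.matches documentation notes that compiling once is more efficient when a regex is used repeatedly.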

So for a word that contains only letters, you use [a-zA-Z]+, where the + (plus) is a quantifier meaning "one or more".

For a single literal . (dot) you use \\. in the Java string. Now, for the "at most once" part, you use the ? quantifier, which means "zero or one". Your expression for the dot part becomes "\\.?".

Hence your regex, written as a Java string literal, will be "[a-zA-Z]+\\.?".
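
A few boundary cases make the two quantifiers concrete. This is a sketch with a hypothetical class name, reusing the string literal above:

```java
public class OptionalPeriodDemo {
    static final String WORD = "[a-zA-Z]+\\.?";

    public static void main(String[] args) {
        // ? = zero or one occurrence of the preceding \.
        System.out.println("three".matches(WORD));  // true  (zero periods)
        System.out.println("three.".matches(WORD)); // true  (one period)
        System.out.println("nope..".matches(WORD)); // false (two periods)
        // + = one or more letters, so a bare "." fails
        System.out.println(".".matches(WORD));      // false
    }
}
```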

Answer by Stephen C

A forward slash has no special meaning in a regex, so "//" means match two forward slashes.
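
Taken literally, the pattern from the question therefore requires two slashes inside the token. A quick sketch (class name assumed) shows it rejecting the ordinary words from the sample input while accepting strings that really contain "//":

```java
public class ForwardSlashDemo {
    // The pattern exactly as written in the question; '/' is an ordinary character
    static final String AS_WRITTEN = "[a-zA-Z]+//.?";

    public static void main(String[] args) {
        // None of the sample tokens contain "//", so none of them match
        for (String tok : "one two. wr7ng not1 three. nope..".split(" ")) {
            System.out.println(tok + " -> " + tok.matches(AS_WRITTEN)); // false for every token
        }
        // Strings that really contain two slashes do match
        System.out.println("abc//".matches(AS_WRITTEN));  // true
        System.out.println("abc//7".matches(AS_WRITTEN)); // true (the unescaped . matches the 7)
    }
}
```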

If that doesn't make sense, this is either a typo, or you've misread or mis-transcribed the regex. The obvious "correction" of replacing forward slashes with backslashes gives this:

    tok.matches("[a-zA-Z]+\\.?")

which means "match roman letters followed by an optional '.'". In context, that could mean an English word followed by a full stop / period.
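
Putting the corrected literal back into the loop from the question gives a runnable sketch. WordFilter and findWords are hypothetical names; splitting on a single space is kept from the question's code, so matching tokens keep their trailing period:

```java
import java.util.ArrayList;
import java.util.List;

public class WordFilter {
    // Corrected pattern: one or more letters, then at most one literal period
    static final String WORD = "[a-zA-Z]+\\.?";

    static List<String> findWords(String s) {
        List<String> words = new ArrayList<>();
        for (String tok : s.split(" ")) {
            if (tok.matches(WORD)) {
                words.add(tok); // record the successful match
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(findWords("one two. wr7ng not1 three. nope.."));
        // prints [one, two., three.]
    }
}
```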

For the record, "[a-zA-Z]+.?" matches one or more roman letters followed (optionally) by one more character. The "eagerness" (greediness) of the + operator means that the optional character will be a non-letter ... if anything.
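
Adding capture groups to the same regex shows which part consumed what. In this sketch the groups are an illustrative addition, not part of the original pattern, and GreedyDemo and describe are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // Same regex as "[a-zA-Z]+.?" with groups added to inspect each part
    static final Pattern P = Pattern.compile("([a-zA-Z]+)(.?)");

    static String describe(String tok) {
        Matcher m = P.matcher(tok);
        if (!m.matches()) {
            return "no match";
        }
        return m.group(1) + " | " + m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(describe("abc"));  // "abc | " (greedy + took every letter; .? matched nothing)
        System.out.println(describe("ab7"));  // "ab | 7"
        System.out.println(describe("ab78")); // "no match"
    }
}
```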
