Original question: http://stackoverflow.com/questions/13981846/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Match word in String in Java
Asked by Marco Pietro Cirillo
I'm trying to match Strings that contain the word "#SP" (sans quotes, case insensitive) in Java. However, I'm finding using regexes very difficult!
Strings I need to match: "This is a sample #sp string", "#SP string text...", "String text #Sp"
Strings I do not want to match: "Anything with #Spider", "#Spin #Spoon #SPORK"
Here's what I have so far: http://ideone.com/B7hHkR. Could someone guide me through building my regexp?
I've also tried "\\w*\\s*#sp\\w*\\s*", to no avail.
Edit: Here's the code from IDEone:
java.util.regex.Pattern p =
    java.util.regex.Pattern.compile("\\b#SP\\b",
        java.util.regex.Pattern.CASE_INSENSITIVE);
java.util.regex.Matcher m = p.matcher("s #SP s");
if (m.find()) {
    System.out.println("Match!");
}
Answered by fge
(edit: positive lookbehind not needed, only matching is done, not replacement)
You are yet another victim of Java's misnamed regex matching methods.
.matches(), quite unfortunately, tries to match the whole input, which is a clear violation of the definition of "regex matching" (a regex can match anywhere in the input). The method you need to use is .find().
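The difference between the two methods can be seen with a minimal sketch like the one below. The class and method names (MatchesVsFind, wholeInputMatches, foundAnywhere) are made up for illustration, and the bare pattern "#SP" is used only to isolate the behaviour of .matches() versus .find(); it is not the final regex.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: contrasts Matcher.matches() with Matcher.find().
class MatchesVsFind {
    // The same compiled pattern is used for both calls; only the method differs.
    static final Pattern TAG = Pattern.compile("#SP", Pattern.CASE_INSENSITIVE);

    // True only if the ENTIRE input is matched by the pattern.
    static boolean wholeInputMatches(String input) {
        return TAG.matcher(input).matches();
    }

    // True if the pattern matches some substring of the input.
    static boolean foundAnywhere(String input) {
        return TAG.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeInputMatches("s #SP s")); // false: extra text around #SP
        System.out.println(foundAnywhere("s #SP s"));     // true
        System.out.println(wholeInputMatches("#sp"));     // true: the input is exactly the tag
    }
}
```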
This is a braindead API, and unfortunately Java is not the only language having such misguided method names. Python also pleads guilty.
Also, you have the problem that \\b only matches at word boundaries, and # is not a word character, so there is no boundary between a space and the #. You need to use an alternation detecting either the beginning of input or a space.
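To see where the word boundary does and does not exist, a small check along these lines may help. The class name WordBoundaryDemo and the helper found are invented for this sketch; the inputs are variations on the asker's "s #SP s" test string.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: shows when \b matches next to '#'.
class WordBoundaryDemo {
    // Compiles the regex case-insensitively and searches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).find();
    }

    public static void main(String[] args) {
        // Space then '#': both are non-word characters, so no boundary before '#'.
        System.out.println(found("\\b#SP\\b", "s #SP s")); // false
        // Word character then '#': a boundary exists, so the leading \b succeeds.
        System.out.println(found("\\b#SP\\b", "s#SP s"));  // true
        // The trailing \b is fine: 'P' is a word character and the space is not.
        System.out.println(found("#SP\\b", "s #SP s"));    // true
        // The trailing \b also rejects longer words such as #Spider.
        System.out.println(found("#SP\\b", "#Spider"));    // false
    }
}
```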
Your code would need to look like this (non fully qualified classes):
// Assumes java.util.regex.Matcher and java.util.regex.Pattern are imported.
Pattern p = Pattern.compile("(^|\\s)#SP\\b", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("s #SP s");
if (m.find()) {
    System.out.println("Match!");
}
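As a quick sanity check, the pattern can be run against the five sample strings from the question. This is only a sketch: the class HashtagMatcher and the method containsTag are illustrative names, not part of the original answer.

```java
import java.util.regex.Pattern;

// Hypothetical helper class wrapping the pattern from the answer.
class HashtagMatcher {
    // Start of input or whitespace, then #SP, then a word boundary.
    static final Pattern SP_TAG =
        Pattern.compile("(^|\\s)#SP\\b", Pattern.CASE_INSENSITIVE);

    // True if the input contains #SP as a standalone tag.
    static boolean containsTag(String input) {
        return SP_TAG.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "This is a sample #sp string", // wanted
            "#SP string text...",          // wanted
            "String text #Sp",             // wanted
            "Anything with #Spider",       // not wanted
            "#Spin #Spoon #SPORK"          // not wanted
        };
        for (String s : samples) {
            System.out.println(containsTag(s) + "  <- " + s);
        }
    }
}
```

With these inputs the first three lines should print true and the last two false.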
Answered by dashrb
You're doing fine, but the \b in front of the # is misleading. \b is a word boundary, but # is not a word character (i.e. it isn't in the set [0-9A-Za-z_]). Therefore, the position between the space and the # isn't considered a word boundary, because neither side of it is a word character. Change to:
java.util.regex.Pattern p =
    java.util.regex.Pattern.compile("(^|\\s)#SP\\b",
        java.util.regex.Pattern.CASE_INSENSITIVE);
The (^|\s) means: match either ^ OR \s, where ^ means the beginning of your string (e.g. "#SP String"), and \s means a whitespace character.
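One side effect of (^|\s) is that the whitespace character becomes part of the matched text. That is harmless when you only test for a match, but it matters if you extract or replace the match. The sketch below compares it with a negative lookbehind, (?<!\S), which is a swapped-in alternative and not something this answer proposes; the class name LeadingWhitespaceDemo and the helper firstMatch are likewise illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: compares what each pattern actually consumes.
class LeadingWhitespaceDemo {
    // Returns the text of the first match, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The alternation consumes the space in front of the tag.
        System.out.println("[" + firstMatch("(^|\\s)#SP\\b", "s #SP s") + "]");  // [ #SP]
        // The lookbehind only asserts "no non-whitespace before", consuming nothing.
        System.out.println("[" + firstMatch("(?<!\\S)#SP\\b", "s #SP s") + "]"); // [#SP]
        // At the start of input the ^ branch matches and nothing extra is consumed.
        System.out.println("[" + firstMatch("(^|\\s)#SP\\b", "#sp text") + "]"); // [#sp]
    }
}
```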
Answered by Will C.
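The regular expression "\\w*\\s*#sp\\w*\\s*" will match 0 or more word characters, followed by 0 or more whitespace characters, followed by #sp, followed by 0 or more word characters, followed by 0 or more whitespace characters. Because the \\w* after #sp also accepts the rest of a longer word, it matches "#Spider" as well. My suggestion is to not use \s* to break words up in your expression; instead, use \b:
"(^|\\b)#sp(\\b|$)"
Note that the leading \\b only matches when a word character directly precedes the #, so a # that follows a space (as in "sample #sp") still needs the (^|\\s) prefix shown in the other answers.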