Original question: http://stackoverflow.com/questions/17586367/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Java Regex to Match words + spaces
Asked by Chaos
I am trying to construct this simple regex to match words + whitespace in Java, but I got confused trying to work it out. There are a lot of similar examples on this site, but the answers mostly give out the regex itself without explaining how it is constructed.
What I'm looking for is the Line of Thought behind forming the regular expression.
Sample Input String:
String Tweet = "\"Whole Lotta Love\" - Led Zeppelin";
which when printed is: "Whole Lotta Love" - Led Zeppelin
Problem Statement:
I want to find out if a String has a quotation in it. In the above sample string, Whole Lotta Love is the quotation.
What I've tried:
My first approach was to match anything between two double quotes, so I came up with the following regex:
"\"(\\w+\")"
and "\"(^\")"
But this approach only works if there are no spaces between the two double quotes, like:
"Whole" Lotta Love
So I tried to modify my regex to match spaces, and this is where I got lost.
I tried the following, but none of them match:
"\"(\\w+?\\s+\")"
"\"(\\w+)(\\s+)\""
"\"(\\w+)?(\\s+)\""
I would appreciate it if someone could help me figure out how to construct this.
Answered by ddr
You almost had it. Your regexes would match alphanumeric characters followed by spaces, like this:
"Whole "
but not any alphanumeric chars after that. zEro is almost right, but you probably want to use a capture like this:
"\"([\\w\\s]+)\""
This matches one or more whitespace or alphanumeric characters between the quotes. Note that "alphanumeric" here means \w, which also includes the underscore (_).
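To see the difference concretely, here is a minimal sketch that runs one of the attempts from the question and the character-class version against the sample tweet. The class name QuoteClassDemo and the helper firstQuote are made up for illustration; only the two regexes come from this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteClassDemo {
    // One of the attempts from the question: word chars, then whitespace, then the closing quote.
    static final Pattern ATTEMPT = Pattern.compile("\"(\\w+)(\\s+)\"");
    // The character-class version: any mix of word chars and whitespace between the quotes.
    static final Pattern WORDS_AND_SPACES = Pattern.compile("\"([\\w\\s]+)\"");

    // Hypothetical helper: returns group 1 of the first match, or null if nothing matches.
    static String firstQuote(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String tweet = "\"Whole Lotta Love\" - Led Zeppelin";
        // After "Whole " the attempt demands a closing quote, but finds 'L' instead.
        System.out.println(firstQuote(ATTEMPT, tweet));          // null
        System.out.println(firstQuote(WORDS_AND_SPACES, tweet)); // Whole Lotta Love
    }
}
```

The point is that \w+ followed by \s+ describes a fixed sequence (words, then spaces, then the quote), while [\w\s]+ lets the two kinds of characters alternate any number of times.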
If you want to be more general, you could use
"\"([^\"]+)\""
which will match everything besides double quotes between the two quotation marks. For instance, "Who's on first?" (including the quotes) would be matched by the second regex but not by the first, since it includes punctuation.
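A quick way to check that claim is to run both regexes against a string containing the punctuated quote. This is only a sketch: QuoteGeneralDemo and hasQuotation are illustrative names, and the surrounding sentence in the input is invented for the example.

```java
import java.util.regex.Pattern;

public class QuoteGeneralDemo {
    // First regex from this answer: only word chars and whitespace between the quotes.
    static final Pattern WORDS_AND_SPACES = Pattern.compile("\"([\\w\\s]+)\"");
    // Second regex from this answer: anything except a double quote between the quotes.
    static final Pattern ANYTHING_BUT_QUOTES = Pattern.compile("\"([^\"]+)\"");

    // Hypothetical helper: true if the pattern finds a quotation anywhere in the input.
    static boolean hasQuotation(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "He asked \"Who's on first?\" again";
        // The apostrophe and question mark are not in [\w\s], so the first regex finds nothing.
        System.out.println(hasQuotation(WORDS_AND_SPACES, input));    // false
        System.out.println(hasQuotation(ANYTHING_BUT_QUOTES, input)); // true
    }
}
```

Since the question only asks whether a String has a quotation in it, find() on its own is enough; the capture group matters only if you also want the quoted text.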
Answered by Mena
The simplest way would be to have a while loop looking for anything in between two quotes in your input, so you check for multiple quoted expressions.
My example here accepts anything in between two quotes. You can refine with only alphabetics and spaces.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String quotedTweet = "\"Whole Lotta Love\" - Led Zeppelin";
        String unquotedTweet = "Whole Lotta Love from Led Zeppelin";
        String multipleQuotes = "\"Whole Lotta Love\" - \"Led\" Zeppelin";
        // commented Pattern for only alphabetics or spaces
        // Pattern pattern = Pattern.compile("\"([\\p{Alpha}\\p{Space}]+?)\"");
        // reluctant quantifier: stop at the first closing quote
        Pattern pattern = Pattern.compile("\"(.+?)\"");
        Matcher matcher = pattern.matcher(quotedTweet);
        while (matcher.find()) {
            // will find "Whole Lotta Love"
            System.out.println(matcher.group(1));
        }
        matcher = pattern.matcher(unquotedTweet);
        while (matcher.find()) {
            // will find nothing
            System.out.println(matcher.group(1));
        }
        matcher = pattern.matcher(multipleQuotes);
        while (matcher.find()) {
            // will find "Whole Lotta Love" and "Led"
            System.out.println(matcher.group(1));
        }
    }
}
Edit: this example and the commented variant will not prevent quoted whitespace, as in " ". Let me know if that's a requirement; the Pattern would be a bit more complicated in that case.
Output:
Whole Lotta Love
Whole Lotta Love
Led
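On the edit note about quoted whitespace such as " ": one way to handle it without complicating the Pattern is to match every quoted span and discard the blank ones in plain Java. The sketch below is an assumption about what "not allowing quoted whitespace" should mean, not something from the answer; NonBlankQuotes and quotedPhrases are illustrative names, and it uses [^"]* rather than .+? so that an empty pair "" is consumed as its own match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonBlankQuotes {
    // Matches one pair of double quotes and captures whatever is between them (possibly nothing).
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Hypothetical helper: returns every quoted phrase that contains at least one non-whitespace char.
    static List<String> quotedPhrases(String input) {
        List<String> phrases = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(input);
        while (matcher.find()) {
            String content = matcher.group(1);
            // Filter in code instead of in the regex: skip "" and whitespace-only quotes like " ".
            if (!content.trim().isEmpty()) {
                phrases.add(content);
            }
        }
        return phrases;
    }

    public static void main(String[] args) {
        String input = "\" \" - \"Whole Lotta Love\" - \"Led\" Zeppelin";
        System.out.println(quotedPhrases(input)); // [Whole Lotta Love, Led]
    }
}
```

Filtering after the match keeps the quote pairing intact. A regex that simply refuses to match " " can end up treating the closing quote of the blank pair as the opening quote of the next match.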
Answered by Casimir et Hippolyte
You can use this:
"\"(?>\\w+ *)+\""
or a character class, as zEro suggests.
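For reference, here is a small sketch of how that pattern behaves when written as a Java string literal (with \w escaped as \\w). (?>...) is an atomic group: once it has matched a word and its trailing spaces, the engine does not backtrack into it. The class name AtomicGroupDemo and the helper firstMatch are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AtomicGroupDemo {
    // A quote, then one or more "word plus optional trailing spaces" runs, then a quote.
    static final Pattern WORD_RUNS = Pattern.compile("\"(?>\\w+ *)+\"");

    // Hypothetical helper: returns the whole first match (quotes included), or null.
    static String firstMatch(String input) {
        Matcher m = WORD_RUNS.matcher(input);
        // The atomic group is non-capturing, so group() (the full match) is used here.
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("\"Whole Lotta Love\" - Led Zeppelin")); // "Whole Lotta Love"
        // Punctuation is not covered by \w, so this quotation is not matched.
        System.out.println(firstMatch("\"Who's on first?\""));                 // null
    }
}
```

Note that this version only allows literal spaces between words, and the text must start with a word character right after the opening quote.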