Original source: http://stackoverflow.com/questions/18751486/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Matching one string multiple times using regex in Java
Asked by foglerek
I'm having some issues with making the following regex work. I would like the following string:
"Please enter your name here"
to result in an array with the following elements:
'Please enter', 'enter your', 'your name', 'name here'
Currently, I'm using the following pattern, and then creating a matcher and iterating in the following way:
// wordList is a List<String> declared elsewhere.
Pattern word = Pattern.compile("[\\w]+ [\\w]+");
Matcher m = word.matcher("Please enter your name here");
while (m.find()) {
    wordList.add(m.group());
}
But the result I'm getting is:
'Please enter', 'your name'
What am I doing wrong? (P.S. I checked the same regex on regexpal.com and got the same result.) It seems that the same word won't be matched twice. What can I do to get the result I want?
Thanks.
---------------------------------
EDIT: Thanks for all the suggestions! I ended up doing this, because it makes it easy to specify the n-gram size:
int nGrams = 2;
String patternTpl = "\\b[\\w']+\\b";
String concatString = "what is your age? please enter your name.";
for (int i = 0; i < nGrams; i++) {
    // Build a pattern that matches i + 1 words separated by single spaces.
    String pattern = patternTpl;
    for (int j = 0; j < i; j++) {
        pattern = pattern + " " + patternTpl;
    }
    // Wrap it in a lookahead so matches consume nothing and can overlap.
    pattern = "(?=(" + pattern + "))";
    Pattern word = Pattern.compile(pattern);
    Matcher m = word.matcher(concatString);
    // Iterate over all matches and populate wordList.
    while (m.find()) {
        wordList.add(m.group(1));
    }
}
This results in:
Pattern:
(?=(\b[\w']+\b)) // In the first iteration
(?=(\b[\w']+\b \b[\w']+\b)) // In the second iteration
Array:
[what, is, your, age, please, enter, your, name, what is, is your, your age, please enter, enter your, your name]
Note: Got the pattern from the following top answer: Java regex skipping matches
Accepted answer by arshajii
The matches can't overlap, which explains your result. Here's a potential workaround that uses capturing groups with a positive lookahead:
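Pattern word = Pattern.compile("(\\w+)(?=(\\s\\w+))");
Matcher m = word.matcher("Please enter your name here");
while (m.find()) {
    // group(1) is the consumed word; group(2) is the following word (with its
    // leading whitespace), captured inside the lookahead without being consumed.
    System.out.println(m.group(1) + m.group(2));
}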
Please enter
enter your
your name
name here
Answer by Willem Van Onsem
You're not doing anything wrong. That's just how regex matching works: each match consumes the characters it covers, so the next match starts after them. (Reporting overlapping matches would make matching O(n^2), whereas regex matching is meant to run in linear time.)
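To make the consumed ranges visible, here is a minimal sketch that records where each match of the question's pattern starts and ends. The class name OverlapDemo and the helper findAll are made up for illustration and are not part of any answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverlapDemo {
    // Hypothetical helper: collects every match as "start-end: text".
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // The next find() resumes at m.end(), so these characters are not reused.
            found.add(m.start() + "-" + m.end() + ": " + m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        for (String s : findAll("\\w+ \\w+", "Please enter your name here")) {
            System.out.println(s);
        }
        // Prints:
        // 0-12: Please enter
        // 13-22: your name
    }
}
```

The second match can only begin after index 12, so "enter" is no longer available to pair with "your", and "here" is left without a partner.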
In this case you could simply search for [\w]+ and post-process the matches.
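One way to read this suggestion, as a rough sketch rather than the answerer's own code: collect the single words first, then join each word with its successor. The names WordPairs and pairs are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordPairs {
    // Hypothetical helper: finds single words, then pairs neighbours afterwards.
    static List<String> pairs(String input) {
        List<String> words = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+").matcher(input);
        while (m.find()) {
            words.add(m.group());
        }
        // Post-processing step: join each word with the one that follows it.
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 1 < words.size(); i++) {
            result.add(words.get(i) + " " + words.get(i + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairs("Please enter your name here"));
        // Prints: [Please enter, enter your, your name, name here]
    }
}
```

Because punctuation is discarded before pairing, this sketch also pairs words across sentence boundaries ("your age? please" yields "age please"), which the lookahead pattern in the question's edit does not.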
Answer by ajb
Something like:
Pattern word = Pattern.compile("(\\w+) ?");
Matcher m = word.matcher("Please enter your name here");
String previous = null;
while (m.find()) {
    // previous still carries its trailing space, so plain concatenation works.
    if (previous != null)
        wordList.add(previous + m.group(1));
    previous = m.group();
}
The pattern ends with an optional space, which matches whenever a space follows the word. m.group() returns the entire match, including the space; m.group(1) returns just the word, without the space.
Answer by Josh M
If you want to avoid such a specific regex, you could try a simpler solution:
import java.util.Arrays;

// Wrapper class added so the snippet compiles; the name is arbitrary.
public class Main {
    public static String[] array(final String string) {
        final String[] words = string.split(" ");
        final String[] array = new String[words.length - 1];
        // Join each word with the word that follows it.
        for (int i = 0; i < words.length - 1; i++)
            array[i] = String.format("%s %s", words[i], words[i + 1]);
        return array;
    }

    public static void main(String[] args) {
        final String[] array = array("Please enter your name here");
        System.out.println(Arrays.toString(array));
    }
}
The output is:
[Please enter, enter your, your name, name here]
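The same split-based idea can be stretched to n-grams of any size, which is the flexibility the question's edit was after. The following is only a sketch under a few assumptions: the names SplitNGrams and nGrams are invented here, n is expected to be at least 1, and words are separated by runs of whitespace (split on \s+ instead of a single space).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitNGrams {
    // Hypothetical helper: returns every run of n consecutive words, joined by spaces.
    static List<String> nGrams(String text, int n) {
        String[] words = text.trim().split("\\s+");
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= words.length; i++) {
            // Copy words[i .. i + n) and join them with single spaces.
            result.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(nGrams("Please enter your name here", 2));
        // Prints: [Please enter, enter your, your name, name here]
        System.out.println(nGrams("Please enter your name here", 3));
        // Prints: [Please enter your, enter your name, your name here]
    }
}
```

Unlike the regex versions, this keeps punctuation attached to the words, so it suits input that is already clean text.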