Java Regex: how to capture multiple matches in the same line
Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/7470394/
Asked by Amit
I am trying to match a regex pattern in Java, and I have two questions:
- The pattern I'm looking for has a known beginning followed by an unknown string, which I want to capture up to the first occurrence of an &.
- There are multiple occurrences of this pattern in the line, and I would like to get each occurrence separately.
For example, I have this input line:
1234567 100,110,116,129,139,140,144,146 http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All&viewItems=25&subCatView=true ISx20070515x00001a http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=Screen+Refresh+Rate%7C120HZ&sName=View+All&subCatView=true 0 2819357575609397706
And I am interested in these strings:
Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.
Screen+Refresh+Rate%7C120HZ
Answered by Dan Cruz
Assuming the known beginning is filter=**, the regular expression pattern (?:filter=\\*\\*)(.*?)(?:&) should get you what you need. Use Matcher.find() to get all occurrences of the pattern in a given string. Using the test string you provided, the following:
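import java.util.regex.Matcher;
import java.util.regex.Pattern;

// testString holds the input line from the question.
// In a Java string literal each backslash is doubled, so \\* is the regex \* (a literal '*').
final Pattern p = Pattern.compile("(?:filter=\\*\\*)(.*?)(?:&)");
final Matcher m = p.matcher(testString);
int cnt = 0;
// Each find() call advances to the next match; group(1) is the lazily captured value.
while (m.find()) {
    System.out.println(++cnt + ": G1: " + m.group(1));
}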
Will output:
1: G1: Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.
2: G1: Screen+Refresh+Rate%7C120HZ**
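To make the loop above self-contained, here is a minimal sketch that collects every captured value into a list instead of printing it. The class name FilterMatches, the helper extractFilters, and the shortened sample line are illustrative assumptions and not part of the original answer; the pattern itself is the one from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FilterMatches {
    // Pattern from the answer: literal "filter=**", then a lazy capture up to the first '&'.
    private static final Pattern FILTER = Pattern.compile("(?:filter=\\*\\*)(.*?)(?:&)");

    // Hypothetical helper: returns group(1) of every match, in order of appearance.
    static List<String> extractFilters(String line) {
        List<String> values = new ArrayList<>();
        Matcher m = FILTER.matcher(line);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        // Shortened, made-up sample with two occurrences on one line.
        String line = "id1 http://example.com/a?filter=**Rate%7C120HZ&sName=All "
                    + "http://example.com/b?filter=**Size%7C37&view=25";
        System.out.println(extractFilters(line)); // prints [Rate%7C120HZ, Size%7C37]
    }
}
```

Compiling the Pattern once as a constant avoids recompiling it for every input line, which matters if this runs over a large log file.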
Answered by Sahil Muthoo
If I know that I might need other query parameters in the future, I think it'll be more prudent to decode and parse the URL.
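import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.regex.Pattern;

// Decode the percent-encoded URL first (decode() declares UnsupportedEncodingException).
String url = URLDecoder.decode("http://www.gold.com/shc/s/c_10153_12605_" +
        "Computers+%26+Electronics_Televisions?filter=Screen+Refresh+Rate" +
        "%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All&viewItems=25&subCatView=true",
        "utf-8");
Pattern amp = Pattern.compile("&");
Pattern eq = Pattern.compile("=");
Map<String, String> params = new HashMap<String, String>();
// Everything after '?' is the query string: key=value pairs separated by '&'.
String queryString = url.substring(url.indexOf('?') + 1);
for (String param : amp.split(queryString)) {
    // Limit 2 keeps any '=' inside the value; a bare key maps to an empty string.
    String[] pair = eq.split(param, 2);
    params.put(pair[0], pair.length > 1 ? pair[1] : "");
}
for (Entry<String, String> param : params.entrySet()) {
    System.out.format("%s = %s\n", param.getKey(), param.getValue());
}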
Output
subCatView = true
viewItems = 25
sName = View All
filter = Screen Refresh Rate|120HZ^Screen Size|37 in. to 42 in.
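One caveat with decoding the whole URL before splitting: an encoded & or = inside a value (%26 or %3D) becomes a separator once decoded. A hedged variant, sketched below, splits the raw query string first and decodes each key and value afterwards. The class name QueryParams and the helpers parseQuery and decode are illustrative assumptions, and a LinkedHashMap is used only to keep the parameters in their original order.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryParams {
    // Wraps the checked exception; UTF-8 is always supported, so this should never trigger.
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: split the raw query on '&' and '=' first, then decode each part.
    static Map<String, String> parseQuery(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        int q = url.indexOf('?');
        if (q < 0) {
            return params; // no query string at all
        }
        for (String param : url.substring(q + 1).split("&")) {
            if (param.isEmpty()) {
                continue; // skip empty segments such as "a=1&&b=2"
            }
            String[] pair = param.split("=", 2); // limit 2 keeps '=' inside the value
            params.put(decode(pair[0]), pair.length > 1 ? decode(pair[1]) : "");
        }
        return params;
    }

    public static void main(String[] args) {
        String url = "http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions"
                   + "?filter=Screen+Refresh+Rate%7C120HZ&sName=View+All&subCatView=true";
        for (Map.Entry<String, String> e : parseQuery(url).entrySet()) {
            System.out.format("%s = %s%n", e.getKey(), e.getValue());
        }
    }
}
```

This sketch does not handle URL fragments (#...) or repeated keys; a later value for the same key simply overwrites the earlier one.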
Answered by jtahlborn
In your example, there is sometimes a "**" at the end before the "&". But basically (assuming "filter=" is the start pattern you are looking for), you want something like:
"filter=([^&]+)&"
Answered by beny23
Using the regular expression (?<=filter=\*{0,2})[^&]*[^&*]+ in Java:
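import java.util.regex.Matcher;
import java.util.regex.Pattern;

// \\* in the Java string literal is the regex \*, a literal asterisk.
Pattern p = Pattern.compile("(?<=filter=\\*{0,2})[^&]*[^&*]+");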
String s = "1234567 100,110,116,129,139,140,144,146 http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=**Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All**&viewItems=25&subCatView=true ISx20070515x00001a http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=**Screen+Refresh+Rate%7C120HZ**&sName=View+All&subCatView=true 0 2819357575609397706";
Matcher m = p.matcher(s);
while (m.find()) {
    // group() is the whole match; the lookbehind keeps "filter=" out of it.
    System.out.println(m.group());
}
EDIT:
Added [^&*]+ to the end of the regex to prevent the ** from being included in the second match.
EDIT2:
Changed regular expression to use lookbehind.
Answered by NPE
The regex you're looking for is
Screen\+Refresh\+Rate[^&]*
You could use Matcher.find() to find all matches.
Answered by ouotuo
Are you looking for a string that follows "filter=", skips any leading "*" characters, and ends at the first "&"? You can try the following:
String str = "1234567 100,110,116,129,139,140,144,146 http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=**Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All**&viewItems=25&subCatView=true ISx20070515x00001a http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=**Screen+Refresh+Rate%7C120HZ**&sName=View+All&subCatView=true 0 2819357575609397706";
// \\** in the Java string literal is the regex \** : zero or more literal asterisks.
Pattern p = Pattern.compile("filter=(?:\\**)([^&]+?)(?:\\**)&");
Matcher matcher = p.matcher(str);
while (matcher.find()) {
    // group(1) is the value with the surrounding asterisks stripped.
    System.out.println(matcher.group(1));
}
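Several of the patterns in this thread end in a literal &, so they assume another parameter always follows the filter. If the filter is the last parameter of a URL, such a pattern either finds nothing or runs on past the space into the next URL on the line. A possible variant, offered as an assumption rather than as part of any answer, replaces the trailing & with a lookahead that also accepts whitespace or end of input. The class name TrailingFilter and the helper extract are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingFilter {
    // Assumed variant: optional asterisks around the value, which ends before
    // '&', whitespace, or end of input (checked by lookahead, not consumed).
    private static final Pattern FILTER =
            Pattern.compile("filter=\\**([^&\\s]+?)\\**(?=[&\\s]|$)");

    // Hypothetical helper: returns every captured filter value in order.
    static List<String> extract(String line) {
        List<String> values = new ArrayList<>();
        Matcher m = FILTER.matcher(line);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        // Made-up line: the first URL ends with its filter parameter.
        String line = "http://example.com/a?x=1&filter=**Rate%7C120HZ** "
                    + "http://example.com/b?filter=Size%7C37&y=2";
        System.out.println(extract(line)); // prints [Rate%7C120HZ, Size%7C37]
    }
}
```

The lazy quantifier on the capture group lets the optional trailing asterisks be matched outside the group, the same idea as in the answer above.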