Note: this page reproduces a popular StackOverflow question and its answers, which are provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/465979/

Regex in java question, multiple matches

Tags: java, regex

Asked by Berlin Brown

I am trying to match multiple CSS style code blocks in an HTML document. This code matches the first one but not the second. What code would I need to match the second? Can I just get a list of the groups that are inside my 'style' tags? Should I call the 'find' method to get the next match?

Here is my regex pattern:

^.*(<style type="text/css">)(.*)(</style>).*$

Usage:

final Pattern pattern_css = Pattern.compile(css_pattern_buf.toString(),
        Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);

final Matcher match_css = pattern_css.matcher(text);
if (match_css.matches() && (match_css.groupCount() >= 3)) {
    System.out.println("Woot ==>" + match_css.groupCount());
    System.out.println(match_css.group(2));
} else {
    System.out.println("No Match");
}

Answered by bobince

> I am trying to match multiple CSS style code blocks in an HTML document.

Standard Answer: don't use regex to parse HTML. Regex cannot parse HTML reliably, no matter how complicated and clever you make your expression. Unless you are absolutely sure that the exact format of the target document is totally fixed, string or regex processing is insufficient and you must use an HTML parser.

(<style type="text/css">)(.*)(</style>)

That's a greedy expression. The (.*) in the middle will match as much as it possibly can. If you have two style blocks:

<style type="text/css">1</style> <style type="text/css">2</style>

then it will happily match '1</style> <style type="text/css">2'.

Use (.*?) to get a non-greedy expression, which will allow the trailing (</style>) to match at the first opportunity.
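
The difference between the two quantifiers can be checked with a minimal sketch like the one below. The class name `GreedyVsLazy` and the helper `firstBody` are made up for this illustration and are not part of the question's code; the flags are a subset of the ones used in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: compares the greedy and non-greedy patterns
// on the two-block example from the answer.
class GreedyVsLazy {
    // Returns group 2 (the text between the tags) of the first match,
    // or null if the pattern is not found anywhere in the input.
    static String firstBody(String regex, String html) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.DOTALL)
                .matcher(html);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String html = "<style type=\"text/css\">1</style> <style type=\"text/css\">2</style>";
        String greedy = "(<style type=\"text/css\">)(.*)(</style>)";
        String lazy = "(<style type=\"text/css\">)(.*?)(</style>)";

        System.out.println(firstBody(greedy, html)); // prints: 1</style> <style type="text/css">2
        System.out.println(firstBody(lazy, html));   // prints: 1
    }
}
```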

> Should I call the 'find' method to get the next match?

Yes, and you should have used it to get the first match too. The usual idiom is:

while (matcher.find()) {
    s = matcher.group(n);  // n is the index of the capturing group you want
}
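
Applied to the question, the idiom might look like the sketch below. It assumes the non-greedy three-group pattern without the `^.*` and `.*$` anchors; `StyleBlocks` and `extract` are hypothetical names chosen for this example. The reason `find()` is needed even for the first match is that `matches()` only succeeds when the whole input matches the pattern, whereas `find()` scans for the next matching subsequence and continues where the previous match ended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: collects the contents of every style block.
class StyleBlocks {
    static final Pattern STYLE = Pattern.compile(
            "(<style type=\"text/css\">)(.*?)(</style>)",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns group 2 (the CSS text) of each match, in document order.
    static List<String> extract(String html) {
        List<String> blocks = new ArrayList<>();
        Matcher matcher = STYLE.matcher(html);
        while (matcher.find()) {          // each call advances to the next match
            blocks.add(matcher.group(2));
        }
        return blocks;
    }

    public static void main(String[] args) {
        String html = "<html><head>\n"
                + "<style type=\"text/css\">p { color: red; }</style>\n"
                + "<STYLE TYPE=\"text/css\">\nh1 { margin: 0; }\n</STYLE>\n"
                + "</head></html>";
        for (String css : extract(html)) {
            System.out.println("[" + css.trim() + "]");
        }
    }
}
```

`DOTALL` lets `.` cross line breaks, so a block spread over several lines is still captured. As the answer says, this is only dependable when the markup format is fixed.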

Note that standard string processing (indexOf, etc) may be a simpler approach for you than regex, since you're only using completely fixed strings. However, the Standard Answer still applies.
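
A rough sketch of the `indexOf` approach follows. The class and constant names are invented for the example, and unlike the regex with `CASE_INSENSITIVE` this version only finds tags written exactly as in `OPEN` and `CLOSE`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: extracts style blocks with plain string searches.
class StyleBlocksIndexOf {
    static final String OPEN = "<style type=\"text/css\">";
    static final String CLOSE = "</style>";

    // Returns the text between each OPEN/CLOSE pair, in document order.
    static List<String> extract(String html) {
        List<String> blocks = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = html.indexOf(OPEN, from);
            if (start < 0) {
                break;                      // no more opening tags
            }
            int bodyStart = start + OPEN.length();
            int end = html.indexOf(CLOSE, bodyStart);
            if (end < 0) {
                break;                      // unterminated block: stop searching
            }
            blocks.add(html.substring(bodyStart, end));
            from = end + CLOSE.length();    // resume after the closing tag
        }
        return blocks;
    }

    public static void main(String[] args) {
        String html = "<style type=\"text/css\">1</style> <style type=\"text/css\">2</style>";
        System.out.println(extract(html)); // prints: [1, 2]
    }
}
```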

Answered by Gumbo

You can simplify the regex as follows:

(<style type="text/css">)(.*?)(</style>)

And if you don't need groups 1 and 3 (you probably don't), I would drop their parentheses, leaving only:

<style type="text/css">(.*?)</style>
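
With only one pair of parentheses left, the CSS text moves from group 2 to group 1. A small sketch under that assumption (the class name `SingleGroup` and the method `join` are hypothetical, and the flags are carried over from the question):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: uses the single-group pattern and reads group(1).
class SingleGroup {
    static final Pattern STYLE = Pattern.compile(
            "<style type=\"text/css\">(.*?)</style>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Concatenates the contents of all style blocks, one per line.
    static String join(String html) {
        StringBuilder sb = new StringBuilder();
        Matcher m = STYLE.matcher(html);
        while (m.find()) {
            if (sb.length() > 0) {
                sb.append('\n');
            }
            sb.append(m.group(1));   // the only capturing group is the CSS text
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "<style type=\"text/css\">1</style> <style type=\"text/css\">2</style>";
        System.out.println(join(html));
    }
}
```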