Note: this page is a Chinese-English parallel translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/962122/

Time: 2020-08-11 21:37:21  Source: igfitidea

Java string - get everything between (but not including) two regular expressions?

Tags: java, regex, string, split

Asked by

In Java, is there a simple way to extract a substring by specifying the regular expression delimiters on either side, without including the delimiters in the final substring?

For example, if I have a string like this:

<row><column>Header text</column></row>

what is the easiest way to extract the substring:

Header text

Please note that the substring may contain line breaks...

thanks!

Accepted answer by Aaron Maenpaa

Write a regex like this:

"(regex1)(.*)(regex2)"

... and pull the middle group out of the matcher. If the text between the delimiters can contain newlines, compile the pattern with Pattern.DOTALL so that "." also matches line terminators.

Using your example we can write a program like:

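package test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex {

    public static void main(String[] args) {
        // DOTALL lets '.' match line terminators, so the captured text may span lines.
        Pattern p = Pattern.compile(
                "<row><column>(.*)</column></row>",
                Pattern.DOTALL
            );

        Matcher matcher = p.matcher(
                "<row><column>Header\n\n\ntext</column></row>"
            );

        // matches() succeeds only if the entire input matches the pattern.
        if (matcher.matches()) {
            // Group 1 is the text between the two delimiters.
            System.out.println(matcher.group(1));
        }
    }

}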

When run, this prints:

Header


text
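
The program above hard-codes the delimiters, uses a greedy ".*", and requires the whole input to match. As a minimal sketch of the more general "(regex1)(.*)(regex2)" idea, the hypothetical helper below (the class and method names are illustrative, not part of the answer) takes the two delimiter regexes as parameters, uses the reluctant ".*?" so each match stops at the first closing delimiter, and uses find() so it can pick up several matches inside a longer string. It assumes the delimiter regexes contain no capturing groups of their own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative and not from the original answer.
class BetweenDelimiters {

    // Returns each substring found between a match of startRegex and the first
    // match of endRegex that follows it, excluding the delimiters themselves.
    // Assumption: neither delimiter regex contains its own capturing groups.
    static List<String> between(String input, String startRegex, String endRegex) {
        // Non-capturing groups wrap the delimiters, so the wanted text is group 1.
        // ".*?" is reluctant: it expands only as far as needed to reach endRegex.
        Pattern p = Pattern.compile(
                "(?:" + startRegex + ")(.*?)(?:" + endRegex + ")",
                Pattern.DOTALL);
        Matcher m = p.matcher(input);
        List<String> results = new ArrayList<>();
        // find() scans for successive matches anywhere in the input,
        // unlike matches(), which requires the whole input to match.
        while (m.find()) {
            results.add(m.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        String xml = "<row><column>Header\ntext</column><column>Second</column></row>";
        for (String s : between(xml, "<column>", "</column>")) {
            System.out.println("[" + s + "]");
        }
    }
}
```

With a greedy ".*" the same input would yield a single match running from the first opening delimiter to the last closing one, which is why the reluctant form is used here.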

Answer by Thorbjørn Ravn Andersen

You should not use regular expressions to decode XML - this will eventually break if the input is not strictly controlled.

The easiest thing is probably to parse the XML into a DOM tree (Java 1.4 and newer include an XML parser) and then navigate the tree to pick out what you need.
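
A minimal sketch of the DOM approach, using the JAXP parser bundled with the JDK, might look like the following. The class and method names are hypothetical, and the element name "column" is taken from the question's sample input; getTextContent() requires Java 5 or newer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of the original answer.
class DomExtract {

    // Parses the XML string and returns the text of the first <column> element,
    // or null if the document has no such element.
    static String firstColumnText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList columns = doc.getElementsByTagName("column");
            if (columns.getLength() == 0) {
                return null;
            }
            // getTextContent() returns the element's text, line breaks included.
            return columns.item(0).getTextContent();
        } catch (Exception e) {
            // Parser configuration, I/O, and malformed-XML errors all end up here.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
                firstColumnText("<row><column>Header\n\n\ntext</column></row>"));
    }
}
```

Unlike the regex, this keeps working if the input gains attributes, extra whitespace between tags, or entity references, since the parser handles those.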

Perhaps you would like to tell what you want to accomplish with your program?
