Remove XML Tag and Content in XML String using Java Regex

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/42894585/

Time: 2020-11-03 06:56:21  Source: igfitidea


Tags: java, regex, xml

Asked by vkrams

I have an XML string of 400 lines, and it contains the tags below, repeated twice. I want to remove those tags.

<Address>
<Location>Beach</Location>
<Dangerous>
    <Flag>N</Flag>
</Dangerous>
</Address>

I am using the regex pattern below, but it does not replace anything:

xmlRequest.replaceAll("<Address>.*?</Address>$","");


I am able to do this in Notepad++ by selecting the "[x] . matches newline" checkbox next to the "Regular expression" radio button in the Find/Replace dialog box.

Can anyone suggest what's wrong with my regular expression?

Answered by Kerwin

xmlRequest = xmlRequest.replaceAll("<Address>[\\s\\S]*?</Address>", "");

By default, . does not match line terminators such as \n and \r, so use [\s\S] to match any character, including newlines. (In a Java string literal the backslashes must be doubled.)
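As a rough illustration of the point above, here is a minimal sketch that uses Pattern.DOTALL instead of [\s\S]; both make the pattern span line breaks. The class name RemoveAddressRegex, the method removeAddressBlocks, and the sample XML are made up for this example. It also drops the trailing $ from the question's pattern, since without the MULTILINE flag $ only matches at the end of the input, and it captures the return value because Java strings are immutable.

```java
import java.util.regex.Pattern;

public class RemoveAddressRegex {
    // DOTALL makes '.' match line terminators too; '.*?' is reluctant,
    // so each <Address>...</Address> block is matched separately.
    private static final Pattern ADDRESS_BLOCK =
            Pattern.compile("<Address>.*?</Address>", Pattern.DOTALL);

    // Hypothetical helper: returns a copy of xml with every Address block removed.
    static String removeAddressBlocks(String xml) {
        return ADDRESS_BLOCK.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        // Sample input invented for this sketch.
        String xml = "<Root>\n"
                + "<Address>\n"
                + "<Location>Beach</Location>\n"
                + "</Address>\n"
                + "<Name>Test</Name>\n"
                + "</Root>";
        // The original string is unchanged; use the returned value.
        String cleaned = removeAddressBlocks(xml);
        System.out.println(cleaned);
    }
}
```

This only behaves well when Address elements are not nested and carry no attributes; for anything less regular, a real XML parser is the safer choice.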

Answered by Raju

A solution with Jsoup:

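import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class RemoveTag {
    public static void main(String[] args) {
        String xmlContent = "<Address> <Location>Beach</Location><Dangerous> "
            + "<Flag>N</Flag> </Dangerous> </Address>";

        String tagToReplace = "Address";

        // Jsoup.parse uses the HTML parser by default, which wraps the input in <html><body>
        Document doc = Jsoup.parse(xmlContent);
        Elements els = doc.getElementsByTag(tagToReplace);
        // Detach every matching element, together with its children, from the document
        for (Element el : els) {
            el.remove();
        }
        // Whatever remains under <body> is the content without the removed tags
        xmlContent = doc.body().children().toString();
    }
}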

Answered by b4n4n4p4nd4

Doing what you're suggesting with a regex is arguably improper. (See https://stackoverflow.com/a/1732454/6552039 for hilarity and enlightenment.)

You should be able to just parse your XML into an org.w3c.dom.Document, call getElementsByTagName("Address"), and remove the second element from its parent with removeChild. (This assumes a particular interpretation of "below tags repeated twice".)
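The DOM approach described above might look roughly like the following sketch, which uses only the JDK's built-in XML APIs. The class name RemoveAddressDom, the method removeElements, and the sample XML are invented for illustration. The sketch removes every matching element rather than only the second one, on the assumption that this is what the question is after.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RemoveAddressDom {
    // Hypothetical helper: parses xml, removes all elements named tagName,
    // and returns the remaining document serialized as a string.
    static String removeElements(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList nodes = doc.getElementsByTagName(tagName);
            // The NodeList is live, so walk it backwards: removing an item
            // then never shifts the indices still to be visited.
            for (int i = nodes.getLength() - 1; i >= 0; i--) {
                Node node = nodes.item(i);
                node.getParentNode().removeChild(node);
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Kept deliberately coarse for a sketch; real code would handle
            // parser and transformer failures separately.
            throw new IllegalStateException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample input invented for this sketch.
        String xml = "<Root><Address><Location>Beach</Location></Address>"
                + "<Name>Test</Name></Root>";
        System.out.println(removeElements(xml, "Address"));
    }
}
```

Unlike the regex, this handles attributes, nesting, and arbitrary whitespace inside the removed elements, at the cost of requiring the input to be well-formed XML.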

Answered by saka1029

Try it with Jsoup.

String str = "<Address>\n"
    + "<Location>Beach</Location>\n"
    + "<Dangerous>\n"
    + "    <Flag>N</Flag>\n"
    + "</Dangerous>\n"
    + "</Address>\n";
Document doc = Jsoup.parse(str);
System.out.println(doc.text());

Output:

Beach N