Remove XML Tag and Content in XML String using Java Regex
Source: http://stackoverflow.com/questions/42894585/ (Stack Overflow, provided under the CC BY-SA 4.0 license; you are free to use and share it, but you must attribute it to the original authors)
Asked by vkrams
I have an XML string of about 400 lines, and it contains the tags below, repeated twice. I want to remove those tags.
<Address>
<Location>Beach</Location>
<Dangerous>
<Flag>N</Flag>
</Dangerous>
</Address>
I am using the regex pattern below, but it is not replacing anything:
xmlRequest.replaceAll("<Address>.*?</Address>$","");
I am able to do this in Notepad++ by ticking the [x] ". matches newline" checkbox next to the "Regular expression" radio button in the Find/Replace dialog box.
Can anyone suggest what is wrong with my regular expression?
Answered by Kerwin
xmlRequest = xmlRequest.replaceAll("<Address>[\\s\\S]*?</Address>", "");
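By default, the dot in .* does not match line terminators (\n, \r), so use [\s\S] to match any character, including newlines. Note that replaceAll returns a new string rather than modifying xmlRequest, so the result has to be assigned.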
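As a rough illustration of the point above, the sketch below uses the (?s) flag (DOTALL), which is the closest Java counterpart to Notepad++'s ". matches newline" option and has the same effect here as [\s\S]. The class name AddressRegexDemo, the helper stripAddress, and the Root and Name elements in the sample input are made up for this example and are not part of the original question.

```java
import java.util.regex.Pattern;

class AddressRegexDemo {
    // (?s) enables DOTALL, so '.' also matches line terminators.
    // The reluctant quantifier .*? stops at the first closing tag,
    // so each <Address> block is matched separately.
    private static final Pattern ADDRESS_BLOCK =
            Pattern.compile("(?s)<Address>.*?</Address>");

    static String stripAddress(String xml) {
        // Strings are immutable: the caller has to use the returned value.
        return ADDRESS_BLOCK.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical sample input; only the Address block comes from the question.
        String xmlRequest = "<Root>\n<Name>A</Name>\n<Address>\n<Location>Beach</Location>\n"
                + "<Dangerous>\n<Flag>N</Flag>\n</Dangerous>\n</Address>\n</Root>";
        xmlRequest = stripAddress(xmlRequest);
        System.out.println(xmlRequest);
    }
}
```

The original pattern also ended with $, which by default only matches at the end of the whole input, so it is left out here.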
Answered by Raju
A solution with JSoup:
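import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.select.Elements;

public static void main(String[] args) {
    String xmlContent = "<Address> <Location>Beach</Location><Dangerous>"
            + " <Flag>N</Flag> </Dangerous> </Address>";
    String tagToRemove = "Address";
    // Parse as XML so tag names keep their case and no html/body wrapper is added.
    Document doc = Jsoup.parse(xmlContent, "", Parser.xmlParser());
    // Detach every matching element, together with its children, from the document.
    Elements els = doc.getElementsByTag(tagToRemove);
    for (Element el : els) {
        el.remove();
    }
    // Serialize what is left of the document back to a string.
    xmlContent = doc.outerHtml();
}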
Answered by b4n4n4p4nd4
Doing what you are suggesting with a regex is arguably improper. (See https://stackoverflow.com/a/1732454/6552039 for hilarity and enlightenment.)
You should be able to ingest your XML with a parser that produces an org.w3c.dom.Document, call getElementsByTagName("Address"), and remove the second element from its parent with removeChild. (This assumes a particular interpretation of "below tags repeated twice".)
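The DOM approach described above could look roughly like the following sketch, which uses only the JDK's built-in javax.xml and org.w3c.dom APIs. The class name AddressDomDemo, the helper removeAddresses, and the Root and Name elements in the sample input are assumptions made for illustration. The sketch removes every Address element; to remove only the second one, call removeChild on addresses.item(1) instead of looping.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AddressDomDemo {
    static String removeAddresses(String xml) {
        try {
            // Parse the XML string into a DOM tree.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList addresses = doc.getElementsByTagName("Address");
            // The NodeList is live, so walk it backwards while removing nodes.
            for (int i = addresses.getLength() - 1; i >= 0; i--) {
                Node address = addresses.item(i);
                address.getParentNode().removeChild(address);
            }
            // Serialize the modified tree back to a string, without the XML declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample input; only the Address block comes from the question.
        String xml = "<Root><Name>A</Name><Address><Location>Beach</Location>"
                + "<Dangerous><Flag>N</Flag></Dangerous></Address></Root>";
        System.out.println(removeAddresses(xml));
    }
}
```

Unlike the regex, this keeps working if the Address element carries attributes or is formatted differently, at the cost of parsing the whole document.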