Original question: http://stackoverflow.com/questions/6494416/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Java: String.replace(regex, string) to remove content from XML
Asked by TookTheRook
Let's say I have an XML document in the form of a string. I wish to remove the content between two tags within the XML string, say <tagName>. I have tried:
String newString = oldString.replaceFirst("\<tagName>.*?\<//tagName>",
"Content Removed");
but it does not work. Any pointers as to what I am doing wrong?
Answer by Sean Patrick Floyd
OK, apart from the obvious answer (don't parse XML with regex), maybe we can fix this:
String newString = oldString.replaceFirst("(?s)<tagName[^>]*>.*?</tagName>",
"Content Removed");
Explanation:
(?s) # turn single-line mode on (otherwise '.' won't match '\n')
<tagName # remove unnecessary (and perhaps erroneous) escapes
[^>]* # allow optional attributes
>.*?</tagName>
Are you sure you're matching the tag case correctly? Perhaps you also want to add the i flag to the pattern: (?si)
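To see what the (?s) flag changes, here is a minimal sketch built around the pattern from this answer. The class name, method names, and sample input are made up for illustration; only the regex itself comes from the answer.

```java
class ReplaceFirstDemo {
    // Pattern from the answer: DOTALL on, optional attributes, reluctant body.
    static String removeFirstTag(String xml) {
        return xml.replaceFirst("(?s)<tagName[^>]*>.*?</tagName>", "Content Removed");
    }

    // Same pattern without (?s), so '.' stops at line terminators.
    static String removeFirstTagNoDotAll(String xml) {
        return xml.replaceFirst("<tagName[^>]*>.*?</tagName>", "Content Removed");
    }

    public static void main(String[] args) {
        // Hypothetical input: the element has an attribute and spans two lines.
        String oldString = "<root><tagName id=\"1\">line one\nline two</tagName><other/></root>";

        // With (?s) the element is replaced.
        System.out.println(removeFirstTag(oldString));

        // Without (?s) nothing matches, so the string comes back unchanged.
        System.out.println(removeFirstTagNoDotAll(oldString).equals(oldString));
    }
}
```

Note that replaceFirst replaces the whole match, tags included, so the tags themselves disappear along with the content.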
Answer by Pablo Fernandez
Probably the problem lies here:
<//tagName>
Try changing it to
<\/tagName>
Answer by SJuan76
XML is a grammar; regular expressions are not the best tools to work with grammars.
My advice would be to use a real parser and work with the DOM instead of doing regex matches.
For example, if you have:
<xml>
<items>
<myItem>
<tagToRemove>something1</tagToRemove>
</myItem>
<myItem>
<tagToRemove>something2</tagToRemove>
</myItem>
</items>
A regex could match too much here (because quantifiers are greedy by default), running from the first opening tag to the last closing tag and leaving:
<xml>
<items>
<myItem>
matchString
</myItem>
</items>
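The difference between a greedy and a reluctant quantifier on this input can be sketched as follows. The class and method names are hypothetical, and "matchString" is just the placeholder replacement used above.

```java
class GreedyDemo {
    // The sample document from this answer, with the tag case made consistent.
    static final String XML =
            "<xml>\n<items>\n<myItem>\n<tagToRemove>something1</tagToRemove>\n</myItem>\n"
          + "<myItem>\n<tagToRemove>something2</tagToRemove>\n</myItem>\n</items>\n";

    // Greedy '.*' runs to the LAST closing tag, swallowing both items.
    static String greedy(String xml) {
        return xml.replaceFirst("(?s)<tagToRemove>.*</tagToRemove>", "matchString");
    }

    // Reluctant '.*?' stops at the FIRST closing tag.
    static String reluctant(String xml) {
        return xml.replaceFirst("(?s)<tagToRemove>.*?</tagToRemove>", "matchString");
    }

    public static void main(String[] args) {
        System.out.println(greedy(XML));     // only one <myItem> survives
        System.out.println(reluctant(XML));  // second item still holds something2
    }
}
```

The reluctant form avoids this particular problem, but it still cannot cope with nested elements of the same name.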
Also, some forms that a DTD may allow (such as <tagToRemove/> or <tagToRemove attr="value">) make catching tags with a regex more difficult.
Unless it is very clear to you that none of the above may occur (now or in the future), I would go with a parser.
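For reference, a parser-based version might look roughly like the sketch below, which uses the JDK's built-in DOM and Transformer APIs. The class name, method name, and sample input are assumptions for illustration. Unlike the regex above, it keeps the tags and replaces only their content, which may or may not be what you want.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomContentRemover {
    // Hypothetical helper: replaces the content of every <tagName> element.
    static String replaceContent(String xml, String tagName, String replacement) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Attributes and self-closing forms are handled by the parser.
            NodeList nodes = doc.getElementsByTagName(tagName);
            for (int i = 0; i < nodes.getLength(); i++) {
                nodes.item(i).setTextContent(replacement);
            }

            // Serialize the modified tree back to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Malformed XML or a misconfigured parser ends up here.
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<items><myItem><tagToRemove attr=\"value\">something1</tagToRemove></myItem>"
                   + "<myItem><tagToRemove/></myItem></items>";
        System.out.println(replaceContent(xml, "tagToRemove", "Content Removed"));
    }
}
```

Both the attribute form and the self-closing form go through the same code path here, which is the main thing the regex struggles with.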