Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow post: http://stackoverflow.com/questions/6494416/

Java: String.replace(regex, string) to remove content from XML

Tags: java, xml, regex

Asked by TookTheRook

Let's say I have an XML document in the form of a string. I wish to remove the content between two tags within the XML string, say <tagName>. I have tried:

String newString = oldString.replaceFirst("\<tagName>.*?\<//tagName>",
                                                              "Content Removed");

but it does not work. Any pointers as to what I am doing wrong?

Answered by Sean Patrick Floyd

OK, apart from the obvious answer (don't parse XML with regex), maybe we can fix this:

String newString = oldString.replaceFirst("(?s)<tagName[^>]*>.*?</tagName>",
                                          "Content Removed");

Explanation:

(?s)             # turn single-line mode on (otherwise '.' won't match '\n')
<tagName         # remove unnecessary (and perhaps erroneous) escapes
[^>]*            # allow optional attributes
>.*?</tagName>   # lazily match everything up to the first closing tag

Are you sure you're matching the tag case correctly? Perhaps you also want to add the i flag to the pattern: (?si)
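
For illustration, here is a minimal sketch of that pattern with both flags applied. The class name, method name and sample input are made up for the example; only the regex and the "Content Removed" marker come from the answer. Note that the match includes the tags themselves, so the whole element is replaced, not just the text between the tags.

```java
import java.util.regex.Pattern;

public class TagContentRemover {
    // (?s) lets '.' match line terminators; (?i) ignores case in the tag name.
    // "tagName" is the placeholder tag name used in the question.
    private static final Pattern TAG =
            Pattern.compile("(?si)<tagName[^>]*>.*?</tagName>");

    /** Replaces the first tagName element (tags included) with a marker string. */
    static String removeFirst(String xml) {
        return TAG.matcher(xml).replaceFirst("Content Removed");
    }

    public static void main(String[] args) {
        // Hypothetical input: mixed-case tag, an attribute, and a newline in the content.
        String xml = "<root><TagName id=\"1\">line one\nline two</TagName><other/></root>";
        System.out.println(removeFirst(xml));
    }
}
```

Compiling the pattern once is a design choice for repeated use; for a one-off call, String.replaceFirst with the same regex should behave the same way.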

Answered by Pablo Fernandez

Probably the problem lies here:

<//tagName>

Try changing it to

<\/tagName>
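
A small sketch of why the doubled slash fails, using a made-up class name and sample string. Two details are worth keeping in mind: inside a Java string literal the regex \/ has to be written as "\\/", and since '/' has no special meaning in java.util.regex the escape can also be dropped altogether.

```java
public class SlashEscapeDemo {
    /** Applies the given regex once and substitutes the marker text. */
    static String strip(String xml, String regex) {
        return xml.replaceFirst(regex, "Content Removed");
    }

    public static void main(String[] args) {
        String xml = "<tagName>secret</tagName>"; // hypothetical sample input

        // Two slashes: the pattern demands the literal text "<//tagName>",
        // which never occurs, so the input comes back unchanged.
        System.out.println(strip(xml, "<tagName>.*?<//tagName>"));

        // Escaped slash: the regex \/ is spelled "\\/" in Java source.
        System.out.println(strip(xml, "<tagName>.*?<\\/tagName>"));

        // Plain slash: works as well, no escape needed.
        System.out.println(strip(xml, "<tagName>.*?</tagName>"));
    }
}
```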

Answered by SJuan76

XML is a grammar; regular expressions are not the best tools to work with grammars.

My advice would be to use a real parser and work with the DOM instead of doing regex matches.

For example, if you have:

<xml>
 <items>
  <myItem>
     <tagToRemove>something1</tagToRemove>
  </myItem>
  <myItem>
     <tagToRemove>something2</tagToRemove>
  </myItem>
 </items>
</xml>

A greedy regex could match from the first opening tag all the way to the last closing tag, collapsing the document into:

<xml>
 <items>
  <myItem>
     matchString
  </myItem>
 </items>
</xml>
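
The effect can be reproduced with a short sketch. The class and method names are invented for the example, and the input assumes the two-item document with consistently spelled, properly closed tags. The greedy quantifier .* swallows everything between the first opening tag and the last closing tag, while the lazy .*? stops at the nearest closing tag.

```java
public class GreedyVsLazyDemo {
    // The two-item sample document, assumed well-formed for this example.
    static final String XML =
            "<xml>\n <items>\n  <myItem>\n     <tagToRemove>something1</tagToRemove>\n  </myItem>\n"
          + "  <myItem>\n     <tagToRemove>something2</tagToRemove>\n  </myItem>\n </items>\n</xml>";

    /** Greedy: '.*' runs to the LAST closing tag, merging both items into one. */
    static String greedy(String xml) {
        return xml.replaceFirst("(?s)<tagToRemove>.*</tagToRemove>", "matchString");
    }

    /** Lazy: '.*?' stops at the NEAREST closing tag, so each element is handled separately. */
    static String lazy(String xml) {
        return xml.replaceAll("(?s)<tagToRemove>.*?</tagToRemove>", "matchString");
    }

    public static void main(String[] args) {
        System.out.println(greedy(XML));
        System.out.println("----");
        System.out.println(lazy(XML));
    }
}
```

The lazy form avoids this particular failure, but it still depends on the tags never being nested or written in another form.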

Also, some forms that a DTD may allow (such as <tagToRemove/> or <tagToRemove attr="value">) make catching tags with a regex more difficult.

Unless it is very clear to you that none of the above may occur (now or in the future), I would go with a parser.
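
As a rough sketch of the parser route, the example below uses the DOM API bundled with the JDK (javax.xml.parsers and javax.xml.transform). The class name, method name and sample input are assumptions made for illustration, and the "Content Removed" marker is borrowed from the question. Unlike the regex versions, this one keeps the tags and their attributes and replaces only what is inside them.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomContentRemover {
    /** Replaces the content of every element named tagName and returns the serialized XML. */
    static String replaceContent(String xml, String tagName, String replacement) {
        try {
            // Parse the string into a DOM tree.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // setTextContent drops all existing children and leaves a single text node.
            NodeList nodes = doc.getElementsByTagName(tagName);
            for (int i = 0; i < nodes.getLength(); i++) {
                nodes.item(i).setTextContent(replacement);
            }

            // Serialize the tree back to a string without the XML declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Wrapped to keep the sketch short; real code would handle these separately.
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input covering an attribute and an empty element.
        String xml = "<items><tagToRemove attr=\"value\">something1</tagToRemove><tagToRemove/></items>";
        System.out.println(replaceContent(xml, "tagToRemove", "Content Removed"));
    }
}
```

If the XML comes from an untrusted source, the factory should also be configured to disallow external entities before parsing.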
