Java regex to strip XML tags, but not the tag content

Question

Asked by IAmYourFaja

I have the following Java code:

str = str.replaceAll("<.*?>.*?</.*?>|<.*?/>", "");

This turns a String like so:

How now <fizz>brown</fizz> cow.

Into:

How now  cow.

However, I want it to strip only the <fizz> and </fizz> tags, or standalone </fizz> tags, and leave the element's content alone. So, a regex that would turn the above into:

How now brown cow.

Or, using a more complex String, something that turns:

How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow.

Into:

How now brown cow.

I tried this:

str = str.replaceAll("<.*?></.*?>|<.*?/>", "");

And that doesn't work at all. Any ideas? Thanks in advance!

Answer 1

Answered by Sam Barnum

"How now <fizz>brown</fizz> cow.".replaceAll("<[^>]+>", "")

Answer 2

Answered by TheEwook

You were almost there ;)

Try this:

str = str.replaceAll("<.*?>", "");
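
The two one-liners suggested so far, `<[^>]+>` and `<.*?>`, give the same result on the sample strings in the question. One place they can differ is a tag that spans a line break, because in Java `.` does not match line terminators unless the DOTALL flag `(?s)` is set, while `[^>]` does. The sketch below illustrates this; the class name, method names, and the multi-line input are made up for the example and are not part of the original answers.

```java
public class TagStripComparison {

    // Negated character class: [^>] matches any character except '>', including line breaks.
    public static String stripNegatedClass(String s) {
        return s.replaceAll("<[^>]+>", "");
    }

    // Lazy dot: by default '.' does not match line terminators.
    public static String stripLazyDot(String s) {
        return s.replaceAll("<.*?>", "");
    }

    // Lazy dot with the embedded DOTALL flag (?s), so '.' matches line breaks too.
    public static String stripLazyDotAll(String s) {
        return s.replaceAll("(?s)<.*?>", "");
    }

    public static void main(String[] args) {
        String oneLine = "How now <fizz>brown</fizz> cow.";
        // Hypothetical input: the opening tag is split across two lines.
        String multiLine = "How now <fizz\n  id='1'>brown</fizz> cow.";

        System.out.println(stripNegatedClass(oneLine));   // How now brown cow.
        System.out.println(stripLazyDot(oneLine));        // How now brown cow.

        System.out.println(stripNegatedClass(multiLine)); // How now brown cow.
        System.out.println(stripLazyDot(multiLine));      // opening tag is left in place
        System.out.println(stripLazyDotAll(multiLine));   // How now brown cow.
    }
}
```

If the input is known to keep each tag on a single line, either pattern should do.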

Answer 3

Answered by Sergiu Toarca

While there are other correct answers, none give any explanation.

The reason your regex <.*?>.*?</.*?>|<.*?/> doesn't work is that it selects any tags as well as everything inside them. You can see that in action on debuggex.

The reason your second attempt, <.*?></.*?>|<.*?/>, doesn't work is that it selects from the beginning of a tag up to the first closing tag that immediately follows another tag. That is a bit of a mouthful, but you can understand better what's going on in this example.

The regex you need is much simpler: <.*?>. It simply selects every tag, regardless of whether it is an opening or a closing tag.
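
To make the explanation above concrete, here is a small sketch that applies all three patterns to the strings from the question. The class and method names are invented for the example, and the results noted in the comments are what the explanation above predicts rather than output copied from the original answer.

```java
public class TagRegexDemo {

    // Removes every match of the given regex from the input.
    public static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String simple = "How now <fizz>brown</fizz> cow.";
        String nested = "How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow.";

        String firstAttempt = "<.*?>.*?</.*?>|<.*?/>";  // tags plus everything inside them
        String secondAttempt = "<.*?></.*?>|<.*?/>";    // needs "><" + "/" right after a tag
        String tagsOnly = "<.*?>";                      // each tag on its own

        // The content is swallowed along with the tags: "How now  cow."
        System.out.println(strip(simple, firstAttempt));

        // No closing tag directly follows another tag here, so nothing is removed.
        System.out.println(strip(simple, secondAttempt));

        // </buzz> directly follows <yoda/>, so everything from <buzz> on is removed: "How  cow."
        System.out.println(strip(nested, secondAttempt));

        // Only the tags are removed: "How now brown cow." in both cases.
        System.out.println(strip(simple, tagsOnly));
        System.out.println(strip(nested, tagsOnly));
    }
}
```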

Answer 4

Answered by Sarath Kumar Sivan

You can try this too:

str = str.replaceAll("<.*?>", "");

Please have a look at the below example for better understanding:

public class StringUtils {

    public static void main(String[] args) {
        System.out.println(StringUtils.replaceAll("How now <fizz>brown</fizz> cow."));
        System.out.println(StringUtils.replaceAll("How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow."));
    }

    public static String replaceAll(String strInput) {
        return strInput.replaceAll("<.*?>", "");
    }
}

Output:

How now brown cow.
How now brown cow.

Answer 5

Answered by Gayathry

This isn't elegant, but it is easy to follow. The code below removes the start and end XML tags when they appear together on one line:

<url>"www.xml.com"</url> , <body>"This is xml"</body>

Regex:

to_replace='<\w*>|<\/\w*>',value=""
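
The `to_replace=... ,value=...` form above is not Java syntax. As a rough Java equivalent, assuming the same pattern is simply passed to `String.replaceAll` (the class and method names below are made up for the example), it might look like this:

```java
public class WordTagStripper {

    // Same pattern as above, written as a Java string literal:
    // <\w*> matches an opening tag made only of word characters,
    // </\w*> matches the corresponding closing tag.
    public static String stripWordTags(String input) {
        return input.replaceAll("<\\w*>|</\\w*>", "");
    }

    public static void main(String[] args) {
        String line = "<url>\"www.xml.com\"</url> , <body>\"This is xml\"</body>";
        System.out.println(stripWordTags(line));  // "www.xml.com" , "This is xml"

        // \w does not match '/', so a self-closing tag such as <yoda/> is left in place.
        String nested = "How <buzz>now <fizz>brown</fizz><yoda/></buzz> cow.";
        System.out.println(stripWordTags(nested)); // How now brown<yoda/> cow.
    }
}
```

Because `\w` only covers letters, digits, and underscore, this pattern also skips tags that carry attributes or namespace prefixes, which may or may not be what is wanted.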

Answer 6

Answered by Devarsh Modi

If you want to parse an XML log file, you can do it with a regex in Java: <[^<]+<. For input like <name>DEV</name>, you get output like name>DEV. You just have to play with the regex.
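
One possible reading of this answer, sketched below: the pattern `<[^<]+<` matches from the `<` that opens a tag up to the next `<`, and wrapping the middle part in a capturing group yields the text between the two angle brackets. The capturing group, class name, and method names are assumptions added for illustration and do not appear in the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogFieldExtractor {

    // The answer's pattern with a capturing group added around the middle part.
    private static final Pattern BETWEEN_BRACKETS = Pattern.compile("<([^<]+)<");

    // Returns the whole first match, or null if there is none.
    public static String fullMatch(String line) {
        Matcher m = BETWEEN_BRACKETS.matcher(line);
        return m.find() ? m.group() : null;
    }

    // Returns only the captured text between the two '<' characters, or null.
    public static String captured(String line) {
        Matcher m = BETWEEN_BRACKETS.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "<name>DEV</name>";
        System.out.println(fullMatch(line)); // <name>DEV<
        System.out.println(captured(line));  // name>DEV
    }
}
```

Splitting the captured text on `>` would then separate the element name from its value.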
