Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/2561946/

Time: 2020-09-06 13:02:53  Source: igfitidea

Add/remove xml tags using a bash script

Tags: xml, bash, scripting

Asked by sixtyfootersdude

I have an XML file that I want to configure using a bash script. For example, if I had this XML:

<a>

  <b>
    <bb>
        <yyy>
            Bla 
        </yyy>
    </bb>
  </b>

  <c>
    <cc>
      Something
    </cc>
  </c>

  <d>
    bla
  </d>
</a>

(confidential info removed)

I would like to write a bash script that will remove section <b> (or comment it out) but keep the rest of the XML intact. I am pretty new to the whole scripting thing. I was wondering if anyone could give me a hint as to what I should look into.

I was thinking that sed could be used, except that sed is a line editor. I think it would be easy to remove the <b> tags; however, I am unsure whether sed would be able to remove all the text between the <b> tags.

I will also need to write a script to add back the deleted section.

Answered by amertune

This would not be difficult to do in sed, as sed also works on ranges.

Try this (assuming xml is in a file named foo.xml):

sed -i '/<b>/,/<\/b>/d' foo.xml

-i will write the change into the original file (use -i.bak to keep a backup copy of the original)

This sed command performs the action d (delete) on all of the lines specified by the range:

# all of the lines between a line that matches <b>
# and the next line that matches <\/b>, inclusive
/<b>/,/<\/b>/

So, in plain English, this command will delete all of the lines between and including the line with <b> and the line with </b>
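
As a rough illustration of the range address described above, the following sketch runs the same expression without -i, so the result goes to stdout and nothing is overwritten. The scratch file and its contents are assumptions made for the demo, not part of the original answer. Note that the range works on whole lines, so any other text sharing a line with <b> or </b> would be deleted as well.

```shell
#!/usr/bin/env bash
# Sketch only: the scratch file below is a hypothetical stand-in for foo.xml.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<a>
  <b>
    <bb>Bla</bb>
  </b>
  <c>Something</c>
</a>
EOF

# Without -i, sed prints the edited text and leaves the file untouched.
# The range starts at the first line matching <b> and ends at the next
# line matching </b>; every line in that range is deleted.
result=$(sed '/<b>/,/<\/b>/d' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Previewing the output this way before adding -i is a cheap check that the range selects what you expect.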

If you'd rather comment out the lines, try one of these:

# block comment
sed -i 's/<b>/<!-- <b>/; s/<\/b>/<\/b> -->/' foo.xml

# comment out every line in the range
sed -i '/<b>/,/<\/b>/s/.*/<!-- & -->/' foo.xml
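
The question also asks how to put the section back. One possible approach, sketched here under assumptions, is to use the block-comment form and later reverse the two substitutions. The helper names comment_b and uncomment_b and the scratch file are hypothetical; the sketch assumes the tags sit on their own lines and that the file contains no other "<!-- <b>" or "</b> -->" text.

```shell
#!/usr/bin/env bash
# Sketch only: scratch file and helper names are assumptions for the demo.
f=$(mktemp)
cat > "$f" <<'EOF'
<a>
  <b>
    <bb>Bla</bb>
  </b>
  <c>Something</c>
</a>
EOF
original=$(cat "$f")

# Wrap the <b> element in a single XML comment (same substitutions as above).
comment_b()   { sed 's/<b>/<!-- <b>/; s/<\/b>/<\/b> -->/' "$1"; }
# Reverse the two substitutions to restore the element.
uncomment_b() { sed 's/<!-- <b>/<b>/; s/<\/b> -->/<\/b>/' "$1"; }

commented=$(comment_b "$f")
printf '%s\n' "$commented" > "$f"   # write the commented version back
restored=$(uncomment_b "$f")        # and undo it again
printf '%s\n' "$restored"

rm -f "$f"
```

Running comment_b twice in a row would nest the comment markers, so a real script would probably want to check whether the section is already commented out.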

Answered by lmxy

Using xmlstarlet:

#xmlstarlet ed -d "/a/b" file.xml > tmp.xml
xmlstarlet ed -d "//b" file.xml > tmp.xml
mv tmp.xml file.xml

Answered by Mads Hansen

You can use an XSLT stylesheet such as this one, which is a modified identity transform. It copies all of the content by default and has an empty template for b that does nothing (effectively deleting it from the output):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<!--Identity transform copies all items by default -->
<xsl:template match="@* | node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<!--Empty template to match on b elements and prevent it from being copied to output -->
<xsl:template match="b"/>

</xsl:stylesheet>

Create a bash script that executes the transform using Java and the Xalan command-line utility, like this:

java org.apache.xalan.xslt.Process -IN foo.xml -XSL foo.xsl -OUT foo.out

The result is this:

<?xml version="1.0" encoding="UTF-16"?><a><c><cc>
      Something
    </cc></c><d>
    bla
  </d></a>


EDIT: if you would prefer to have the b element commented out, to make it easier to put back, then use this stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <!--Identity transform copies all items by default -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!--Match on b element, wrap in a comment and construct text representing XML structure by applying templates in "comment" mode -->
    <xsl:template match="b">
        <xsl:comment>
            <xsl:apply-templates select="self::*" mode="comment" />
        </xsl:comment>
    </xsl:template>

    <xsl:template match="*" mode="comment">
        <xsl:value-of select="'&lt;'"/>
            <xsl:value-of select="name()"/>
        <xsl:value-of select="'&gt;'"/>
            <xsl:apply-templates select="@*|node()" mode="comment" />
        <xsl:value-of select="'&lt;/'"/>
            <xsl:value-of select="name()"/>
        <xsl:value-of select="'&gt;'"/>
    </xsl:template>

    <xsl:template match="text()" mode="comment">
        <xsl:value-of select="."/>
    </xsl:template>

    <xsl:template match="@*" mode="comment">
        <xsl:value-of select="name()"/>
        <xsl:text>="</xsl:text>
        <xsl:value-of select="."/>
        <xsl:text>" </xsl:text>
    </xsl:template>

</xsl:stylesheet>

It produces this output:

<?xml version="1.0" encoding="UTF-16"?><a><!--<b><bb><yyy>
            Bla
        </yyy></bb></b>--><c><cc>
      Something
    </cc></c><d>
    bla
  </d></a>

Answered by Teddy

If you want the most appropriate replacement for sed for XML data, it would be an XSLT processor. Like sed, it's a complex language, but it is specialized for the task of XML-to-anything transformations.

On the other hand, this does seem to be the point at which I would seriously consider switching to a real programming language, like Python.

Answered by ghostdog74

@OP, you can use awk, e.g.:

$ cat file
<a>                              

some text before   <b>
    <bb>
        <yyy>
            Bla
        </yyy>
    </bb>
  </b> some text after

  <c>
    <cc>
      Something
    </cc>
  </c>

  <d>
    bla
  </d>
</a>

$ awk 'BEGIN{RS="</b>"}/<b>/{gsub(/<b>.*/,"")}1' file
<a>

some text before
 some text after

  <c>
    <cc>
      Something
    </cc>
  </c>

  <d>
    bla
  </d>
</a>

Answered by notmp

# edit file inplace
xmlstarlet ed -L -d "//b" file.xml

Answered by Hitesh

sed -i '/<b>/,/<\/b>/d' foo.xml

Will this also work if the b tag has an attribute defined? In my HTML, the b tag starts as <b id="Test Step">.
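
As written, the pattern /<b>/ only matches the literal text <b>, so an opening tag with attributes would not start the range. A minimal sketch of one way to relax it follows. The bracket expression and the scratch file are assumptions for illustration, not something from the answers above, and the sketch still assumes the opening tag fits on one line.

```shell
#!/usr/bin/env bash
# Sketch only: the scratch file is a hypothetical stand-in for foo.xml.
f=$(mktemp)
cat > "$f" <<'EOF'
<a>
  <b id="Test Step">
    <bb>Bla</bb>
  </b>
  <c>Something</c>
</a>
EOF

# Original pattern: the opening tag is not matched, so nothing is deleted.
plain=$(sed '/<b>/,/<\/b>/d' "$f")

# Relaxed pattern: "<b" must be followed by whitespace or ">", so
# <b id="..."> starts the range while <bb> does not.
attrs=$(sed '/<b[[:space:]>]/,/<\/b>/d' "$f")
printf '%s\n' "$attrs"

rm -f "$f"
```

For markup that varies more than this, the xmlstarlet and XSLT answers are likely a safer fit, since they select the element by name rather than by its text.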
