bash: remove comments from an XML file and pretty-print it

Question

Asked by elcuco

I have this huge xml file which contains a lot of comments.

What's the "best way" to strip out all the comments and nicely format the XML from the Linux command line?

Answer 1

Answered by alexgirao

You can use tidy:

$ tidy -quiet -asxml -xml -indent -wrap 1024 --hide-comments 1 tomcat-users.xml
<?xml version='1.0' encoding='utf-8'?>
<tomcat-users>
  <user username="qwerty" password="ytrewq" roles="manager-gui" />
</tomcat-users>

Answer 2

Answered by Mads Hansen

Run your XML through an identity transform XSLT, with an empty template for comments.

All of the XML content, except for the comments, will be passed through to the output.

To format the output nicely, set indent="yes" on the xsl:output element:

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<!--Match on Attributes, Elements, text nodes, and Processing Instructions-->
<xsl:template match="@*| * | text() | processing-instruction()">
   <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
</xsl:template>

<!--Empty template prevents comments from being copied into the output -->
<xsl:template match="comment()"/>

</xsl:stylesheet>
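
One way to apply a stylesheet like this from the command line is xsltproc (part of libxslt). The following is only a sketch: it assumes xsltproc is installed, and the function name strip_comments_xslt and the temporary-file handling are illustrative choices, not part of the answer.

```shell
#!/usr/bin/env bash
# Sketch: strip comments from an XML file with xsltproc (assumed installed).
# strip_comments_xslt is a hypothetical helper name, not a standard command.
strip_comments_xslt() {
  local xsl status
  xsl=$(mktemp) || return 1
  # Same identity transform as above: copy everything except comments.
  cat > "$xsl" <<'EOF'
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="@*| * | text() | processing-instruction()">
   <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
</xsl:template>
<xsl:template match="comment()"/>
</xsl:stylesheet>
EOF
  xsltproc "$xsl" "$1"
  status=$?
  rm -f "$xsl"
  return "$status"
}

# Usage (file names are examples):
#   strip_comments_xslt old.xml > new.xml
```

Because the template copies whitespace-only text nodes too, indent="yes" may leave the existing line breaks and indentation largely as they were. If the result still looks uneven, adding <xsl:strip-space elements="*"/> to the stylesheet is one common way to let the processor re-indent from scratch, at the cost of dropping whitespace-only text everywhere.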

Answer 3

Answered by Daren Thomas

You might want to look at the xmllint tool. It has several options (one of which, --format, will pretty-print), but I can't figure out how to remove the comments using this tool.

Also, check out XMLStarlet, a set of command-line tools that can do just about anything you would want to do with XML. Then run:

xml c14n --without-comments # XML file canonicalization w/o comments

EDIT: OP eventually used this line:

xmlstarlet c14n --without-comments old.xml > new.xml
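
Canonicalization on its own does not re-indent the document. If pretty-printing is wanted as well, one option is to pipe the result through xmllint --format, which reads standard input when given - as the file name. A sketch, assuming both tools are installed (strip_and_format is a made-up name, and old.xml / new.xml are example file names):

```shell
#!/usr/bin/env bash
# Sketch: remove comments with XMLStarlet, then re-indent with xmllint.
# strip_and_format is a hypothetical helper name.
strip_and_format() {
  xmlstarlet c14n --without-comments "$1" | xmllint --format -
}

# Usage:
#   strip_and_format old.xml > new.xml
```

Note that canonical XML also normalizes other details (attribute order, empty-element tags and so on), so the output may differ from the input in more than just the comments.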

Answer 4

Answered by Alex Pakka

To tidy up something simple like Tomcat's server.xml, I use

sed 's/<!--/\x0<!--/g;s/-->/-->\x0/g' | grep -zv '^<!--' | tr -d '\0' | grep -v "^\s*$"

I.e.

function tidy() {
 echo "$( cat "$1" | sed 's/<!--/\x0<!--/g;s/-->/-->\x0/g' | grep -zv '^<!--' | tr -d '\0' | grep -v "^\s*$")"
}

tidy server.xml

... will print the xml without comments.

NOTE: while it works reasonably well for simple things, it will fail with certain CDATA blocks and in some other situations. Only use it for controlled XML files that do not, and never will, need to contain a literal <!-- or --> anywhere outside a real comment!

The first sed marks each comment's start and end with 0x0 (NUL) characters. Then grep with -z treats 0x0 as the only line delimiter and searches for records starting with a comment; its -v inverts the filter, leaving only the meaningful records. Finally, tr -d '\0' deletes all these 0x0 characters and, to polish it up, another grep removes empty lines: voila.
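
To make the steps concrete, here is the same pipeline run on two tiny inputs, including one that triggers the CDATA caveat. This is a sketch that assumes GNU sed (for the \x0 escape) and GNU grep (for -z); strip_xml_comments is a hypothetical name, used here so it does not shadow the real tidy program.

```shell
#!/usr/bin/env bash
# Sketch of the pipeline above; needs GNU sed (\x0 escape) and GNU grep (-z).
# strip_xml_comments is a hypothetical name, not a standard command.
strip_xml_comments() {
  sed 's/<!--/\x0<!--/g;s/-->/-->\x0/g' "$1" |  # wrap each comment in NULs
    grep -zv '^<!--' |                          # drop NUL-delimited records that are comments
    tr -d '\0' |                                # remove the NUL markers
    grep -v '^\s*$'                             # drop blank lines left behind
}

sample=$(mktemp)

# A simple file: the comment line disappears.
printf '%s\n' '<Server>' '  <!-- a comment -->' '  <Service name="Catalina"/>' '</Server>' > "$sample"
strip_xml_comments "$sample"
# <Server>
#   <Service name="Catalina"/>
# </Server>

# The CDATA caveat: text that merely looks like a comment is removed too.
printf '%s\n' '<a><![CDATA[x <!-- y --> z]]></a>' > "$sample"
strip_xml_comments "$sample"
# <a><![CDATA[x  z]]></a>

rm -f "$sample"
```

The second run shows why this is only safe for files you control: the pipeline has no notion of CDATA sections, so it silently changes their content.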

Answer 5

Answered by ire_and_curses

The best way would be to use an XML parser to handle all the obscure corner cases correctly. But if you need something quick and dirty, there are a variety of short solutions using Perl regexes which may be sufficient.
