Original URL: http://stackoverflow.com/questions/15930960/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
remove EOLs and spaces between two tags in xml files
Asked by Tony Morris
I have a lot of XML files (more than 50), each with many lines (more than 30 or 40 per file) that are incorrectly formatted.
For example, this:
<TAG1>
<TAG_TO_FORMAT>
a_random_string
</TAG_TO_FORMAT>
<AN_OTHER_TAG_TO_FORMAT>
an_other_random_string
</AN_OTHER_TAG_TO_FORMAT>
<OTHER_TAG>pifpafpouf</OTHER_TAG>
</TAG1>
should be transformed into this:
<TAG1>
<TAG_TO_FORMAT>a_random_string</TAG_TO_FORMAT>
<AN_OTHER_TAG_TO_FORMAT>an_other_random_string</AN_OTHER_TAG_TO_FORMAT>
<OTHER_TAG>pifpafpouf</OTHER_TAG>
</TAG1>
It doesn't matter if the newline before </TAG1> is still present. My key problem is that each pattern <TAG>random_string</TAG> must be on one line (the random_string does not contain an EOL).
I couldn't find any tool in bash that lets me do this, so how could I do it in bash? (Or maybe in Python, but I would prefer bash.)
Answered by Magnus
There are command line tools like xmllint and tidy that can be used like this:
tidy -xml -iq somefile.xml
In theory xmllint can also do it, but xmllint doesn't work as described for me on OS X (I don't have a Linux instance handy to test on at the moment):
xmllint --noblanks somefile.xml
Answered by Brian Chrisman
Tidy does a reasonable job. Another option is to apply an XSLT transform that calls normalize-space():
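<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml"/>
<!-- identity transform: copy every attribute and node unchanged -->
<xsl:template match="@*|node()|/">
<xsl:copy>
<xsl:apply-templates select="@*|node()">
<!-- sorting by @kname comes from the answerer's own data; document order is kept when no node has that attribute -->
<xsl:sort select="@kname"/>
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
<!-- text nodes: strip leading/trailing whitespace and collapse inner runs; "." is the text node itself -->
<xsl:template match="text()">
<xsl:value-of select="normalize-space(.)"/>
</xsl:template>
</xsl:stylesheet>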
I'd save that into a file (normalize-space.xsl here) and run it from the command line:
xsltproc normalize-space.xsl file.xml
or in a pipeline:
run_some_command | xsltproc normalize-space.xsl - | xmllint --format -
xmllint --noblanks does not necessarily treat all the whitespace I want removed as 'ignorable'. That is almost certainly technically correct, but it's not what I want.
Answered by Ansgar Wiechers
I'd recommend Perl for this kind of task.
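#!/usr/bin/env perl
use strict;
use warnings;
# slurp the whole input so that a match can span several lines
my $text = join "", <>;
# drop whitespace between an opening tag and its text, and between the text
# and the closing tag, keeping the text itself ($1); /g handles every element
$text =~ s/>\s*([^<>\s](?:[^<>]*[^<>\s])?)\s*<\//>$1<\//g;
print $text;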
Call it like this:
perl my.pl < input.xml > output.xml
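Since the question involves more than 50 files, a stdin-to-stdout filter such as the script above could be applied in a loop. The following is only a sketch under assumptions: rewrite_xml_files is a hypothetical helper name (not part of the answer), the files are assumed to end in .xml and sit in one directory, and a file is overwritten only when the filter exits successfully.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite_xml_files DIR COMMAND [ARGS...]
# Runs COMMAND as a stdin-to-stdout filter over every .xml file in DIR.
rewrite_xml_files() {
  local dir=$1
  shift
  local f tmp
  for f in "$dir"/*.xml; do
    [ -e "$f" ] || continue      # no match: the glob pattern stays literal
    tmp=$(mktemp) || return 1
    if "$@" < "$f" > "$tmp"; then
      cat "$tmp" > "$f"          # overwrite the contents, keep permissions
    fi
    rm -f "$tmp"
  done
}

# Example call (assumes my.pl is in the current directory):
#   rewrite_xml_files ./xml perl my.pl
```

Writing to a temporary file first avoids truncating an input file while the filter is still reading it. Trying the loop on a copy of the files first is a sensible precaution.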
Answered by William
Well, you can do it in sed:
x='TAG_TO_FORMAT'
sed -e '/<'"$x"'>/{:next;/<\/'"$x"'>/!{N;bnext;};s/\n//g;s/>\s*/>/;s/\s*<\//<\//;}'
When a line contains the opening tag, we go into a loop collecting lines until we find the closing tag. Then we erase all the newlines and clean up the whitespace anchored by > on one side and < on the other.
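The loop described above can be wrapped in a small bash function so that it is easy to run once per tag name. This is a sketch rather than the answerer's exact command: join_tag is a hypothetical name, and the use of POSIX [[:space:]] with one -e per sed command is an assumption made to avoid depending on GNU-only syntax. It trims whitespace directly after the opening tag and directly before the closing tag.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join_tag TAG
# Reads XML on stdin and puts each <TAG>...</TAG> element on a single line.
join_tag() {
  local tag=$1
  sed -e "/<${tag}>/{" \
      -e ':next' \
      -e "/<\/${tag}>/"'!{' \
      -e 'N' \
      -e 'b next' \
      -e '}' \
      -e 's/\n//g' \
      -e "s/<${tag}>[[:space:]]*/<${tag}>/" \
      -e "s/[[:space:]]*<\/${tag}>/<\/${tag}>/" \
      -e '}'
}

# Demo on a fragment shaped like the one in the question.
join_tag TAG_TO_FORMAT <<'EOF'
<TAG1>
  <TAG_TO_FORMAT>
    a_random_string
  </TAG_TO_FORMAT>
  <OTHER_TAG>pifpafpouf</OTHER_TAG>
</TAG1>
EOF
```

To handle several tags, the calls can be chained, for example join_tag TAG_TO_FORMAT < input.xml | join_tag AN_OTHER_TAG_TO_FORMAT > output.xml.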

