Merge multiple XML files from command line
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/9004135/
Asked by TutanRamon
I have several XML files. They all have the same structure, but were split because of file size. So, let's say I have A.xml, B.xml, C.xml and D.xml and want to merge them into combined.xml, using a command line tool.
A.xml
<products>
<product id="1234"></product>
...
</products>
B.xml
<products>
<product id="5678"></product>
...
</products>
etc.
Accepted answer by berk
xml_grep
http://search.cpan.org/dist/XML-Twig/tools/xml_grep/xml_grep
xml_grep --pretty_print indented --wrap products --descr '' --cond "product" *.xml > combined.xml
- --wrap: wraps the XML result in the given tag (here: products)
- --cond: the XML subtree to grep (here: product)
Answer by eswald
High-tech answer:
Save this Python script as xmlcombine.py:
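#!/usr/bin/env python3
# Requires Python 3.
import sys
from xml.etree import ElementTree

def run(files):
    # The root of the first file becomes the container for everything else.
    first = None
    for filename in files:
        data = ElementTree.parse(filename).getroot()
        if first is None:
            first = data
        else:
            # Append the children of this file's root to the first root.
            first.extend(data)
    if first is not None:
        # encoding="unicode" makes tostring() return str instead of bytes.
        print(ElementTree.tostring(first, encoding="unicode"))

if __name__ == "__main__":
    run(sys.argv[1:])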
To combine files, run:
python xmlcombine.py ?.xml > combined.xml
For further enhancement, consider using:
- chmod +x xmlcombine.py: Allows you to omit python in the command line
- xmlcombine.py !(combined).xml > combined.xml: Collects all XML files except the output, but requires bash's extglob option
- xmlcombine.py *.xml | sponge combined.xml: Collects everything in combined.xml as well, but requires the sponge program
- import lxml.etree as ElementTree: Uses a potentially faster XML parser
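To see what the merge step does without touching the file system, here is a minimal sketch that applies the same extend-based logic to two in-memory strings modeled on A.xml and B.xml from the question. The combine function name and the sample strings are illustrative assumptions, not part of the original answer.

```python
from xml.etree import ElementTree

def combine(documents):
    """Move the children of every later root into the first root."""
    first = None
    for text in documents:
        root = ElementTree.fromstring(text)
        if first is None:
            first = root
        else:
            first.extend(root)
    return first

# Sample documents shaped like A.xml and B.xml in the question.
a = '<products><product id="1234"></product></products>'
b = '<products><product id="5678"></product></products>'

merged = combine([a, b])
# One <products> root now holds the <product> elements of both inputs.
print(ElementTree.tostring(merged, encoding="unicode"))
```

Only the first root element survives, so any attributes on the roots of later files are dropped.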
Answer by eswald
Low-tech simple answer:
echo '<products>' > combined.xml
grep -vh '</\?products>\|<?xml' *.xml >> combined.xml
echo '</products>' >> combined.xml
Limitations:
- The opening and closing tags need to be on their own lines.
- The files need to all have the same outer tags.
- The outer tags must not have attributes.
- The files must not have inner tags that match the outer tags.
- Any current contents of combined.xml will be wiped out instead of getting included.
Each of these limitations can be worked around, but not all of them easily.
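Most of these limitations come from treating XML as lines of text. One possible workaround, sketched here under assumptions, is to parse each file instead: tag layout and root attributes then stop mattering, mismatched outer tags can be rejected explicitly, and the output file can be skipped if the glob picks it up. The merge_files name, its signature, and the error message are hypothetical choices for illustration.

```python
import os
from xml.etree import ElementTree

def merge_files(paths, output):
    """Merge XML files that share a root tag and write the result to output."""
    merged = None
    out_path = os.path.abspath(output)
    for path in paths:
        if os.path.abspath(path) == out_path:
            continue  # never read the file that is about to be overwritten
        root = ElementTree.parse(path).getroot()
        if merged is None:
            merged = root
        elif root.tag != merged.tag:
            raise ValueError(
                "%s: root <%s> does not match <%s>" % (path, root.tag, merged.tag)
            )
        else:
            merged.extend(root)
    if merged is not None:
        ElementTree.ElementTree(merged).write(
            output, encoding="utf-8", xml_declaration=True
        )
    return merged
```

A call such as merge_files(["A.xml", "B.xml", "combined.xml"], "combined.xml") would then merge A.xml and B.xml and ignore the existing output file. Every input is held in memory at once, so this sketch suits files that were split for convenience rather than inputs too large to load.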

