Merging XML files using Python's ElementTree
Original question: http://stackoverflow.com/questions/15921642/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by bioinf80
I need to merge two xml files on the third block of the xml. So, files A.xml and B.xml look like this:
A.xml
<sample id="1">
    <workflow value="x" version="1"/>
    <results>
        <result type="T">
            <result_data type="value" value="19"/>
            <result_data type="value" value="15"/>
            <result_data type="value" value="14"/>
            <result_data type="value" value="13"/>
            <result_data type="value" value="12"/>
        </result>
    </results>
</sample>
B.xml
<sample id="1">
    <workflow value="x" version="1"/>
    <results>
        <result type="Q">
            <result_data type="value" value="11"/>
            <result_data type="value" value="21"/>
            <result_data type="value" value="13"/>
            <result_data type="value" value="12"/>
            <result_data type="value" value="15"/>
        </result>
    </results>
</sample>
I need to merge on 'results', so that the output looks like this:
<sample id="1">
    <workflow value="x" version="1"/>
    <results>
        <result type="T">
            <result_data type="value" value="19"/>
            <result_data type="value" value="15"/>
            <result_data type="value" value="14"/>
            <result_data type="value" value="13"/>
            <result_data type="value" value="12"/>
        </result>
        <result type="Q">
            <result_data type="value" value="11"/>
            <result_data type="value" value="21"/>
            <result_data type="value" value="13"/>
            <result_data type="value" value="12"/>
            <result_data type="value" value="15"/>
        </result>
    </results>
</sample>
What I have done so far is this:
import os, os.path, sys
import glob
from xml.etree import ElementTree
def run(files):
    xml_files = glob.glob(files +"/*.xml")
    xml_element_tree = None
    for xml_file in xml_files:
        # get root
        data = ElementTree.parse(xml_file).getroot()
        # print ElementTree.tostring(data)
        for result in data.iter('result'):
            if xml_element_tree is None:
                xml_element_tree = data
            else:
                xml_element_tree.extend(result)
    if xml_element_tree is not None:
        print(ElementTree.tostring(xml_element_tree))
As you can see, I assign the root of the first file (data), which has the header and so on, to xml_element_tree, and then extend it with each 'result' from the other files. However, this gives me:
<sample id="1">
    <workflow value="x" version="1"/>
    <results>
        <result type="T">
            <result_data type="value" value="19"/>
            <result_data type="value" value="15"/>
            <result_data type="value" value="14"/>
            <result_data type="value" value="13"/>
            <result_data type="value" value="12"/>
        </result>
    </results>
    <result_data type="value" value="11"/>
    <result_data type="value" value="21"/>
    <result_data type="value" value="13"/>
    <result_data type="value" value="12"/>
    <result_data type="value" value="15"/>
    </result>
</sample>
where the second 'result' block needs to go inside 'results', at the bottom. Any help will be appreciated.
Accepted answer by joojaa
Although this is mostly a duplicate and the answer can be found in another post, I had already done this, so I can share this Python code:
import os, os.path, sys
import glob
from xml.etree import ElementTree
def run(files):
    xml_files = glob.glob(files +"/*.xml")
    xml_element_tree = None
    for xml_file in xml_files:
        data = ElementTree.parse(xml_file).getroot()
        # print ElementTree.tostring(data)
        for result in data.iter('results'):
            if xml_element_tree is None:
                # The first file becomes the container for the merged output.
                xml_element_tree = data
                insertion_point = xml_element_tree.findall("./results")[0]
            else:
                # extend() appends the children of this 'results' element.
                insertion_point.extend(result)
    if xml_element_tree is not None:
        print(ElementTree.tostring(xml_element_tree))
However, this question contains another problem not present in the other post. The sample XML files are not valid XML, because an XML tag cannot be written as:
<sample="1">
...
</sample>
Change it to something like:
<sample id="1">
...
</sample>
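The difference from the question's code is the insertion point. `Element.extend` appends each child of the sequence it is given, so extending the `results` element with another file's `results` adds whole `result` blocks, while extending the root with a `result` adds loose `result_data` elements under `sample`. Below is a minimal sketch of the same idea, assuming Python 3 and documents held as strings rather than files; `merge_results`, `A_XML`, and `B_XML` are hypothetical names used only for illustration, and the sample documents are shortened.

```python
from xml.etree import ElementTree

# Shortened stand-ins for A.xml and B.xml (assumed sample data).
A_XML = ('<sample id="1"><workflow value="x" version="1"/><results>'
         '<result type="T"><result_data type="value" value="19"/></result>'
         '</results></sample>')
B_XML = ('<sample id="1"><workflow value="x" version="1"/><results>'
         '<result type="Q"><result_data type="value" value="11"/></result>'
         '</results></sample>')


def merge_results(documents):
    """Merge the <results> children of every document into the first one."""
    merged_root = None
    insertion_point = None
    for text in documents:
        root = ElementTree.fromstring(text)
        results = root.find("results")
        if results is None:
            continue  # nothing to merge from this document
        if merged_root is None:
            # The first usable document becomes the container.
            merged_root = root
            insertion_point = results
        else:
            # Append every <result> child to the container's <results>.
            insertion_point.extend(list(results))
    return merged_root


merged = merge_results([A_XML, B_XML])
merged_types = [result.get("type") for result in merged.find("results")]
print(merged_types)

# The malformed tag discussed above is rejected by the parser.
try:
    ElementTree.fromstring('<sample="1"></sample>')
    malformed_rejected = False
except ElementTree.ParseError:
    malformed_rejected = True
print(malformed_rejected)
```

Keeping a reference to the container's `results` element avoids searching for it again on every file, which is the same choice the answer makes with `insertion_point`.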
Answer by Jose78
You could try this solution:
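import glob
from xml.etree import ElementTree
def newRunRun(folder):
    xml_files = glob.glob(folder+"/*.xml")
    node = None
    for xmlFile in xml_files:
        tree = ElementTree.parse(xmlFile)
        root = tree.getroot()
        if node is None:
            # The first document becomes the container.
            node = root
        else:
            elements = root.find("./results")
            # Iterate over a copy of the children instead of the private _children attribute.
            for element in list(elements):
                # node[1] is the 'results' element of the container (second child of 'sample').
                node[1].append(element)
    print(ElementTree.tostring(node))
folder = "resources"
newRunRun(folder)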
As you can see, I'm using the first doc as a container and inserting the elements of the other docs into it. This is the output generated:
<sample id="1">
    <workflow value="x" version="1" />
    <results>
        <result type="Q">
            <result_data type="value" value="11" />
            <result_data type="value" value="21" />
            <result_data type="value" value="13" />
            <result_data type="value" value="12" />
            <result_data type="value" value="15" />
        </result>
        <result type="T">
            <result_data type="value" value="19" />
            <result_data type="value" value="15" />
            <result_data type="value" value="14" />
            <result_data type="value" value="13" />
            <result_data type="value" value="12" />
        </result>
    </results>
</sample>
Using the version: Python 2.7.15
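The output above lists the Q block before the T block, presumably because `glob.glob` does not return files in a guaranteed order. The following is a minimal sketch of a variant that sorts the file names and looks up `results` by tag instead of by position. It assumes Python 3 (not the 2.7.15 used above); `merge_folder` and the demo files written to a temporary directory are hypothetical and exist only for illustration.

```python
import glob
import os
import tempfile
from xml.etree import ElementTree


def merge_folder(folder):
    """Merge all *.xml files in `folder`, processed in sorted file-name order."""
    merged_root = None
    for path in sorted(glob.glob(os.path.join(folder, "*.xml"))):
        root = ElementTree.parse(path).getroot()
        if merged_root is None:
            merged_root = root  # the first file is the container
            continue
        target = merged_root.find("results")
        source = root.find("results")
        if target is None or source is None:
            continue  # skip files without a <results> element
        # Copy the child list first, then append each <result> to the container.
        for element in list(source):
            target.append(element)
    return merged_root


# Demo with two shortened files written to a temporary folder (assumed data).
with tempfile.TemporaryDirectory() as demo_folder:
    for name, kind in (("A.xml", "T"), ("B.xml", "Q")):
        with open(os.path.join(demo_folder, name), "w") as handle:
            handle.write('<sample id="1"><workflow value="x" version="1"/>'
                         '<results><result type="%s"/></results></sample>' % kind)
    merged = merge_folder(demo_folder)
    merged_order = [result.get("type") for result in merged.find("results")]

print(merged_order)
```

Sorting makes the merged order depend only on the file names, so A.xml always acts as the container in this demo.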

