Merging XML files using Python's ElementTree

Question

Asked by bioinf80

I need to merge two xml files on the third block of the xml. So, files A.xml and B.xml look like this:

A.xml

<sample id="1">
<workflow value="x" version="1"/>
  <results>
   <result type="T">
      <result_data type="value" value="19"/>
      <result_data type="value" value="15"/>
      <result_data type="value" value="14"/>
      <result_data type="value" value="13"/>
      <result_data type="value" value="12"/>
    </result>
  </results>
</sample>

B.xml

<sample id="1">
<workflow value="x" version="1"/>
  <results>
   <result type="Q">
      <result_data type="value" value="11"/>
      <result_data type="value" value="21"/>
      <result_data type="value" value="13"/>
      <result_data type="value" value="12"/>
      <result_data type="value" value="15"/>
    </result>
  </results>
</sample>

I need to merge on 'results'

<sample id="1">
<workflow value="x" version="1"/>
  <results>
   <result type="T">
      <result_data type="value" value="19"/>
      <result_data type="value" value="15"/>
      <result_data type="value" value="14"/>
      <result_data type="value" value="13"/>
      <result_data type="value" value="12"/>
   </result>
   <result type="Q">
      <result_data type="value" value="11"/>
      <result_data type="value" value="21"/>
      <result_data type="value" value="13"/>
      <result_data type="value" value="12"/>
      <result_data type="value" value="15"/>
   </result>
  </results>
</sample>

What I have done so far is this:

import os, os.path, sys
import glob
from xml.etree import ElementTree

def run(files):
    xml_files = glob.glob(files +"/*.xml")
    xml_element_tree = None
    for xml_file in xml_files:
        # get root
        data = ElementTree.parse(xml_file).getroot()
        # print ElementTree.tostring(data)
        for result in data.iter('result'):
            if xml_element_tree is None:
                xml_element_tree = data 
            else:
                xml_element_tree.extend(result) 
    if xml_element_tree is not None:
        print ElementTree.tostring(xml_element_tree)

As you can see, I assign the initial xml_element_tree to data which has the heading etc, and then extend with 'result'. However, this gives me this:

<sample id="1">
<workflow value="x" version="1"/>
  <results>
   <result type="T">
      <result_data type="value" value="19"/>
      <result_data type="value" value="15"/>
      <result_data type="value" value="14"/>
      <result_data type="value" value="13"/>
      <result_data type="value" value="12"/>
   </result>
  </results>
   <result_data type="value" value="11"/>
      <result_data type="value" value="21"/>
      <result_data type="value" value="13"/>
      <result_data type="value" value="12"/>
      <result_data type="value" value="15"/>
   </result>
</sample>

where the results need to be at the bottom. Any help will be appreciated.

Answer 1

Accepted answer by joojaa

Although this is mostly a duplicate and the answer can be found here, I had already done this, so I can share this Python code:

import os, os.path, sys
import glob
from xml.etree import ElementTree

def run(files):
    xml_files = glob.glob(files +"/*.xml")
    xml_element_tree = None
    for xml_file in xml_files:
        data = ElementTree.parse(xml_file).getroot()
        # print ElementTree.tostring(data)
        for result in data.iter('results'):
            if xml_element_tree is None:
                xml_element_tree = data 
                insertion_point = xml_element_tree.findall("./results")[0]
            else:
                insertion_point.extend(result) 
    if xml_element_tree is not None:
        print ElementTree.tostring(xml_element_tree)
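
To see why the insertion point matters, here is a small sketch that contrasts the two approaches. It assumes Python 3 and uses shortened, made-up inline strings in place of A.xml and B.xml; the function names are only for illustration.

```python
from xml.etree import ElementTree as ET

# Hypothetical, shortened stand-ins for A.xml and B.xml
A = '<sample id="1"><results><result type="T"><result_data value="19"/></result></results></sample>'
B = '<sample id="1"><results><result type="Q"><result_data value="11"/></result></results></sample>'

def extend_root():
    # What the question's code does: extend the root with a <result>,
    # which appends that element's children (<result_data>) to <sample>.
    root = ET.fromstring(A)
    for result in ET.fromstring(B).iter("result"):
        root.extend(result)
    return [child.tag for child in root]

def extend_results():
    # What the code above does: extend the <results> node with another
    # <results>, which appends its children (<result>) in the right place.
    root = ET.fromstring(A)
    insertion_point = root.find("results")
    for results in ET.fromstring(B).iter("results"):
        insertion_point.extend(results)
    return [r.get("type") for r in insertion_point]

print(extend_root())     # ['results', 'result_data']
print(extend_results())  # ['T', 'Q']
```

Passing an element to extend() appends that element's children, not the element itself, so the level you iterate over decides what gets moved and where it lands.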

However, this question contains another problem not present in the other post. The sample XML files are not valid XML: a tag cannot carry a bare value without an attribute name, as in:

<sample="1">
    ...
</sample>

That is not possible; change it to something like:

<sample id="1">
    ...
</sample>
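
A quick way to check this is to feed both variants to the parser. The sketch below assumes Python 3; is_well_formed is a hypothetical helper name, not part of ElementTree.

```python
from xml.etree import ElementTree as ET

def is_well_formed(text):
    """Return True if ElementTree can parse the given XML string."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed('<sample="1"></sample>'))     # False
print(is_well_formed('<sample id="1"></sample>'))  # True
```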

Answer 2

Answer by Jose78

You could try this solution:

import glob
from xml.etree import ElementTree

def newRunRun(folder):
    xml_files = glob.glob(folder+"/*.xml")
    node = None
    for xmlFile in xml_files:
        tree = ElementTree.parse(xmlFile)
        root = tree.getroot()
        if node is None:
            # the first document becomes the container
            node = root
        else:
            elements = root.find("./results")
            # iterate the element directly rather than through the
            # private _children attribute
            for element in elements:
                # node[1] is the <results> element of the container
                node[1].append(element)
    print ElementTree.tostring(node)

folder = "resources"
newRunRun(folder)

As you can see, I'm using the first doc as a container and inserting the elements of the other docs into it. This is the output generated:

<sample id="1">
<workflow value="x" version="1" />
  <results>
   <result type="Q">
      <result_data type="value" value="11" />
      <result_data type="value" value="21" />
      <result_data type="value" value="13" />
      <result_data type="value" value="12" />
      <result_data type="value" value="15" />
    </result>
  <result type="T">
      <result_data type="value" value="19" />
      <result_data type="value" value="15" />
      <result_data type="value" value="14" />
      <result_data type="value" value="13" />
      <result_data type="value" value="12" />
    </result>
  </results>
</sample>

Using the version: Python 2.7.15
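
For Python 3, the same idea might be sketched as follows. This is only a sketch under some assumptions: merge_results is a hypothetical name, the files are sorted so the merge order is predictable, and <results> is looked up by tag rather than by position.

```python
import glob
import os
from xml.etree import ElementTree as ET

def merge_results(folder):
    """Merge the <result> children of every *.xml file in `folder`
    into the <results> node of the first file; return the merged root."""
    merged_root = None
    insertion_point = None
    # sorted() makes the merge order deterministic; glob order is arbitrary
    for path in sorted(glob.glob(os.path.join(folder, "*.xml"))):
        root = ET.parse(path).getroot()
        results = root.find("results")
        if results is None:
            continue  # skip files without a <results> block
        if merged_root is None:
            # the first document becomes the container
            merged_root, insertion_point = root, results
        else:
            # append the <result> children of this file's <results>
            insertion_point.extend(results)
    return merged_root

if __name__ == "__main__":
    merged = merge_results("resources")
    if merged is not None:
        # encoding="unicode" returns a str instead of bytes on Python 3
        print(ET.tostring(merged, encoding="unicode"))
```

With A.xml and B.xml in the folder, sorting puts the type="T" block before the type="Q" block, which matches the order asked for in the question.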
