Python 2.7: XML ElementTree: How to iterate through certain elements of a child element in order to find a match

Question

Asked by Sarah-Ann

I'm a programming novice and only rarely use Python, so please bear with me as I try to explain what I am trying to do :)

I have the following XML:

<?xml version = "1.0" encoding = "utf-8"?>
<Patients>
    <Patient>
               <PatientCharacteristics>
                   <patientCode>3</patientCode>
               </PatientCharacteristics>
               <Visits>
                   <Visit>
                          <DAS>
                               <CRP>14</CRP>
                               <ESR/>
                               <Joints>
                                       <DAS_PROFILE>28/28</DAS_PROFILE>
                                       <SWOL28>20</SWOL28>
                                       <TEN28>20</TEN28>
                               </Joints>
                          </DAS>
                          <VisitDate>2010-02-17</VisitDate>
                   </Visit>
                   <Visit>
                          <DAS>
                               <CRP>10</CRP>
                               <ESR/>
                               <Joints>
                                       <DAS_PROFILE>28/28</DAS_PROFILE>
                                       <SWOL28>15</SWOL28>
                                       <TEN28>20</TEN28>
                               </Joints>
                          </DAS>
                          <VisitDate>2010-02-10</VisitDate>
                   </Visit>
               </Visits>
    </Patient>
    <Patient>
        <PatientCharacteristics>
                   <patientCode>3</patientCode>
        </PatientCharacteristics>
               <Visits>
                   <Visit>
                          <DAS>
                               <CRP>14</CRP>
                               <ESR/>
                               <Joints>
                                       <DAS_PROFILE>28/28</DAS_PROFILE>
                                       <SWOL28>34</SWOL28>
                                       <TEN28>0</TEN28>
                               </Joints>
                          </DAS>
                          <VisitDate>2010-08-17</VisitDate>
                   </Visit>
                   <Visit>
                          <DAS>
                               <CRP>10</CRP>
                               <ESR/>
                               <Joints>
                                       <DAS_PROFILE>28/28</DAS_PROFILE>
                                       <SWOL28></SWOL28>
                                       <TEN28>2</TEN28>
                               </Joints>
                          </DAS>
                          <VisitDate>2010-07-10</VisitDate>
                   </Visit>
                   <Visit>
                          <DAS>
                               <CRP>9</CRP>
                               <ESR/>
                               <Joints>
                                       <DAS_PROFILE>28/28</DAS_PROFILE>
                                       <SWOL28>56</SWOL28>
                                       <TEN28>6</TEN28>
                               </Joints>
                          </DAS>
                          <VisitDate>2009-07-10</VisitDate>
                   </Visit>
               </Visits>

    </Patient>
</Patients>

All I want to do here is update certain 'SWOL28' values if they match the patientCode and VisitDate that I have stored in a text file. As I understand it, ElementTree does not include a parent reference; if it did, I could just use findall() from the root and work backwards from there. As it stands, here is my pseudocode:

1. For each line in the text file:
2.     Put Visit_Date, Patient_Code and New_SWOL28 into variables
3.     For each Patient element:
4.         If patientCode = Patient_Code
5.             For each Visit element:
6.                 If VisitDate = Visit_Date
7.                     If SWOL28 element exists for this visit
8.                         Update SWOL28 to New_SWOL28

But I am stuck at step number 5. How do I get a list of visits to iterate through? Apologies if this is a very dumb question, but I have searched high and low for an answer, I assure you! I have stripped my code down to a bare example of the part I need to fix:

import xml.etree.ElementTree as ET
tree = ET.parse('DB3.xml')
root = tree.getroot()
for child in root: # THIS GETS ME ALL THE PATIENT ATTRIBUTES
    print child.tag 
    for x in child/Visit: # THIS IS WHAT I CANNOT FIND THE CORRECT SYNTAX FOR
        # I WOULD THEN PERFORM STEPS 6, 7 AND 8 HERE

I would be deeply appreciative of any ideas any of you may have on this. I am not a programming natural, that's for sure!

Thanks in advance, Sarah

Edit 1:

On the advice of svk below, I tried the following:

import xml.etree.ElementTree as ET
tree = ET.parse('Untitled.xml')
root = tree.getroot()
for child in root:
    print child.tag 
    child.find( "visits" )
    for x in child.iter("visit"):
        print x.tag, x.text

But the only output I get is: Patient Patient and none of the lower tags. Any ideas?

Answer 1

Accepted answer by Peter Enns

This is untested, but it should be fairly close to what you want.

# code, date and new_swol28 hold the values read from one line of the text file
for patient in root:
    patient_code = patient.find('PatientCharacteristics').find('patientCode')
    if patient_code.text == code:
        for visit in patient.find('Visits'):
            visit_date = visit.find('VisitDate')
            if visit_date.text == date:
                swol28 = visit.find('DAS').find('Joints').find('SWOL28')
                # find() returns None when the element is missing
                if swol28 is not None:
                    # set() would add an attribute; assign .text to change the value
                    swol28.text = new_swol28
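
The loop above can be fleshed out into a self-contained sketch that follows the eight pseudocode steps from the question. The trimmed-down sample XML, the helper name apply_changes, and the whitespace-separated "Visit_Date Patient_Code New_SWOL28" line layout are all assumptions made for illustration; the question does not say how the text file is formatted.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample in the same shape as the question's XML (illustrative data).
SAMPLE = """<Patients>
  <Patient>
    <PatientCharacteristics><patientCode>3</patientCode></PatientCharacteristics>
    <Visits>
      <Visit>
        <DAS><Joints><SWOL28>20</SWOL28></Joints></DAS>
        <VisitDate>2010-02-17</VisitDate>
      </Visit>
      <Visit>
        <DAS><Joints><SWOL28>15</SWOL28></Joints></DAS>
        <VisitDate>2010-02-10</VisitDate>
      </Visit>
    </Visits>
  </Patient>
</Patients>"""

# Assumed text-file layout: "Visit_Date Patient_Code New_SWOL28" on each line.
CHANGE_LINES = ["2010-02-10 3 99"]


def apply_changes(root, lines):
    """Update SWOL28 for each visit matching a (date, code) pair; return the count."""
    updated = 0
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        visit_date, patient_code, new_swol28 = parts
        for patient in root.findall('Patient'):
            # findtext() returns None if the path does not exist
            if patient.findtext('PatientCharacteristics/patientCode') != patient_code:
                continue
            for visit in patient.findall('Visits/Visit'):
                if visit.findtext('VisitDate') != visit_date:
                    continue
                swol28 = visit.find('DAS/Joints/SWOL28')
                if swol28 is not None:
                    swol28.text = new_swol28
                    updated += 1
    return updated


root = ET.fromstring(SAMPLE)
updated_count = apply_changes(root, CHANGE_LINES)
swol_values = [e.text for e in root.iter('SWOL28')]
print(swol_values)
```

With a real file, the same function could be fed open('changes.txt') (a hypothetical file name) and the tree parsed with ET.parse('DB3.xml'), followed by tree.write(...) to save the result.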

Answer 2

Answered by svk

You can iterate over all the "Visit" elements anywhere below an element "element" like this (tag names are case-sensitive, so they must be spelled exactly as in the XML):

for x in element.iter("Visit"):

You can find the first direct child of element matching a certain tag with:

element.find( "Visits" )

It looks like you will first have to locate the "Visits" element, which is the parent of "Visit", and then iterate through its "Visit" children. Putting those together you'd have something like this:

for patient_element in root:
    print patient_element.tag 
    visits_element = patient_element.find( "Visits" )
    for visit_element in visits_element.iter("Visit"):
        print visit_element.tag, visit_element.text
        # ... further processing of each visit element here

In general, look at the section "Finding interesting elements" in the documentation for xml.etree.ElementTree: http://docs.python.org/2/library/xml.etree.elementtree.html#finding-interesting-elements
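
As a small illustration of the difference between find(), iter() and findall(), here is a sketch against a cut-down sample in the shape of the question's XML (the sample data and variable names are made up for the example). It also shows that a lowercase tag name matches nothing, which would explain output that stops at the Patient tags.

```python
import xml.etree.ElementTree as ET

# Illustrative sample; only the elements needed for the lookup are kept.
SAMPLE = """<Patients>
  <Patient>
    <Visits>
      <Visit><VisitDate>2010-02-17</VisitDate></Visit>
      <Visit><VisitDate>2010-02-10</VisitDate></Visit>
    </Visits>
  </Patient>
</Patients>"""

root = ET.fromstring(SAMPLE)
patient = root.find('Patient')

# Tag names are case-sensitive: the lowercase spellings match nothing.
lowercase_hit = patient.find('visits')        # None
lowercase_iter = list(patient.iter('visit'))  # empty list

# find() returns the first direct child with the given tag.
visits = patient.find('Visits')

# iter() walks all descendants, so it reaches <Visit> from <Patient> directly.
dates_via_iter = [v.findtext('VisitDate') for v in patient.iter('Visit')]

# findall() with a path follows exactly the steps given.
dates_via_path = [v.findtext('VisitDate') for v in patient.findall('Visits/Visit')]

print(dates_via_iter)
```

Because find() returns None for a missing tag, it is worth checking the result before calling methods on it.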

Answer 3

Answered by niroyb

You could use a CSSSelector to get the nodes you want from the Patient element (this assumes the document was parsed with lxml rather than the standard-library ElementTree):

from lxml.cssselect import CSSSelector
visitSelector = CSSSelector('Visit')
visits =  visitSelector(child)

You can do the same to get the patientCode tag and the SWOL28 tag. Then you can access and modify the text of the elements using element.text.

Answer 4

Answered by MattH

If you use lxml.etree, you can use xpath to find the elements you need to update.

E.g.

doc.xpath('Patient[PatientCharacteristics/patientCode=$patient]/Visits/Visit[VisitDate=$visit]',patient="3",visit="2009-07-10")

So

from lxml import etree

doc = etree.parse("DB3.xml")

changes = [
  dict(patient='3',visit='2010-08-17',swol28="99"),
]

def update_doc(x,d):
  for row in d:
    for visit in x.xpath('Patient[PatientCharacteristics/patientCode=$patient]/Visits/Visit[VisitDate=$visit]',**row):
      for swol28 in visit.xpath('DAS/Joints/SWOL28'):
        swol28.text = row['swol28']

update_doc(doc,changes)

print etree.tostring(doc)

Should yield you something that contains:

<Patient>
  <PatientCharacteristics>
    <patientCode>3</patientCode>
  </PatientCharacteristics>
  <Visits>
    <Visit>
      <DAS>
        <CRP>14</CRP>
        <ESR/>
        <Joints>
          <DAS_PROFILE>28/28</DAS_PROFILE>
          <SWOL28>99</SWOL28>
          <TEN28>0</TEN28>
        </Joints>
      </DAS>
      <VisitDate>2010-08-17</VisitDate>
    </Visit>
  </Visits>
</Patient>
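
If lxml is not available, a similar lookup can be sketched with the standard library. xml.etree.ElementTree supports only a limited XPath subset: no $variables, and as far as I know predicates can test a direct child only, so the patient code is compared in plain Python here. The sample data and the helper name find_visits are assumptions for illustration, and the [tag='text'] predicate may not be available in older ElementTree versions.

```python
import xml.etree.ElementTree as ET

# Illustrative sample in the shape of the question's XML.
SAMPLE = """<Patients>
  <Patient>
    <PatientCharacteristics><patientCode>3</patientCode></PatientCharacteristics>
    <Visits>
      <Visit>
        <DAS><Joints><SWOL28>34</SWOL28></Joints></DAS>
        <VisitDate>2010-08-17</VisitDate>
      </Visit>
      <Visit>
        <DAS><Joints><SWOL28>56</SWOL28></Joints></DAS>
        <VisitDate>2009-07-10</VisitDate>
      </Visit>
    </Visits>
  </Patient>
</Patients>"""


def find_visits(root, patient_code, visit_date):
    """Return the Visit elements for one patient code and visit date."""
    matches = []
    for patient in root.findall('Patient'):
        if patient.findtext('PatientCharacteristics/patientCode') != patient_code:
            continue
        # [VisitDate='...'] keeps visits whose VisitDate child has that text.
        # The date is pasted into the path, so it must not contain a quote.
        path = "Visits/Visit[VisitDate='%s']" % visit_date
        matches.extend(patient.findall(path))
    return matches


root = ET.fromstring(SAMPLE)
for visit in find_visits(root, '3', '2010-08-17'):
    for swol28 in visit.findall('DAS/Joints/SWOL28'):
        swol28.text = '99'

result = [e.text for e in root.iter('SWOL28')]
print(result)
```

The lxml version above is more compact because full XPath lets the patient code and the visit date be matched in a single expression with bound variables.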
