Python 2.7: XML ElementTree: How to iterate through certain elements of a child element in order to find a match
Original question: http://stackoverflow.com/questions/15643094/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must attribute it to the original authors (not me) and cite the original URL.
Asked by Sarah-Ann
I'm a programming novice and only rarely use Python, so please bear with me as I try to explain what I am trying to do :)
I have the following XML:
<?xml version = "1.0" encoding = "utf-8"?>
<Patients>
  <Patient>
    <PatientCharacteristics>
      <patientCode>3</patientCode>
    </PatientCharacteristics>
    <Visits>
      <Visit>
        <DAS>
          <CRP>14</CRP>
          <ESR/>
          <Joints>
            <DAS_PROFILE>28/28</DAS_PROFILE>
            <SWOL28>20</SWOL28>
            <TEN28>20</TEN28>
          </Joints>
        </DAS>
        <VisitDate>2010-02-17</VisitDate>
      </Visit>
      <Visit>
        <DAS>
          <CRP>10</CRP>
          <ESR/>
          <Joints>
            <DAS_PROFILE>28/28</DAS_PROFILE>
            <SWOL28>15</SWOL28>
            <TEN28>20</TEN28>
          </Joints>
        </DAS>
        <VisitDate>2010-02-10</VisitDate>
      </Visit>
    </Visits>
  </Patient>
  <Patient>
    <PatientCharacteristics>
      <patientCode>3</patientCode>
    </PatientCharacteristics>
    <Visits>
      <Visit>
        <DAS>
          <CRP>14</CRP>
          <ESR/>
          <Joints>
            <DAS_PROFILE>28/28</DAS_PROFILE>
            <SWOL28>34</SWOL28>
            <TEN28>0</TEN28>
          </Joints>
        </DAS>
        <VisitDate>2010-08-17</VisitDate>
      </Visit>
      <Visit>
        <DAS>
          <CRP>10</CRP>
          <ESR/>
          <Joints>
            <DAS_PROFILE>28/28</DAS_PROFILE>
            <SWOL28></SWOL28>
            <TEN28>2</TEN28>
          </Joints>
        </DAS>
        <VisitDate>2010-07-10</VisitDate>
      </Visit>
      <Visit>
        <DAS>
          <CRP>9</CRP>
          <ESR/>
          <Joints>
            <DAS_PROFILE>28/28</DAS_PROFILE>
            <SWOL28>56</SWOL28>
            <TEN28>6</TEN28>
          </Joints>
        </DAS>
        <VisitDate>2009-07-10</VisitDate>
      </Visit>
    </Visits>
  </Patient>
</Patients>
All I want to do here is update certain 'SWOL28' values if they match the patientCode and VisitDate that I have stored in a text file. As I understand it, ElementTree elements do not hold a reference to their parent; if they did, I could just use findall() from the root and work backwards from there. As it stands, here is my pseudocode:
- For each line in the text file:
  - Put Visit_Date Patient_Code New_SWOL28 into variables
  - For each patient element:
    - If patientCode = Patient_Code
      - For each Visit element:
        - If VisitDate = Visit_Date
          - If SWOL28 element exists for this visit
            - Update SWOL28 to New_SWOL28
But I am stuck at step number 5 (looping over each Visit element). How do I get a list of visits to iterate through? Apologies if this is a very dumb question, but I assure you I have searched high and low for an answer! I have stripped my code down to a bare example of the part I need to fix:
import xml.etree.ElementTree as ET

tree = ET.parse('DB3.xml')
root = tree.getroot()
for child in root:  # THIS GETS ME ALL THE PATIENT ATTRIBUTES
    print child.tag
    for x in child/Visit:  # THIS IS WHAT I CANNOT FIND THE CORRECT SYNTAX FOR
        # I WOULD THEN PERFORM STEPS 6, 7 AND 8 HERE
I would be deeply appreciative of any ideas any of you may have on this. I am not a programming natural, that's for sure!
Thanks in advance, Sarah
Edit 1:
On the advice of svk below, I tried the following:
import xml.etree.ElementTree as ET

tree = ET.parse('Untitled.xml')
root = tree.getroot()
for child in root:
    print child.tag
    child.find( "visits" )
    for x in child.iter("visit"):
        print x.tag, x.text
But the only output I get is: Patient Patient and none of the lower tags. Any ideas?
Accepted answer by Peter Enns
This is untested, but it should be fairly close to what you want.
# Assumes root, code, date and new_swol28 are already defined (all strings).
for patient in root:
    patient_code = patient.find('PatientCharacteristics').find('patientCode')
    if patient_code.text == code:
        for visit in patient.find('Visits'):
            visit_date = visit.find('VisitDate')
            if visit_date.text == date:
                swol28 = visit.find('DAS').find('Joints').find('SWOL28')
                # find() returns None when the element is missing
                if swol28 is not None:
                    # update the element's text; set() would add an attribute instead
                    swol28.text = new_swol28
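A self-contained sketch of the same loop, wrapped in a function and run against a cut-down copy of the XML from the question. The function name `update_swol28`, the inline sample, and the whitespace-separated `Visit_Date Patient_Code New_SWOL28` line layout are assumptions made for illustration; the question does not show the real text file.

```python
import xml.etree.ElementTree as ET

# Cut-down sample based on the XML in the question.
SAMPLE = """<Patients>
  <Patient>
    <PatientCharacteristics><patientCode>3</patientCode></PatientCharacteristics>
    <Visits>
      <Visit>
        <DAS><Joints><SWOL28>20</SWOL28></Joints></DAS>
        <VisitDate>2010-02-17</VisitDate>
      </Visit>
      <Visit>
        <DAS><Joints><SWOL28>15</SWOL28></Joints></DAS>
        <VisitDate>2010-02-10</VisitDate>
      </Visit>
    </Visits>
  </Patient>
</Patients>"""

# Assumed layout of one line of the text file: Visit_Date Patient_Code New_SWOL28
UPDATE_LINES = ["2010-02-10 3 99"]


def update_swol28(root, code, date, new_swol28):
    """Set SWOL28 for every visit matching code and date; return the number updated."""
    updated = 0
    for patient in root.findall('Patient'):
        if patient.findtext('PatientCharacteristics/patientCode') != code:
            continue
        for visit in patient.findall('Visits/Visit'):
            if visit.findtext('VisitDate') != date:
                continue
            swol28 = visit.find('DAS/Joints/SWOL28')
            if swol28 is not None:  # the element exists, even if it is empty
                swol28.text = new_swol28
                updated += 1
    return updated


root = ET.fromstring(SAMPLE)
total = 0
for line in UPDATE_LINES:
    visit_date, patient_code, new_value = line.split()
    total += update_swol28(root, patient_code, visit_date, new_value)

values = [e.text for e in root.iter('SWOL28')]
print(values)
```

With a real file, `ET.parse('DB3.xml')` followed by `tree.getroot()` would replace the in-memory sample, and `tree.write()` with an output path of your choice would save the result.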
Answer by svk
You can iterate over all the "Visit" elements beneath an element "element" (at any depth, not only direct children) like this:
for x in element.iter("Visit"):
    # process x here
You can find the first direct child of element matching a certain tag with:
element.find("Visits")
Tag names are case-sensitive, so they must be spelled exactly as in the XML ("Visits", not "visits"). It looks like you will first have to locate the "Visits" element, which is the parent of "Visit", and then iterate through its "Visit" children. Putting those together you'd have something like this:
for patient_element in root:
    print patient_element.tag
    visits_element = patient_element.find("Visits")
    for visit_element in visits_element.iter("Visit"):
        print visit_element.tag, visit_element.text
        # ... further processing of each visit element here
In general, look at the section "Finding interesting elements" in the documentation for xml.etree.ElementTree: http://docs.python.org/2/library/xml.etree.elementtree.html#finding-interesting-elements
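The snippet in Edit 1 of the question prints only `Patient` because ElementTree compares tag names exactly, including case, so lowercase names never match `Visits` or `Visit`. A minimal sketch of that, and of how `find()`, `iter()` and `findall()` differ, using a made-up two-visit sample:

```python
import xml.etree.ElementTree as ET

# Small sample for illustration; the tag names follow the XML in the question.
patient = ET.fromstring(
    "<Patient><Visits>"
    "<Visit><VisitDate>2010-02-17</VisitDate></Visit>"
    "<Visit><VisitDate>2010-02-10</VisitDate></Visit>"
    "</Visits></Patient>"
)

# Tag matching is case-sensitive: lowercase names match nothing.
lower_find = patient.find("visits")        # None
lower_iter = list(patient.iter("visit"))   # empty list

# find() returns the first matching direct child.
visits = patient.find("Visits")
# iter() walks the element and all of its descendants, at any depth.
dates_via_iter = [v.findtext("VisitDate") for v in patient.iter("Visit")]
# findall() with a plain tag name returns only the direct children.
dates_via_findall = [v.findtext("VisitDate") for v in visits.findall("Visit")]

print(dates_via_iter)
print(dates_via_findall)
```

Because `iter()` searches at any depth, the intermediate `find("Visits")` is optional here; it mainly matters when the same tag name could appear elsewhere in the tree.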
Answer by niroyb
You could use lxml's CSSSelector to get the nodes you want from the Patient element:
from lxml.cssselect import CSSSelector
visitSelector = CSSSelector('Visit')
visits = visitSelector(child)
You can do the same to get the patientCode and SWOL28 elements, and then access and modify their text using element.text.
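One detail worth knowing when reading or writing `.text`: an element with no content, such as the `<SWOL28></SWOL28>` in the question's XML, has its `text` set to `None` rather than an empty string. A small sketch using the standard-library `xml.etree.ElementTree` instead of lxml, on a fragment modelled on one `<Joints>` block:

```python
import xml.etree.ElementTree as ET

# Fragment modelled on one <Joints> block from the question.
joints = ET.fromstring("<Joints><SWOL28></SWOL28><TEN28>2</TEN28></Joints>")

swol28 = joints.find("SWOL28")
ten28 = joints.find("TEN28")

empty_before = swol28.text   # None: the element exists but has no text
ten_before = ten28.text      # text is always a string, never a number

# Assigning to .text changes the element in place.
swol28.text = "12"
ten28.text = str(int(ten28.text) + 1)

print(swol28.text)
print(ten28.text)
```

This is why a test such as `if swol28.text:` skips empty elements, while `if swol28 is not None:` only checks that the element exists.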
Answer by MattH
If you use lxml.etree, you can use XPath to find the elements you need to update.
E.g.
doc.xpath('Patient[PatientCharacteristics/patientCode=$patient]/Visits/Visit[VisitDate=$visit]',patient="3",visit="2009-07-10")
So
from lxml import etree

doc = etree.parse("DB3.xml")
changes = [
    dict(patient='3', visit='2010-08-17', swol28="99"),
]

def update_doc(x, d):
    for row in d:
        for visit in x.xpath('Patient[PatientCharacteristics/patientCode=$patient]/Visits/Visit[VisitDate=$visit]', **row):
            for swol28 in visit.xpath('DAS/Joints/SWOL28'):
                swol28.text = row['swol28']

update_doc(doc, changes)
print etree.tostring(doc)
This should give you output that contains:
<Patient>
  <PatientCharacteristics>
    <patientCode>3</patientCode>
  </PatientCharacteristics>
  <Visits>
    <Visit>
      <DAS>
        <CRP>14</CRP>
        <ESR/>
        <Joints>
          <DAS_PROFILE>28/28</DAS_PROFILE>
          <SWOL28>99</SWOL28>
          <TEN28>0</TEN28>
        </Joints>
      </DAS>
      <VisitDate>2010-08-17</VisitDate>
    </Visit>
  </Visits>
</Patient>
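If installing lxml is not an option, the standard-library ElementTree supports a limited XPath subset that covers part of this. The sketch below is an approximation of the lxml code, not an equivalent: the predicate `[VisitDate='...']` only tests a direct child, so the patient code is checked in plain Python. The helper name `find_visits` and the shortened sample are made up for this example.

```python
import xml.etree.ElementTree as ET

# Shortened sample based on the second patient in the question's XML.
SAMPLE = """<Patients>
  <Patient>
    <PatientCharacteristics><patientCode>3</patientCode></PatientCharacteristics>
    <Visits>
      <Visit>
        <DAS><Joints><SWOL28>34</SWOL28></Joints></DAS>
        <VisitDate>2010-08-17</VisitDate>
      </Visit>
      <Visit>
        <DAS><Joints><SWOL28>56</SWOL28></Joints></DAS>
        <VisitDate>2009-07-10</VisitDate>
      </Visit>
    </Visits>
  </Patient>
</Patients>"""


def find_visits(root, patient, visit):
    """Return the Visit elements matching a patient code and a visit date."""
    matches = []
    for patient_el in root.findall("Patient"):
        if patient_el.findtext("PatientCharacteristics/patientCode") != patient:
            continue
        # [VisitDate='...'] keeps Visit elements whose VisitDate child has that text.
        # The date is interpolated into the path, so it must not contain quotes.
        matches.extend(patient_el.findall("Visits/Visit[VisitDate='%s']" % visit))
    return matches


root = ET.fromstring(SAMPLE)
changes = [dict(patient="3", visit="2010-08-17", swol28="99")]

for row in changes:
    for visit_el in find_visits(root, row["patient"], row["visit"]):
        for swol28 in visit_el.findall("DAS/Joints/SWOL28"):
            swol28.text = row["swol28"]

result = [e.text for e in root.iter("SWOL28")]
print(result)
```

The lxml version remains the more direct option when it is available, since its `$variable` substitution avoids building the path by string formatting.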

