Python: access nested children in an XML file parsed with ElementTree

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/43921237/

Time: 2020-08-19 23:31:33  Source: igfitidea

Access nested children in xml file parsed with ElementTree

python xml tree xml-parsing elementtree

Asked by FaCoffee

I am new to XML parsing. This XML file has the following tree:

FHRSEstablishment
 |--> Header
 |    |--> ...
 |--> EstablishmentCollection
 |    |--> EstablishmentDetail
 |    |    |-->...
 |    |--> Scores
 |    |    |-->...
 |--> EstablishmentCollection
 |    |--> EstablishmentDetail
 |    |    |-->...
 |    |--> Scores
 |    |    |-->...

but when I access it with ElementTree and look for the child tags and attributes,

import xml.etree.ElementTree as ET
import urllib2
tree = ET.parse(
   urllib2.urlopen('http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
root = tree.getroot()
for child in root:
   print child.tag, child.attrib

I only get:

Header {}
EstablishmentCollection {}

which I assume means that their attributes are empty. Why is that, and how can I access the children nested inside EstablishmentDetail and Scores?

EDIT

Thanks to the answers below I can get inside the tree, but if I want to retrieve values such as those in Scores, this fails:

for node in root.find('.//EstablishmentDetail/Scores'):
    rating = node.attrib.get('Hygiene')
    print rating 

and produces

None
None
None

Why is that?

Answered by Keerthana Prabhakaran

You have to call iter() on your root.

That is, root.iter() would do the trick!

import xml.etree.ElementTree as ET
import urllib2
tree = ET.parse(urllib2.urlopen('http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
root = tree.getroot()
for child in root.iter():
   print child.tag, child.attrib

Output:

FHRSEstablishment {}
Header {}
ExtractDate {}
ItemCount {}
ReturnCode {}
EstablishmentCollection {}
EstablishmentDetail {}
FHRSID {}
LocalAuthorityBusinessID {}
...
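
A side note on why every attrib dictionary above is empty: judging by the outputs in this thread, the values in this feed are stored as element text rather than as XML attributes, so they show up in .text, not in .attrib. The sketch below is written for Python 3 (unlike the Python 2 snippets in this thread) and uses a small inline document that only imitates the tree from the question. The sample values and the id attribute are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample imitating the tree from the question; all values are made up.
SAMPLE = """
<FHRSEstablishment>
  <Header><ItemCount>1</ItemCount></Header>
  <EstablishmentCollection>
    <EstablishmentDetail id="demo">
      <BusinessName>Example Cafe</BusinessName>
      <Scores><Hygiene>5</Hygiene></Scores>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

root = ET.fromstring(SAMPLE)

# Iterating over an element directly yields only its immediate children.
direct_children = [child.tag for child in root]

# iter() walks the whole subtree in document order, starting with the element itself.
all_tags = [elem.tag for elem in root.iter()]

# Values written between tags live in .text; .attrib holds only XML attributes.
name = root.find('.//BusinessName')
detail = root.find('.//EstablishmentDetail')

print(direct_children)
print(all_tags)
print(name.text, name.attrib)
print(detail.attrib)
```

Here only the made-up id attribute ends up in attrib; everything else has to be read through .text.
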
  • To get all tags inside EstablishmentDetail, you need to find that tag and then loop through its children!

For example:

for child in root.find('.//EstablishmentDetail'):
    print child.tag, child.attrib

Output:

FHRSID {}
LocalAuthorityBusinessID {}
BusinessName {}
BusinessType {}
BusinessTypeID {}
RatingValue {}
RatingKey {}
RatingDate {}
LocalAuthorityCode {}
LocalAuthorityName {}
LocalAuthorityWebSite {}
LocalAuthorityEmailAddress {}
Scores {}
SchemeType {}
NewRatingPending {}
Geocode {}
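
Two details about find() are worth keeping in mind here: it returns only the first matching element, and it returns None when nothing matches, in which case looping over the result raises a TypeError. A minimal Python 3 sketch of a guarded version follows. The helper child_values and the sample data are hypothetical, not part of the original feed or of ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with two establishments; all values are made up.
SAMPLE = """<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <FHRSID>1</FHRSID>
      <BusinessName>Example Cafe</BusinessName>
    </EstablishmentDetail>
    <EstablishmentDetail>
      <FHRSID>2</FHRSID>
      <BusinessName>Example Bakery</BusinessName>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>"""

root = ET.fromstring(SAMPLE)


def child_values(parent, path):
    """Return {tag: text} for the children of the FIRST element matching path."""
    node = parent.find(path)
    if node is None:  # find() returns None when nothing matches
        return {}
    return {child.tag: child.text for child in node}


first = child_values(root, './/EstablishmentDetail')
missing = child_values(root, './/NoSuchTag')

print(first)    # children of the first EstablishmentDetail only
print(missing)  # empty dict instead of a TypeError
```

To visit every EstablishmentDetail rather than just the first, findall() would be the call to use instead of find().
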
  • To get the score for Hygiene, as you've mentioned in the comment:

What your code does is fetch only the first Scores tag, whose children are the Hygiene, ConfidenceInManagement and Structural tags, and loop over those children when you call for each in root.find('.//Scores'): rating = each.get('Hygiene'). Obviously, none of the three children has a Hygiene attribute, so get() returns None each time!

You need to:

  • first, find all Scores tags;
  • then, find Hygiene in every tag found!

for each in root.findall('.//Scores'):
    rating = each.find('.//Hygiene')
    print '' if rating is None else rating.text

Output:

5
5
5
0
5
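
The difference between the failing loop from the question and the findall() approach can be reproduced on a small inline document. This is only a sketch in Python 3: the sample data is made up, and findtext() is used here as a shortcut for find() followed by .text, which is a substitution on my part rather than what the answer above does.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the third establishment deliberately has no Hygiene score.
SAMPLE = """<EstablishmentCollection>
  <EstablishmentDetail>
    <Scores>
      <Hygiene>5</Hygiene>
      <Structural>5</Structural>
      <ConfidenceInManagement>5</ConfidenceInManagement>
    </Scores>
  </EstablishmentDetail>
  <EstablishmentDetail>
    <Scores>
      <Hygiene>0</Hygiene>
      <Structural>5</Structural>
      <ConfidenceInManagement>10</ConfidenceInManagement>
    </Scores>
  </EstablishmentDetail>
  <EstablishmentDetail>
    <Scores/>
  </EstablishmentDetail>
</EstablishmentCollection>"""

root = ET.fromstring(SAMPLE)

# The question's approach: loop over the children of the FIRST Scores tag and
# ask each child for a 'Hygiene' attribute, which none of them has.
wrong = [child.get('Hygiene') for child in root.find('.//Scores')]

# The answer's approach: visit every Scores tag and read the text of its
# Hygiene child, falling back to '' when that child is missing.
ratings = [scores.findtext('Hygiene', default='')
           for scores in root.findall('.//Scores')]

print(wrong)
print(ratings)
```
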

Answered by Andrea

Hope this is useful:

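import xml.etree.ElementTree as etree
with open('filename.xml') as tmpfile:
    # Stream the file, reporting both the start and the end of every element
    doc = etree.iterparse(tmpfile, events=("start", "end"))
    doc = iter(doc)
    # The first event is the start of the root element
    event, root = next(doc)
    for event, elem in doc:
        print event, elem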
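
As a possible follow-up to the snippet above, iterparse() can also be used to pull out nested values while streaming. The Python 3 sketch below listens only for "end" events, because an element's children are fully parsed only by then. The function stream_business_names and the sample data are hypothetical; BusinessName is taken from the tag listing earlier in this thread.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; all values are made up.
SAMPLE = b"""<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <BusinessName>Example Cafe</BusinessName>
    </EstablishmentDetail>
    <EstablishmentDetail>
      <BusinessName>Example Bakery</BusinessName>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>"""


def stream_business_names(source):
    """Collect BusinessName values from a file-like object while streaming."""
    names = []
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == 'EstablishmentDetail':
            names.append(elem.findtext('BusinessName'))
            # Drop the element's children once they have been read
            elem.clear()
    return names


names = stream_business_names(io.BytesIO(SAMPLE))
print(names)
```

For a real file, the io.BytesIO object would be replaced by a file opened in binary mode.
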