How to use xmltodict to get items out of an XML file

Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, link the original, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/40154727/

Time: 2020-08-19 23:12:53  Source: igfitidea

Tags: python, xml, xmltodict

Asked by Sam Vruggink

I am trying to easily access values from an xml file.

<artikelen>
    <artikel nummer="121">
        <code>ABC123</code>
        <naam>Highlight pen</naam>
        <voorraad>231</voorraad>
        <prijs>0.56</prijs>
    </artikel>
    <artikel nummer="123">
        <code>PQR678</code>
        <naam>Nietmachine</naam>
        <voorraad>587</voorraad>
        <prijs>9.99</prijs>
    </artikel>
..... etc

If I want to access the value ABC123, how do I get it?

import xmltodict

with open('8_1.html') as fd:
    doc = xmltodict.parse(fd.read())
    print(doc[fd]['code'])

Answered by Paul

Using your example:

import xmltodict

with open('artikelen.xml') as fd:
    doc = xmltodict.parse(fd.read())

If you examine doc, you'll see it's an OrderedDict, ordered by tag:

>>> doc
OrderedDict([('artikelen',
              OrderedDict([('artikel',
                            [OrderedDict([('@nummer', '121'),
                                          ('code', 'ABC123'),
                                          ('naam', 'Highlight pen'),
                                          ('voorraad', '231'),
                                          ('prijs', '0.56')]),
                             OrderedDict([('@nummer', '123'),
                                          ('code', 'PQR678'),
                                          ('naam', 'Nietmachine'),
                                          ('voorraad', '587'),
                                          ('prijs', '9.99')])])]))])

The root node is called artikelen, and there is a subnode artikel, which is a list of OrderedDict objects, so if you want the code for every article, you would do:

codes = []
for artikel in doc['artikelen']['artikel']:
    codes.append(artikel['code'])

# >>> codes
# ['ABC123', 'PQR678']
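
One caveat with the loop above: xmltodict only produces a list when an element repeats. If the file holds a single artikel, doc['artikelen']['artikel'] is one dict, and the loop would iterate over its keys instead. Below is a minimal sketch of a guard. The hand-written dicts stand in for parsed documents, and the helper name as_list is hypothetical, not part of xmltodict:

```python
def as_list(value):
    """Wrap a lone item in a list so callers can always iterate."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

# Stand-in for the parsed result when the file holds one <artikel>
single = {'artikelen': {'artikel': {'@nummer': '121', 'code': 'ABC123'}}}
# Stand-in for the two-article document shown above
several = {'artikelen': {'artikel': [
    {'@nummer': '121', 'code': 'ABC123'},
    {'@nummer': '123', 'code': 'PQR678'},
]}}

codes_single = [a['code'] for a in as_list(single['artikelen']['artikel'])]
codes_several = [a['code'] for a in as_list(several['artikelen']['artikel'])]
print(codes_single)
print(codes_several)
```

xmltodict.parse also accepts a force_list argument (for example force_list=('artikel',)), which should make the named elements parse as lists even when only one is present.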

If you specifically want the code only when nummer is 121, you could do this:

code = None
for artikel in doc['artikelen']['artikel']:
    if artikel['@nummer'] == '121':
        code = artikel['code']
        break
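
If you need to look up several articles by nummer, the loop can be replaced by a dictionary keyed on the @nummer attribute. This is a sketch that assumes doc has the structure printed above. The literal dict stands in for the xmltodict.parse result, and by_nummer is an illustrative name:

```python
# Stand-in for the parsed document (assumed to match the structure shown earlier)
doc = {'artikelen': {'artikel': [
    {'@nummer': '121', 'code': 'ABC123', 'naam': 'Highlight pen'},
    {'@nummer': '123', 'code': 'PQR678', 'naam': 'Nietmachine'},
]}}

# Build the index once; attribute values stay strings, so the keys are '121' and '123'
by_nummer = {a['@nummer']: a for a in doc['artikelen']['artikel']}

code = by_nummer['121']['code']
missing = by_nummer.get('999')  # None when no article has that nummer
print(code)
```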

That said, if you're parsing XML documents and want to search for a specific value like that, I would consider using XPath expressions, which are supported by ElementTree.
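
As a rough illustration of that suggestion, here is a sketch using the standard library's xml.etree.ElementTree, which supports a limited subset of XPath. The XML string is a shortened copy of the example from the question:

```python
import xml.etree.ElementTree as ET

XML = """<artikelen>
    <artikel nummer="121">
        <code>ABC123</code>
        <naam>Highlight pen</naam>
    </artikel>
    <artikel nummer="123">
        <code>PQR678</code>
        <naam>Nietmachine</naam>
    </artikel>
</artikelen>"""

root = ET.fromstring(XML)  # root is the <artikelen> element

# Attribute predicate: text of the <code> child of the <artikel> whose nummer is "121"
code_121 = root.findtext("artikel[@nummer='121']/code")

# Text of every <code> element, in document order
all_codes = [el.text for el in root.findall("artikel/code")]
print(code_121, all_codes)
```

Paths are written relative to the root element here, so they start at artikel rather than /artikelen.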

Answered by Chaitanya Sama

This uses xml.etree. You can try this:

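import xml.etree.ElementTree as ET
root = ET.parse('artikelen.xml').getroot()  # root is the <artikelen> element
for artikelobj in root.findall('artikel'):
    print(artikelobj.find('code').text)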

If you want to extract a specific code based on the attribute 'nummer' of artikel, then you can try this:

for artikelobj in root.findall('artikel'):
    if artikelobj.get('nummer') == '121':  # attribute values are strings
        print(artikelobj.find('code').text)

This will print only the code you want.

Answered by Chr

To read .xml files:

import lxml.etree as ET
root = ET.parse(filename).getroot()
# node1, node2 and variable_name are placeholders for your own tag names
value = root.findtext('node1/node2/variable_name')

Answered by pseudo

You can use the lxml package with an XPath expression.

from lxml import etree

# etree.parse accepts a filename directly, so no file handle is left open
tree = etree.parse("8_1.html")
expression = "/artikelen/artikel[1]/code"
l = tree.xpath(expression)  # list of matching <code> elements
code = next(i.text for i in l)
print(code)

# ABC123

The thing to notice here is the expression. /artikelen is the root element. /artikel[1] chooses the first artikel element under the root (notice that the first element is at index 1, not 0). /code is the child element under artikel[1]. You can read more in the lxml documentation and about XPath syntax.
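
To make the 1-based indexing concrete, here is a small sketch. It swaps in the standard library's xml.etree.ElementTree for lxml so that it runs without extra packages. ElementTree only supports a subset of XPath, and its paths are relative to the element you call them on, so the leading /artikelen is dropped:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<artikelen>"
    "<artikel nummer='121'><code>ABC123</code></artikel>"
    "<artikel nummer='123'><code>PQR678</code></artikel>"
    "</artikelen>"
)

first = root.find("artikel[1]/code").text      # XPath positions start at 1
second = root.find("artikel[2]/code").text
last = root.find("artikel[last()]/code").text  # last() selects the final match

# The same first element through a Python list, where indexing starts at 0
by_index = root.findall("artikel")[0].find("code").text
print(first, second, last, by_index)
```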