Accessing the XMLNS attribute with Python ElementTree?

Notice: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/1953761/

Time: 2020-11-03 23:25:57  Source: igfitidea

Tags: python, xml, elementtree

Asked by Melchior

How can one access namespace (xmlns) attributes using ElementTree?

With the following:

<data xmlns="http://www.foo.net/a" xmlns:a="http://www.foo.net/a" book="1" category="ABS" date="2009-12-22">

When I call root.get('xmlns') I get back None, while category and date are returned fine. Any help appreciated.

Answered by Jeffrey Harris

I think element.tag is what you're looking for. Note that your example is missing a trailing slash, so it's unbalanced and won't parse. I've added one in my example.

>>> from xml.etree import ElementTree as ET
>>> data = '''<data xmlns="http://www.foo.net/a"
...                 xmlns:a="http://www.foo.net/a"
...                 book="1" category="ABS" date="2009-12-22"/>'''
>>> element = ET.fromstring(data)
>>> element
<Element {http://www.foo.net/a}data at 1013b74d0>
>>> element.tag
'{http://www.foo.net/a}data'
>>> element.attrib
{'category': 'ABS', 'date': '2009-12-22', 'book': '1'}

If you just want to know the xmlns URI, you can split it out with a function like:

def tag_uri_and_name(elem):
    if elem.tag[0] == "{":
        uri, ignore, tag = elem.tag[1:].partition("}")
    else:
        uri = None
        tag = elem.tag
    return uri, tag
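
As a minimal usage sketch (assuming the same sample document as above, and restating the helper so the snippet runs on its own), the function recovers the default namespace URI from the root tag, while the xmlns declaration itself is not exposed as an ordinary attribute:

```python
import xml.etree.ElementTree as ET

def tag_uri_and_name(elem):
    """Split '{uri}local' into (uri, local); uri is None for unqualified tags."""
    if elem.tag[0] == "{":
        uri, _, tag = elem.tag[1:].partition("}")
    else:
        uri, tag = None, elem.tag
    return uri, tag

data = '''<data xmlns="http://www.foo.net/a"
                xmlns:a="http://www.foo.net/a"
                book="1" category="ABS" date="2009-12-22"/>'''
element = ET.fromstring(data)

# The namespace URI is folded into the tag by the parser.
uri, name = tag_uri_and_name(element)

# The xmlns declaration is consumed during parsing, so this lookup finds nothing.
xmlns_attr = element.get("xmlns")

print(uri, name, xmlns_attr)
```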

For much more on namespaces and qualified names in ElementTree, see effbot's examples.

Answered by deancutlet

Look at the effbot namespaces documentation/examples, specifically the parse_map function. It shows you how to add an ns_map attribute to each element, containing the prefix/URI mapping that applies to that specific element.

However, that adds the ns_map attribute to every element. For my needs, I wanted a single global map of all the namespaces in use, so that element lookups are easier and the namespace URIs are not hardcoded.

Here's what I came up with:

import xml.etree.ElementTree as ET

def parse_and_get_ns(file):
    events = "start", "start-ns"
    root = None
    ns = {}
    for event, elem in ET.iterparse(file, events):
        if event == "start-ns":
            if elem[0] in ns and ns[elem[0]] != "{%s}" % elem[1]:
                # NOTE: It is perfectly valid to have the same prefix refer
                #     to different URI namespaces in different parts of the
                #     document. This exception serves as a reminder that this
                #     solution is not robust.    Use at your own peril.
                raise KeyError("Duplicate prefix with different URI found.")
            ns[elem[0]] = "{%s}" % elem[1]
        elif event == "start":
            if root is None:
                root = elem
    return ET.ElementTree(root), ns

With this you can parse an XML file and obtain a dict with the namespace mappings. So, if you have an XML file like the following ("my.xml"):

<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
>
<feed>
  <item>
    <title>Foo</title>
    <dc:creator>Joe McGroin</dc:creator>
    <description>etc...</description>
  </item>
</feed>
</rss>

You will be able to use the XML namespaces to get info for elements like dc:creator:

>>> tree, ns = parse_and_get_ns("my.xml")
>>> ns
{u'content': '{http://purl.org/rss/1.0/modules/content/}',
u'dc': '{http://purl.org/dc/elements/1.1/}'}
>>> item = tree.find("feed/item")
>>> item.findtext(ns['dc']+"creator")
'Joe McGroin'
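
A possible variation, sketched here under the assumption that a reasonably recent standard-library xml.etree.ElementTree is in use: the find methods accept a namespaces argument mapping prefixes to plain URIs, so prefixed paths such as dc:creator can be used directly. The helper name parse_with_prefix_map is hypothetical, and the document is inlined as bytes rather than read from "my.xml" so the snippet is self-contained. Like the function above, it keeps only one URI per prefix.

```python
import io
import xml.etree.ElementTree as ET

XML = b"""<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
<feed>
  <item>
    <title>Foo</title>
    <dc:creator>Joe McGroin</dc:creator>
    <description>etc...</description>
  </item>
</feed>
</rss>"""

def parse_with_prefix_map(source):
    """Hypothetical helper: return (root, {prefix: uri}) for a file or file-like object."""
    prefixes = {}
    root = None
    for event, payload in ET.iterparse(source, events=("start", "start-ns")):
        if event == "start-ns":
            # For start-ns events the payload is a (prefix, uri) tuple.
            prefix, uri = payload
            prefixes[prefix] = uri  # a later declaration of the same prefix overwrites
        elif root is None:
            # The first start event is the root element.
            root = payload
    return root, prefixes

root, prefixes = parse_with_prefix_map(io.BytesIO(XML))

# Prefixes in the path are resolved through the namespaces mapping.
creator = root.findtext("feed/item/dc:creator", namespaces=prefixes)
print(creator)
```

Storing bare URIs instead of "{uri}" strings is a design choice that lets the same dict be passed straight to find, findall, and findtext.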

Answered by Garcia Sylvain

Try this:

import xml.etree.ElementTree as ET
import re
import sys

with open(sys.argv[1]) as f:
    root = ET.fromstring(f.read())
    xmlns = ''
    m = re.search('{.*}', root.tag)
    if m:
        xmlns = m.group(0)
    print(root.find(xmlns + 'the_tag_you_want').text)
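
For a quick check without a file on disk, the same idea can be sketched against an inline string. The book child element here is a hypothetical stand-in for 'the_tag_you_want', and the pattern is anchored and stops at the first closing brace, which is a slightly stricter variant of the '{.*}' search above:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document with a default namespace and one child element.
data = '<data xmlns="http://www.foo.net/a"><book>1</book></data>'
root = ET.fromstring(data)

# Capture the leading '{uri}' part of the root tag, if there is one.
m = re.match(r"\{[^}]*\}", root.tag)
xmlns = m.group(0) if m else ""

# Children in the default namespace carry the same '{uri}' prefix in their tags.
book_text = root.find(xmlns + "book").text
print(xmlns, book_text)
```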