Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/4886189/

Time: 2020-08-18 17:58:37  Source: igfitidea

Python: namespaces in xml ElementTree (or lxml)

python xml namespaces elementtree

Asked by Hellnar

I want to load a legacy XML file, manipulate it, and save it.

Here is my code:

from xml.etree import cElementTree as ET
NS = "{http://www.somedomain.com/XI/Traffic/10}"

def fix_xml(filename):
    f = ET.parse(filename)
    root = f.getroot()
    eventlist = root.findall("%(ns)Event" % {'ns':NS })
    xpath = "%(ns)sEventDetail/%(ns)sEventDescription" % {'ns':NS }
    for event in eventlist:
        desc = event.find(xpath)
        desc.text = desc.text.upper() # do some editing to the text.

    ET.ElementTree(root, nsmap=NS).write("out.xml", encoding="utf-8")


fix_xml("test.xml")

The file I load contains:

xmlns="http://www.somedomain.com/XI/Traffic/10"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.somedomain.com/XI/Traffic/10 10.xds"

at the root tag.

I have the following namespace-related problems:

  • As you see, for each tag call, I have to give the namespace at the beginning to retrieve a child.
  • The generated XML file doesn't have <?xml version="1.0" encoding="utf-8"?> at the beginning.
  • The tags in the output look like <ns0:eventDescription>, while I need the output to match the original <eventDescription>, without a namespace prefix at the beginning.

How can these be solved?

Accepted answer by John Machin

Have a look at the lxml tutorial section on namespaces, and at the effbot article about namespaces in ElementTree.

Problem 1: Put up with it, like everybody else does. Instead of "%(ns)Event" % {'ns':NS }, try NS+"Event".
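
A minimal sketch of the concatenation approach, using the standard library's xml.etree.ElementTree. The sample document is made up for illustration; its element names are modelled on the paths in the question, and the real file's structure is an assumption:

```python
import xml.etree.ElementTree as ET

# Namespace in Clark notation ("{uri}"), as in the question.
NS = "{http://www.somedomain.com/XI/Traffic/10}"

# Hypothetical sample document; the real file's layout is an assumption.
SAMPLE = (
    '<Traffic xmlns="http://www.somedomain.com/XI/Traffic/10">'
    "<Event><EventDetail>"
    "<EventDescription>road closed</EventDescription>"
    "</EventDetail></Event>"
    "</Traffic>"
)

root = ET.fromstring(SAMPLE)

# Plain concatenation replaces the %-formatting used in the question.
events = root.findall(NS + "Event")
descriptions = [
    event.find(NS + "EventDetail/" + NS + "EventDescription").text
    for event in events
]
print(descriptions)
```

Note that the question's "%(ns)Event" is also missing the s conversion character (compare "%(ns)sEventDetail" on the following line of the question's code), so that expression raises an error before any lookup happens.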

Problem 2: By default, the XML declaration is written only if it is required. You can force it by passing xml_declaration=True in your write() call (supported by lxml, and by the standard library's ElementTree since Python 2.7).
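
As a rough illustration with the standard library (the one-element tree and the in-memory buffer are stand-ins chosen for this sketch, not part of the question), forcing the declaration could look like this:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical one-element tree, just to have something to serialize.
tree = ET.ElementTree(ET.Element("root"))

# An in-memory buffer stands in for "out.xml" so the result can be inspected.
buffer = io.BytesIO()

# xml_declaration=True forces the <?xml ...?> line even for UTF-8 output.
tree.write(buffer, encoding="utf-8", xml_declaration=True)

output = buffer.getvalue()
print(output.decode("utf-8"))
```

Passing a filename such as "out.xml" instead of the buffer writes the same bytes to disk.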

Problem 3: The nsmap arg appears to be lxml-only. AFAICT it needs a mapping, not a string. Try nsmap={None: NS}. The effbot article has a section describing a workaround for this.
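
One possible standard-library route, which is not necessarily the workaround the effbot article describes, is ET.register_namespace (added in Python 2.7 and 3.2). The sketch below registers the question's URI under the empty prefix so it is serialized as the default namespace; the sample document is hypothetical:

```python
import io
import xml.etree.ElementTree as ET

URI = "http://www.somedomain.com/XI/Traffic/10"

# Map the URI to the empty prefix so it is written as the default namespace.
# Caveat: this registry is global to the running process.
ET.register_namespace("", URI)

# Hypothetical sample document using the question's namespace.
root = ET.fromstring('<Traffic xmlns="%s"><Event>x</Event></Traffic>' % URI)

buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)

output = buffer.getvalue().decode("utf-8")
print(output)
```

With the registration in place, the tags are written without an ns0: prefix and the root element carries a plain xmlns="..." attribute.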

Answer by Steven

To answer your questions in order:

  • You can't just ignore the namespace: not in the path syntax that .findall() uses, and not in "real" XPath (supported by lxml) either. There you'd still be forced to use a prefix, and would still need to provide a prefix-to-URI mapping.

  • Use xml_declaration=True as well as encoding='utf-8' in the .write() call (available in lxml, but in the stdlib xml.etree only since Python 2.7, I believe).

  • I believe lxml will behave the way you want.
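
To make the prefix-to-URI mapping from the first point concrete, here is a small sketch using the standard library's findall() with its namespaces argument. The prefix "t" and the sample document are arbitrary choices for this example, not something from the question:

```python
import xml.etree.ElementTree as ET

# The prefix "t" is arbitrary; only the URI has to match the document.
NAMESPACES = {"t": "http://www.somedomain.com/XI/Traffic/10"}

# Hypothetical sample document; the real file's layout is an assumption.
SAMPLE = (
    '<Traffic xmlns="http://www.somedomain.com/XI/Traffic/10">'
    "<Event><EventDetail>"
    "<EventDescription>road closed</EventDescription>"
    "</EventDetail></Event>"
    "<Event><EventDetail>"
    "<EventDescription>lane blocked</EventDescription>"
    "</EventDetail></Event>"
    "</Traffic>"
)

root = ET.fromstring(SAMPLE)

# Prefixed path; the mapping tells ElementTree which URI "t" stands for.
path = "t:Event/t:EventDetail/t:EventDescription"
for desc in root.findall(path, namespaces=NAMESPACES):
    desc.text = desc.text.upper()

texts = [d.text for d in root.findall(path, namespaces=NAMESPACES)]
print(texts)
```

The namespace still has to be spelled out once, but the paths themselves stay short and readable.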
