Python: Namespaces in xml ElementTree (or lxml)

Question

Asked by Hellnar

I want to load a legacy XML file, modify it, and save it.

Here is my code:

from xml.etree import cElementTree as ET
NS = "{http://www.somedomain.com/XI/Traffic/10}"

def fix_xml(filename):
    f = ET.parse(filename)
    root = f.getroot()
    eventlist = root.findall("%(ns)Event" % {'ns':NS })
    xpath = "%(ns)sEventDetail/%(ns)sEventDescription" % {'ns':NS }
    for event in eventlist:
        desc = event.find(xpath)
        desc.text = desc.text.upper() # do some editting to the text.

    ET.ElementTree(root, nsmap=NS).write("out.xml", encoding="utf-8")


fix_xml("test.xml")

The file I load contains:

xmlns="http://www.somedomain.com/XI/Traffic/10"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.somedomain.com/XI/Traffic/10 10.xds"

at the root tag.

I have the following namespace-related problems:

As you can see, for every tag lookup I have to put the namespace at the beginning to retrieve a child.
The generated XML file doesn't have <?xml version="1.0" encoding="utf-8"?> at the beginning.
The tags in the output look like <ns0:eventDescription>, while I need them as in the original, <eventDescription>, without a namespace prefix at the beginning.

How can these be solved?

Answer 1

Accepted answer by John Machin

Have a look at the lxml tutorial section on namespaces. Also this article about namespaces in ElementTree.

Problem 1: Put up with it, like everybody else does. Instead of "%(ns)Event" % {'ns':NS } try NS+"Event".
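
The concatenation approach can be sketched as follows. This is a minimal illustration using the standard-library xml.etree.ElementTree on Python 3; the sample document and its element layout are invented for the example and are not the asker's real file.

```python
import xml.etree.ElementTree as ET

# Namespace URI from the question, wrapped in braces
# (ElementTree's "{uri}tag" notation for qualified names).
NS = "{http://www.somedomain.com/XI/Traffic/10}"

# Hypothetical sample document, made up for illustration only.
SAMPLE = (
    '<Traffic xmlns="http://www.somedomain.com/XI/Traffic/10">'
    "<Event><EventDetail>"
    "<EventDescription>road works</EventDescription>"
    "</EventDetail></Event>"
    "</Traffic>"
)

root = ET.fromstring(SAMPLE)

# Plain string concatenation replaces the %-formatting from the question.
events = root.findall(NS + "Event")
desc_path = NS + "EventDetail/" + NS + "EventDescription"

for event in events:
    desc = event.find(desc_path)
    # Guard against events without a description or with empty text.
    if desc is not None and desc.text:
        desc.text = desc.text.upper()

descriptions = [event.find(desc_path).text for event in events]
print(descriptions)
```

The namespace still has to be spelled out for every path step; concatenation only makes that less noisy.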

Problem 2: By default, the XML declaration is written only if it is required. You can force it (lxml only) by using xml_declaration=True in your write() call.

Problem 3: The nsmap arg appears to be lxml-only. AFAICT it needs a mapping, not a string. Try nsmap={None: NS}. The effbot article has a section describing a workaround for this.
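
For readers who stay with the standard library rather than lxml, one possible way to avoid the ns0: prefixes is ET.register_namespace with an empty prefix. This is a different technique from the nsmap argument discussed above and is not mentioned in the answer; the sketch assumes Python 3 and uses an invented sample document.

```python
import io
import xml.etree.ElementTree as ET

# Bare namespace URI (no braces), as it appears in the xmlns attribute.
URI = "http://www.somedomain.com/XI/Traffic/10"

# Register the URI under the empty prefix so the serializer emits it
# as the default namespace. Note: this registry is global to the process.
ET.register_namespace("", URI)

# Hypothetical sample document, made up for illustration only.
SAMPLE = '<Traffic xmlns="%s"><Event>x</Event></Traffic>' % URI
tree = ET.ElementTree(ET.fromstring(SAMPLE))

# Serialize into memory instead of a file so the result can be inspected.
buffer = io.BytesIO()
tree.write(buffer, encoding="utf-8")
output = buffer.getvalue().decode("utf-8")
print(output)
```

Because the registration is global, it affects every tree serialized afterwards in the same process, which may or may not be acceptable in a larger program.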

Answer 2

Answer by Steven

To answer your questions in order:

You can't simply ignore the namespace: not in the path syntax that .findall() uses, and not in "real" XPath (supported by lxml) either. There you'd still be forced to use a prefix, and would still need to provide a prefix-to-URI mapping.
Use xml_declaration=True as well as encoding='utf-8' with the .write() call (available in lxml, but in the stdlib xml.etree only since Python 2.7, I believe).
I believe lxml will behave the way you want.
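
The first two points can be sketched with the standard library alone, assuming Python 3, where findall() accepts a prefix-to-URI mapping and write() accepts xml_declaration. The prefix "t" and the sample document are arbitrary choices for this illustration.

```python
import io
import xml.etree.ElementTree as ET

URI = "http://www.somedomain.com/XI/Traffic/10"

# Prefix-to-URI mapping; the prefix "t" is arbitrary and exists only
# in these path expressions, not in the document itself.
NSMAP = {"t": URI}

# Hypothetical sample document, made up for illustration only.
SAMPLE = (
    '<Traffic xmlns="%s">'
    "<Event><EventDetail>"
    "<EventDescription>road works</EventDescription>"
    "</EventDetail></Event>"
    "</Traffic>" % URI
)

root = ET.fromstring(SAMPLE)

# Point 1: the namespace cannot be dropped, but a short prefix can stand in for it.
found = root.findall("t:Event/t:EventDetail/t:EventDescription", NSMAP)
texts = [element.text for element in found]

# Point 2: force the XML declaration when serializing.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
output = buffer.getvalue().decode("utf-8")
print(texts)
print(output.splitlines()[0])
```

Writing to an in-memory buffer is only for inspection here; passing a filename such as "out.xml" to write() works the same way.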
