Python: namespaces in xml ElementTree (or lxml)
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/4886189/
Asked by Hellnar
I want to retrieve a legacy XML file, manipulate it, and save it.
Here is my code:
from xml.etree import cElementTree as ET

NS = "{http://www.somedomain.com/XI/Traffic/10}"

def fix_xml(filename):
    f = ET.parse(filename)
    root = f.getroot()
    eventlist = root.findall("%(ns)Event" % {'ns':NS })
    xpath = "%(ns)sEventDetail/%(ns)sEventDescription" % {'ns':NS }
    for event in eventlist:
        desc = event.find(xpath)
        desc.text = desc.text.upper() # do some editing to the text.
    ET.ElementTree(root, nsmap=NS).write("out.xml", encoding="utf-8")

fix_xml("test.xml")
The file I load contains:
xmlns="http://www.somedomain.com/XI/Traffic/10"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.somedomain.com/XI/Traffic/10 10.xds"
at the root tag.
I have the following problems, related to namespaces:
- As you can see, for each tag lookup I have to give the namespace at the beginning to retrieve a child.
- The generated XML file doesn't have <?xml version="1.0" encoding="utf-8"?> at the beginning.
- The tags in the output look like <ns0:eventDescription>, while I need the output to match the original <eventDescription>, without a namespace prefix at the beginning.
How can these be solved?
Accepted answer by John Machin
Have a look at the lxml tutorial section on namespaces. Also see the effbot article about namespaces in ElementTree.
Problem 1: Put up with it, like everybody else does. Instead of "%(ns)Event" % {'ns':NS } try NS+"Event".
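As a rough illustration of the NS+"Event" suggestion, the sketch below runs the lookup against a small in-memory document. The sample XML and its root element name (Traffic) are assumptions made up for the example; only the namespace URI and the Event/EventDetail/EventDescription names come from the question.

```python
import xml.etree.ElementTree as ET

# Namespace URI from the question, wrapped in braces as ElementTree expects.
NS = "{http://www.somedomain.com/XI/Traffic/10}"

# Hypothetical sample document: the structure is guessed from the question's paths.
SAMPLE = (
    '<Traffic xmlns="http://www.somedomain.com/XI/Traffic/10">'
    "<Event><EventDetail>"
    "<EventDescription>road closed</EventDescription>"
    "</EventDetail></Event>"
    "</Traffic>"
)

root = ET.fromstring(SAMPLE)

# Plain string concatenation instead of "%(ns)sEvent" % {...} formatting.
events = root.findall(NS + "Event")
xpath = NS + "EventDetail/" + NS + "EventDescription"

for event in events:
    desc = event.find(xpath)
    desc.text = desc.text.upper()

descriptions = [event.find(xpath).text for event in events]
print(descriptions)
```

Note that the original "%(ns)Event" is missing the s conversion after %(ns), so concatenation also sidesteps that formatting mistake.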
Problem 2: By default, the XML declaration is written only if it is required. You can force it (lxml only) by using xml_declaration=True in your write() call.
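For reference, the standard library's ElementTree.write() has also accepted xml_declaration since Python 2.7, so the same keyword can be tried without lxml. The following is a minimal sketch that writes to an in-memory buffer instead of out.xml; the tiny document is invented for the example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical minimal document, used only to show the declaration being emitted.
root = ET.fromstring("<root><child>text</child></root>")

# Write to a bytes buffer so the example does not touch the filesystem.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)

output = buffer.getvalue().decode("utf-8")
print(output)
```

The first line of the output is the XML declaration, followed by the serialized tree.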
Problem 3: The nsmap arg appears to be lxml-only. AFAICT it needs a mapping, not a string. Try nsmap={None: NS}. The effbot article has a section describing a workaround for this.
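If lxml is not an option, one standard-library approach (not necessarily the workaround the effbot article describes) is to register the namespace URI under the empty prefix with ET.register_namespace, available since Python 2.7 and 3.2. The sketch below assumes a made-up root element named Traffic; be aware that the registration is global to the process.

```python
import xml.etree.ElementTree as ET

# Bare namespace URI from the question (no braces here).
URI = "http://www.somedomain.com/XI/Traffic/10"

# Map the URI to the empty prefix so it is serialized as a default namespace.
# This changes module-level state and affects all later serialization.
ET.register_namespace("", URI)

# Hypothetical sample document using the question's default namespace.
root = ET.fromstring(
    '<Traffic xmlns="%s"><eventDescription>x</eventDescription></Traffic>' % URI
)

output = ET.tostring(root, encoding="unicode")
print(output)
```

With the registration in place the serializer should emit xmlns="..." on the root and unprefixed child tags rather than ns0: prefixes.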
Answer by Steven
To answer your questions in order:
- You can't just ignore the namespace: not in the path syntax that .findall() uses, and not in "real" XPath (supported by lxml) either. There you'd still be forced to use a prefix, and you'd still need to provide some prefix-to-URI mapping.
- Use xml_declaration=True as well as encoding='utf-8' with the .write() call (available in lxml, but in stdlib xml.etree only since Python 2.7, I believe).
- I believe lxml will behave like you want.
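The prefix-to-URI mapping mentioned in the first point can be sketched with the standard library as well, since find(), findall() and findtext() accept a namespaces dictionary. The prefix "t" and the sample document are arbitrary choices for this example; only the URI and the element names come from the question.

```python
import xml.etree.ElementTree as ET

# Arbitrary prefix "t" mapped to the question's namespace URI.
# The prefix only has meaning inside the path expressions below.
NAMESPACES = {"t": "http://www.somedomain.com/XI/Traffic/10"}

# Hypothetical sample document: the structure is guessed from the question's paths.
SAMPLE = (
    '<Traffic xmlns="http://www.somedomain.com/XI/Traffic/10">'
    "<Event><EventDetail>"
    "<EventDescription>road closed</EventDescription>"
    "</EventDetail></Event>"
    "</Traffic>"
)

root = ET.fromstring(SAMPLE)

# The prefix in the path is resolved through the mapping passed as the second argument.
events = root.findall("t:Event", NAMESPACES)
texts = [
    event.findtext("t:EventDetail/t:EventDescription", namespaces=NAMESPACES)
    for event in events
]
print(texts)
```

The namespace still has to be supplied on every lookup, as the answer says, but the paths stay shorter than with the full {uri} form.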

