Python xml minidom。生成 <text>Some text</text> 元素
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/507405/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Python xml minidom. generate <text>Some text</text> element
提问by Orjanp
I have the following code.
我有以下代码。
from xml.dom.minidom import Document
doc = Document()
root = doc.createElement('root')
doc.appendChild(root)
main = doc.createElement('Text')
root.appendChild(main)
text = doc.createTextNode('Some text here')
main.appendChild(text)
print doc.toprettyxml(indent='\t')
The result is:
结果是:
<?xml version="1.0" ?>
<root>
<Text>
Some text here
</Text>
</root>
This is all fine and dandy, but what if I want the output to look like this?
这一切都很好,但如果我希望输出看起来像这样怎么办?
<?xml version="1.0" ?>
<root>
<Text>Some text here</Text>
</root>
Can this easily be done?
这可以轻松完成吗?
Orjanp...
奥尔詹普...
采纳答案by bobince
Can this easily be done?
这可以轻松完成吗?
It depends what exact rule you want, but generally no, you get little control over pretty-printing. If you want a specific format you'll usually have to write your own walker.
这取决于您想要的确切规则,但通常不会,您几乎无法控制漂亮的打印。如果你想要一个特定的格式,你通常必须编写自己的 walker。
The DOM Level 3 LS parameter format-pretty-print in pxdomcomes pretty close to your example. Its rule is that if an element contains only a single TextNode, no extra whitespace will be added. However it (currently) uses two spaces for an indent rather than four.
pxdom 中的 DOM Level 3 LS 参数 format-pretty-print与您的示例非常接近。它的规则是,如果一个元素只包含一个 TextNode,则不会添加额外的空格。然而,它(目前)使用两个空格而不是四个空格作为缩进。
>>> doc= pxdom.parseString('<a><b>c</b></a>')
>>> doc.domConfig.setParameter('format-pretty-print', True)
>>> print doc.pxdomContent
<?xml version="1.0" encoding="utf-16"?>
<a>
<b>c</b>
</a>
(Adjust encoding and output format for whatever type of serialisation you're doing.)
(为您正在执行的任何类型的序列化调整编码和输出格式。)
If that's the rule you want, and you can get away with it, you might also be able to monkey-patch minidom's Element.writexml, eg.:
如果这是您想要的规则,并且您可以摆脱它,您也可以对 minidom 的 Element.writexml 进行猴子补丁,例如:
>>> from xml.dom import minidom
>>> def newwritexml(self, writer, indent= '', addindent= '', newl= ''):
... if len(self.childNodes)==1 and self.firstChild.nodeType==3:
... writer.write(indent)
... self.oldwritexml(writer) # cancel extra whitespace
... writer.write(newl)
... else:
... self.oldwritexml(writer, indent, addindent, newl)
...
>>> minidom.Element.oldwritexml= minidom.Element.writexml
>>> minidom.Element.writexml= newwritexml
All the usual caveats about the badness of monkey-patching apply.
关于猴子补丁的坏处的所有常见警告都适用。
回答by markmuetz
I was looking for exactly the same thing, and I came across this post. (the indenting provided in xml.dom.minidom broke another tool that I was using to manipulate the XML, and I needed it to be indented) I tried the accepted solution with a slightly more complex example and this was the result:
我正在寻找完全相同的东西,我偶然发现了这篇文章。(xml.dom.minidom 中提供的缩进破坏了我用来操作 XML 的另一个工具,我需要将它缩进)我用一个稍微复杂的例子尝试了公认的解决方案,结果如下:
In [1]: import pxdom
In [2]: xml = '<a><b>fda</b><c><b>fdsa</b></c></a>'
In [3]: doc = pxdom.parseString(xml)
In [4]: doc.domConfig.setParameter('format-pretty-print', True)
In [5]: print doc.pxdomContent
<?xml version="1.0" encoding="utf-16"?>
<a>
<b>fda</b><c>
<b>fdsa</b>
</c>
</a>
The pretty printed XML isn't formatted correctly, and I'm not too happy about monkey patching (i.e. I barely know what it means, and understand it's bad), so I looked for another solution.
漂亮的 XML 格式不正确,我对猴子补丁不太满意(即我几乎不知道它的含义,并且理解它很糟糕),所以我寻找了另一种解决方案。
I'm writing the output to file, so I can use the xmlindent program for Ubuntu ($sudo aptitude install xmlindent). So I just write the unformatted to the file, then call the xmlindent from within the python program:
我正在将输出写入文件,因此我可以使用 Ubuntu 的 xmlindent 程序($sudo aptitude install xmlindent)。所以我只是将未格式化的内容写入文件,然后从 python 程序中调用 xmlindent:
from subprocess import Popen, PIPE
Popen(["xmlindent", "-i", "2", "-w", "-f", "-nbe", file_name],
stderr=PIPE,
stdout=PIPE).communicate()
The -w switch causes the file to be overwritten, but annoyingly leaves a named e.g. "myfile.xml~" which you'll probably want to remove. The other switches are there to get the formatting right (for me).
-w 开关会导致文件被覆盖,但令人讨厌的是留下一个命名的,例如“myfile.xml~”,您可能想要删除它。其他开关用于正确格式化(对我而言)。
P.S. xmlindent is a stream formatter, i.e. you can use it as follows:
PS xmlindent 是一个流格式化程序,即您可以按如下方式使用它:
cat myfile.xml | xmlindent > myfile_indented.xml
So you might be able to use it in a python script without writing to a file if you needed to.
因此,如果需要,您可以在 python 脚本中使用它而无需写入文件。
回答by James Doepp - pihentagyu
This could be done with toxml(), using regular expressions to tidy things up.
这可以通过 toxml() 来完成,使用正则表达式来整理。
>>> from xml.dom.minidom import Document
>>> import re
>>> doc = Document()
>>> root = doc.createElement('root')
>>> _ = doc.appendChild(root)
>>> main = doc.createElement('Text')
>>> _ = root.appendChild(main)
>>> text = doc.createTextNode('Some text here')
>>> _ = main.appendChild(text)
>>> out = doc.toxml()
>>> niceOut = re.sub(r'><', r'>\n<', re.sub(r'(<\/.*?>)', r'\n', out))
>>> print niceOut
<?xml version="1.0" ?>
<root>
<Text>Some text here</Text>
</root>
回答by LeSuspect
Easiest way to do this is to use prettyxml, and remove the \n and \t inside the tags. That way you keep your indent as you required in your example.
最简单的方法是使用prettyxml,并删除标签内的\n 和\t。这样您就可以按照示例中的要求保持缩进。
xml_output = doc.toprettyxml()
nojunkintags = re.sub('>(\n|\t)</', '', xml_output)
print nojunkintags
xml_output = doc.toprettyxml()
nojunkintags = re.sub('>(\n|\t)</', '', xml_output)
print nojunkintags
回答by Stefanie Tellex
This solution worked for me without monkey patching or ceasing to use minidom:
这个解决方案对我有用,没有猴子补丁或停止使用 minidom:
from xml.dom.ext import PrettyPrint
from StringIO import StringIO
def toprettyxml_fixed (node, encoding='utf-8'):
tmpStream = StringIO()
PrettyPrint(node, stream=tmpStream, encoding=encoding)
return tmpStream.getvalue()
回答by Orjanp
The pyxml package offers a simple solution to this by using the xml.dom.ext.PrettyPrint() function. It can also print to a file descriptor.
pyxml 包通过使用 xml.dom.ext.PrettyPrint() 函数提供了一个简单的解决方案。它还可以打印到文件描述符。
But the pyxml package is no longer maintained.
但是 pyxml 包不再维护。
Oerjan Pettersen
欧扬·彼得森