Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must comply with the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/1662351/

Date: 2020-11-03 22:49:35  Source: igfitidea

Problem with newlines when I use toprettyxml()

python xml

Asked by PierrOz

I'm currently using the toprettyxml() function of the xml.dom module in a Python script and I'm having some trouble with the newlines. If I don't use the newl parameter, or if I use toprettyxml(newl='\n'), it displays several newlines instead of only one.

For instance

f = open(filename, 'w')
f.write(dom1.toprettyxml(encoding='UTF-8'))
f.close()

displayed:

<params>


    <param name="Level" value="#LEVEL#"/>


    <param name="Code" value="281"/>


</params>

Does anyone know where the problem comes from and how I can work around it? FYI, I'm using Python 2.6.1.

Accepted answer by xverges

toprettyxml() is quite awful. It is not a matter of Windows and '\r\n'. Trying any string as the newl parameter shows that too many lines are being added. Not only that, but other whitespace (which may cause you problems when a machine reads the XML) is also added.

Some workarounds are available at http://ronrothman.com/public/leftbraned/xml-dom-minidom-toprettyxml-and-silly-whitespace

Answer by dganesh2002

I found another great solution:

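import os

# Python 2 code: with encoding='UTF-8', toprettyxml() returns a byte string.
dom_string = dom1.toprettyxml(encoding='UTF-8')
# Keep only the lines that contain something other than whitespace.
dom_string = os.linesep.join([s for s in dom_string.splitlines() if s.strip()])
with open(filename, 'w') as f:
    f.write(dom_string)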

The above solution removes from dom_string the unwanted blank lines that toprettyxml() generates.

Adapted from the StackOverflow question "What's a quick one-liner to remove empty lines from a python string?"
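
For Python 3, the same idea might be sketched as follows. This is an adaptation under assumptions, not the answer's original code: the helper name pretty_without_blank_lines and the sample document are made up for illustration, and the encoding argument is omitted so that toprettyxml() returns a str rather than bytes.

```python
from xml.dom import minidom


def pretty_without_blank_lines(dom, indent="    "):
    """Pretty-print a minidom document and drop whitespace-only lines."""
    text = dom.toprettyxml(indent=indent)
    # Join with "\n"; a file opened in text mode translates it as needed.
    return "\n".join(line for line in text.splitlines() if line.strip())


# Hypothetical sample input that already contains newlines and indentation.
SAMPLE = """<params>
    <param name="Level" value="#LEVEL#"/>
    <param name="Code" value="281"/>
</params>"""

result = pretty_without_blank_lines(minidom.parseString(SAMPLE))
print(result)
```

Note that this only filters the output text. The whitespace-only text nodes are still present in the DOM itself.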

Answer by OndrejC

toprettyxml(newl='') works for me on Windows.

Answer by Link64

This is a pretty old question but I guess I know what the problem is:

Minidom's pretty print works in a pretty straightforward way. It just adds the characters that you specified as arguments. That means it will duplicate the characters if they already exist.

E.g. if you parse an XML file that looks like this:

<parent>
   <child>
      Some text
   </child>
</parent>

there are already newline characters and indentation within the DOM. Those are taken as text nodes by minidom and are still there when you parse it into a DOM object.
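
A small sketch can make those text nodes visible. It assumes the sample document above is supplied as a string; the names SOURCE, root and whitespace_nodes are chosen here purely for illustration.

```python
from xml.dom import minidom, Node

SOURCE = "<parent>\n   <child>\n      Some text\n   </child>\n</parent>"
root = minidom.parseString(SOURCE).documentElement

# The indentation around <child> shows up as whitespace-only text nodes.
for node in root.childNodes:
    if node.nodeType == Node.TEXT_NODE:
        print("text   ", repr(node.data))
    else:
        print("element", node.nodeName)

# Collect the text nodes that hold nothing but whitespace.
whitespace_nodes = [
    n for n in root.childNodes
    if n.nodeType == Node.TEXT_NODE and not n.data.strip()
]
print(len(whitespace_nodes), "whitespace-only text nodes under <parent>")
```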

If you now convert the DOM object into an XML string, those text nodes will still be there, meaning the newline characters and indentation tabs remain. Using pretty print now will just add more newlines and more tabs. That's why, in this case, not using pretty print at all or specifying newl='' will result in the wanted output.

However, if you generate the DOM in your script, the text nodes will not be there, so pretty printing with newl='\r\n' and/or addindent='\t' will turn out quite pretty.
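
As a rough illustration of that case, the sketch below builds a small document in code and pretty-prints it. The element and attribute names are borrowed from the question; the variable names are assumptions. Note that toprettyxml() calls its indentation argument indent, while addindent is the name used by writexml().

```python
from xml.dom import minidom

# Build the document in code, so it contains no whitespace-only text nodes.
doc = minidom.Document()
params = doc.createElement("params")
doc.appendChild(params)

for name, value in (("Level", "#LEVEL#"), ("Code", "281")):
    param = doc.createElement("param")
    param.setAttribute("name", name)
    param.setAttribute("value", value)
    params.appendChild(param)

# One newline and one tab per nesting level are added, and nothing else.
built_xml = doc.toprettyxml(indent="\t", newl="\n")
print(built_xml)
```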

TL;DR: Indents and newlines remain from parsing, and pretty print just adds more.
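
One way to act on this, sketched under assumptions rather than taken from the answer, is to delete the whitespace-only text nodes after parsing and only then pretty-print. The helper strip_whitespace_nodes is a hypothetical name, not part of minidom.

```python
from xml.dom import minidom, Node


def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace."""
    for child in list(node.childNodes):  # copy: the list is mutated below
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)


parsed = minidom.parseString(
    "<parent>\n   <child>\n      Some text\n   </child>\n</parent>"
)
strip_whitespace_nodes(parsed)
cleaned_xml = parsed.toprettyxml(indent="   ", newl="\n")
print(cleaned_xml)
```

Text nodes that contain real content, such as "Some text" here, keep their own leading and trailing whitespace. Only the nodes made up entirely of whitespace are dropped.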

Answer by felixhummel

If you don't mind installing new packages, try BeautifulSoup. I had very good experiences with its XML prettifier.

Answer by Naveed Rasheed

The following function solved my problem. I had to use Python 2.7 and was not allowed to install any additional third-party packages.

The crux of the implementation is as follows:

  1. Use dom.toprettyxml()
  2. Remove all white spaces
  3. Add new lines and tabs as per your requirement.

import os
import re
import xml.dom.minidom

class XmlTag:
    opening = 0
    closing = 1
    self_closing = 2
    closing_tag = "</"
    self_closing_tag = "/>"
    opening_tag = "<"

def to_pretty_xml(xml_file_path):
    pretty_xml = ""
    space_or_tab_count = "  " # Add spaces or use \t
    tab_count = 0
    last_tag = -1

    dom = xml.dom.minidom.parse(xml_file_path)

    # get pretty-printed version of input file
    string_xml = dom.toprettyxml(' ', os.linesep)

    # remove version tag
    string_xml = string_xml.replace("<?xml version=\"1.0\" ?>", '')

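    # remove whitespace around tags and collapse it elsewhere; spaces
    # inside tags (e.g. between attributes) must survive
    string_xml = re.sub(r'\s*([<>])\s*', r'\1', ' '.join(string_xml.split()))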

    # move each tag to new line
    string_xml = string_xml.replace('>', '>\n')

    for line in string_xml.split('\n'):
        if XmlTag.closing_tag in line:

            # For consecutive closing tags decrease the indentation
            if last_tag == XmlTag.closing:
                tab_count = tab_count - 1

            # Move closing element to next line
            if last_tag == XmlTag.closing or last_tag == XmlTag.self_closing:
                pretty_xml = pretty_xml + '\n' + (space_or_tab_count * tab_count)

            pretty_xml = pretty_xml + line
            last_tag = XmlTag.closing

        elif XmlTag.self_closing_tag in line:

            # Print self closing on next line with one indentation from parent node
            pretty_xml = pretty_xml + '\n' + (space_or_tab_count * (tab_count+1)) + line
            last_tag = XmlTag.self_closing

        elif XmlTag.opening_tag in line:

            # For consecutive opening tags increase the indentation
            if last_tag == XmlTag.opening:
                tab_count = tab_count + 1

            # Move opening element to next line
            if last_tag == XmlTag.opening or last_tag == XmlTag.closing:
                pretty_xml = pretty_xml + '\n' + (space_or_tab_count * tab_count)

            pretty_xml = pretty_xml + line
            last_tag = XmlTag.opening

    return pretty_xml

pretty_xml = to_pretty_xml("simple.xml")

with open("pretty.xml", 'w') as f:
    f.write(pretty_xml)

Answer by Will McCutchen

Are you viewing the resulting file on Windows? If so, try using toprettyxml(newl='\r\n').
