Notice: this page is adapted from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/321795/

Time: 2020-11-03 19:53:50  Source: igfitidea

Comparing XML in a unit test in Python

Tags: python, xml, elementtree

Asked by Adam Endicott

I have an object that can build itself from an XML string, and write itself out to an XML string. I'd like to write a unit test to test round tripping through XML, but I'm having trouble comparing the two XML versions. Whitespace and attribute order seem to be the issues. Any suggestions for how to do this? This is in Python, and I'm using ElementTree (not that that really matters here since I'm just dealing with XML in strings at this level).

Accepted answer by Kozyarchuk

First normalize the two XML documents, then compare them. I've used the following with lxml:

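from lxml import etree, objectify

# Round-trip both strings through lxml so they are serialized the same way.
obj1 = objectify.fromstring(expect)
expect = etree.tostring(obj1)
obj2 = objectify.fromstring(xml)
result = etree.tostring(obj2)
self.assertEqual(expect, result)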

Answer by Mikhail Korobov

This is an old question, but Kozyarchuk's accepted answer doesn't work for me because of attribute order, and the minidom solution doesn't work as-is either (no idea why, I haven't debugged it).

This is what I finally came up with:

from doctest import Example
from unittest import TestCase

from lxml.doctestcompare import LXMLOutputChecker

class XmlTest(TestCase):
    def assertXmlEqual(self, got, want):
        checker = LXMLOutputChecker()
        if not checker.check_output(want, got, 0):
            message = checker.output_difference(Example("", want), got, 0)
            raise AssertionError(message)

This also produces a diff that can be helpful in case of large xml files.

Answer by bobince

If the problem is really just the whitespace and attribute order, and you have no other constructs than text and elements to worry about, you can parse the strings using a standard XML parser and compare the nodes manually. Here's an example using minidom, but you could write the same in etree pretty simply:

from xml.dom import minidom

def isEqualXML(a, b):
    da, db = minidom.parseString(a), minidom.parseString(b)
    return isEqualElement(da.documentElement, db.documentElement)

def isEqualElement(a, b):
    if a.tagName!=b.tagName:
        return False
    if sorted(a.attributes.items())!=sorted(b.attributes.items()):
        return False
    if len(a.childNodes)!=len(b.childNodes):
        return False
    for ac, bc in zip(a.childNodes, b.childNodes):
        if ac.nodeType!=bc.nodeType:
            return False
        if ac.nodeType==ac.TEXT_NODE and ac.data!=bc.data:
            return False
        if ac.nodeType==ac.ELEMENT_NODE and not isEqualElement(ac, bc):
            return False
    return True
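
The answer notes that the same comparison can be written with etree. A minimal sketch of that idea, using the standard library's xml.etree.ElementTree, might look like the following. The names elements_equal and is_equal_xml are made up for illustration, and unlike the minidom version above this sketch also treats whitespace-only text as empty, which is an assumption about what "ignoring whitespace" should mean in your tests.

```python
import xml.etree.ElementTree as ET


def _text(value):
    # Treat missing text and whitespace-only text as equivalent (an assumption).
    return (value or "").strip()


def elements_equal(a, b):
    """Compare two Elements, ignoring attribute order and surrounding whitespace."""
    if a.tag != b.tag:
        return False
    # attrib is a dict, so this comparison does not depend on attribute order.
    if a.attrib != b.attrib:
        return False
    if _text(a.text) != _text(b.text) or _text(a.tail) != _text(b.tail):
        return False
    if len(a) != len(b):
        return False
    return all(elements_equal(ca, cb) for ca, cb in zip(a, b))


def is_equal_xml(a, b):
    # Hypothetical helper: parse two XML strings and compare their root elements.
    return elements_equal(ET.fromstring(a), ET.fromstring(b))
```

Note that the default ElementTree parser discards comments and processing instructions, so they play no part in this comparison.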

If you need a more thorough equivalence comparison, covering the possibilities of other types of nodes including CDATA, PIs, entity references, comments, doctypes, namespaces and so on, you could use the DOM Level 3 Core method isEqualNode. Neither minidom nor etree have that, but pxdom is one implementation that supports it:

import pxdom

def isEqualXML(a, b):
    # Parse each input separately, then use the DOM Level 3 isEqualNode check.
    da, db = pxdom.parseString(a), pxdom.parseString(b)
    return da.isEqualNode(db)

(You may want to change some of the DOMConfiguration options on the parse if you need to specify whether entity references and CDATA sections match their replaced equivalents.)

A slightly more roundabout way of doing it would be to parse, then re-serialise to canonical form and do a string comparison. Again, pxdom supports the DOM Level 3 LS option 'canonical-form', which you could use to do this; an alternative using the stdlib's minidom implementation is c14n. However, you must have the PyXML extensions installed for this, so you still can't quite do it within the stdlib:

from xml.dom import minidom
from xml.dom.ext import c14n

def isEqualXML(a, b):
    da, db = minidom.parseString(a), minidom.parseString(b)
    a, b = c14n.Canonicalize(da), c14n.Canonicalize(db)
    return a == b
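
Since this answer was written, Python 3.8 added xml.etree.ElementTree.canonicalize (C14N 2.0) to the standard library, so the canonicalize-then-compare approach no longer needs PyXML. A minimal sketch, assuming Python 3.8 or later; the function name is_equal_xml_c14n is made up for illustration:

```python
from xml.etree.ElementTree import canonicalize


def is_equal_xml_c14n(a, b):
    # Canonical form orders attributes consistently. strip_text=True also drops
    # whitespace around text content, so pretty-printed and compact documents
    # canonicalize to the same string.
    return canonicalize(a, strip_text=True) == canonicalize(b, strip_text=True)
```

Leave out strip_text=True if whitespace inside elements is significant for your format.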

Answer by andrewrk

Use xmldiff, a python tool that figures out the differences between two similar XML files, the same way that diff does it.

Answer by Robert Rossney

Why are you examining the XML data at all?

The way to test object serialization is to create an instance of the object, serialize it, deserialize it into a new object, and compare the two objects. When you make a change that breaks serialization or deserialization, this test will fail.
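
As a rough sketch of that kind of round-trip test, assuming a hypothetical Point class with to_xml and from_xml methods standing in for the asker's real object (none of these names come from the question):

```python
import unittest
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Point:
    # Hypothetical stand-in for "an object that can build itself from XML".
    x: int
    y: int

    def to_xml(self):
        elem = ET.Element("point", {"x": str(self.x), "y": str(self.y)})
        return ET.tostring(elem, encoding="unicode")

    @classmethod
    def from_xml(cls, xml):
        elem = ET.fromstring(xml)
        return cls(x=int(elem.get("x")), y=int(elem.get("y")))


class RoundTripTest(unittest.TestCase):
    def test_round_trip(self):
        original = Point(x=3, y=-7)
        restored = Point.from_xml(original.to_xml())
        # The dataclass-generated __eq__ compares the objects field by field,
        # so the XML text itself is never inspected.
        self.assertEqual(original, restored)
```

The test only relies on the object defining equality, so whitespace and attribute order in the serialized XML cannot cause spurious failures.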

The only thing checking the XML data is going to find for you is if your serializer is emitting a superset of what the deserializer requires, and the deserializer silently ignores stuff it doesn't expect.

Of course, if something else is going to be consuming the serialized data, that's another matter. But in that case, you ought to be thinking about establishing a schema for the XML and validating it.

Answer by pfctdayelise

I also had this problem and did some digging around it today. The doctestcompare approach may suffice, but I found via Ian Bicking that it is based on formencode.doctest_xml_compare, which appears to now be here. As you can see, that is a pretty simple function, unlike doctestcompare (although I guess doctestcompare collects all the failures and maybe does more sophisticated checking). Anyway, copying/importing xml_compare out of formencode may be a good solution.
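
To give a feel for how small such a comparison function can be, here is a sketch in the same spirit that collects every difference instead of stopping at the first. It is not formencode's actual code; the name xml_differences and the message format are made up for illustration, and it ignores tail text entirely.

```python
import xml.etree.ElementTree as ET


def xml_differences(a, b, path=""):
    """Return a list of human-readable differences between two Elements."""
    path = f"{path}/{a.tag}"
    if a.tag != b.tag:
        return [f"{path}: tag {a.tag!r} != {b.tag!r}"]
    diffs = []
    # Walk the union of attribute names so missing attributes are reported too.
    for name in sorted(set(a.attrib) | set(b.attrib)):
        if a.get(name) != b.get(name):
            diffs.append(
                f"{path}: attribute {name!r}: {a.get(name)!r} != {b.get(name)!r}"
            )
    if (a.text or "").strip() != (b.text or "").strip():
        diffs.append(f"{path}: text {a.text!r} != {b.text!r}")
    if len(a) != len(b):
        diffs.append(f"{path}: {len(a)} children != {len(b)} children")
    # Compare children pairwise in document order.
    for child_a, child_b in zip(a, b):
        diffs.extend(xml_differences(child_a, child_b, path))
    return diffs
```

In a test, something like self.assertEqual(xml_differences(got, want), []) would then print the collected differences on failure.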

Answer by moylop260

import json

def xml_to_json(self, xml):
    """Receive 1 lxml etree object and return a json string"""
    def recursive_dict(element):
        # Assumes every tag is namespace-qualified ('{uri}name'); keep only the local name.
        return (element.tag.split('}')[1],
                dict(map(recursive_dict, element.getchildren()),
                     **element.attrib))
    return json.dumps(dict([recursive_dict(xml)]),
                      default=lambda x: str(x))

def assertEqualXML(self, xml_real, xml_expected):
    """Receive 2 objectify objects and show a diff assert if exists."""
    xml_expected_str = json.loads(self.xml_to_json(xml_expected))
    xml_real_str = json.loads(self.xml_to_json(xml_real))
    self.maxDiff = None
    self.assertEqual(xml_real_str, xml_expected_str)

You would see output like this:

                u'date': u'2016-11-22T19:55:02',
                u'item2': u'MX-INV0007',
         -      u'item3': u'Payments',
         ?                  ^^^
         +      u'item3': u'OAYments',
         ?                  ^^^ +

Answer by porton

It can be easily done with minidom:

from unittest import TestCase
from xml.dom.minidom import parseString

class XmlTest(TestCase):
    def assertXmlEqual(self, got, want):
        return self.assertEqual(parseString(got).toxml(), parseString(want).toxml())

Answer by Rob Williams

The Java component dbUnit does a lot of XML comparisons, so you might find it useful to look at their approach (especially to find any gotchas that they may have already addressed).
