在 Java 中比较 2 个 XML 文档的最佳方法

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/141993/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-11 08:50:56  来源:igfitidea点击:

Best way to compare 2 XML documents in Java

javaxmltestingparsingcomparison

提问by Mike Deck

I'm trying to write an automated test of an application that basically translates a custom message format into an XML message and sends it out the other end. I've got a good set of input/output message pairs so all I need to do is send the input messages in and listen for the XML message to come out the other end.

我正在尝试编写一个应用程序的自动化测试,该测试基本上将自定义消息格式转换为 XML 消息并将其发送到另一端。我有一组很好的输入/输出消息对,所以我需要做的就是将输入消息发送进来并监听来自另一端的 XML 消息。

When it comes time to compare the actual output to the expected output I'm running into some problems. My first thought was just to do string comparisons on the expected and actual messages. This doens't work very well because the example data we have isn't always formatted consistently and there are often times different aliases used for the XML namespace (and sometimes namespaces aren't used at all.)

当需要将实际输出与预期输出进行比较时,我遇到了一些问题。我的第一个想法是对预期和实际消息进行字符串比较。这不太好用,因为我们拥有的示例数据的格式并不总是一致,而且 XML 名称空间经常使用不同的别名(有时根本不使用名称空间。)

I know I can parse both strings and then walk through each element and compare them myself and this wouldn't be too difficult to do, but I get the feeling there's a better way or a library I could leverage.

我知道我可以解析两个字符串,然后遍历每个元素并自己比较它们,这不会太难,但我觉得有更好的方法或我可以利用的库。

So, boiled down, the question is:

所以,归结起来,问题是:

Given two Java Strings which both contain valid XML how would you go about determining if they are semantically equivalent? Bonus points if you have a way to determine what the differences are.

给定两个都包含有效 XML 的 Java 字符串,您将如何确定它们在语义上是否等效?如果您有办法确定差异是什么,则可以加分。

采纳答案by Tom

Sounds like a job for XMLUnit

听起来像是 XMLUnit 的工作

Example:

例子:

public class SomeTest extends XMLTestCase {
  @Test
  public void test() {
    String xml1 = ...
    String xml2 = ...

    XMLUnit.setIgnoreWhitespace(true); // ignore whitespace differences

    // can also compare xml Documents, InputSources, Readers, Diffs
    assertXMLEqual(xml1, xml2);  // assertXMLEquals comes from XMLTestCase
  }
}

回答by skaffman

Xomhas a Canonicalizer utility which turns your DOMs into a regular form, which you can then stringify and compare. So regardless of whitespace irregularities or attribute ordering, you can get regular, predictable comparisons of your documents.

Xom有一个 Canonicalizer 实用程序,可以将您的 DOM 转换为常规形式,然后您可以对其进行字符串化和比较。因此,无论空格不规则或属性排序如何,您都可以获得文档的定期、可预测的比较。

This works especially well in IDEs that have dedicated visual String comparators, like Eclipse. You get a visual representation of the semantic differences between the documents.

这在具有专用可视字符串比较器的 IDE 中特别有效,例如 Eclipse。您可以直观地表示文档之间的语义差异。

回答by anjanb

skaffman seems to be giving a good answer.

skaffman 似乎给出了一个很好的答案。

another way is probably to format the XML using a commmand line utility like xmlstarlet(http://xmlstar.sourceforge.net/) and then format both the strings and then use any diff utility(library) to diff the resulting output files. I don't know if this is a good solution when issues are with namespaces.

另一种方法可能是使用像 xmlstarlet( http://xmlstar.sourceforge.net/)这样的命令行实用程序格式化 XML ,然后格式化两个字符串,然后使用任何差异实用程序(库)来比较结果输出文件。当名称空间出现问题时,我不知道这是否是一个好的解决方案。

回答by Steve B.

Since you say "semantically equivalent" I assume you mean that you want to do more than just literally verify that the xml outputs are (string) equals, and that you'd want something like

既然你说“语义等价”,我假设你的意思是你想做的不仅仅是从字面上验证 xml 输出是(字符串)等于,而且你想要类似的东西

<foo> some stuff here</foo></code>

<foo> 这里有些东西</foo></code>

and

<foo>some stuff here</foo></code>

<foo>这里有些东西</foo></code>

do read as equivalent. Ultimately it's going to matter how you're defining "semantically equivalent" on whatever object you're reconstituting the message from. Simply build that object from the messages and use a custom equals() to define what you're looking for.

做等效读。最终,您如何在重构消息的任何对象上定义“语义等效”将很重要。只需从消息构建该对象并使用自定义 equals() 来定义您要查找的内容。

回答by Pimin Konstantin Kefaloukos

I'm using Altova DiffDogwhich has options to compare XML files structurally (ignoring string data).

我正在使用Altova DiffDog,它具有从结构上比较 XML 文件的选项(忽略字符串数据)。

This means that (if checking the 'ignore text' option):

这意味着(如果选中“忽略文本”选项):

<foo a="xxx" b="xxx">xxx</foo>

and

<foo b="yyy" a="yyy">yyy</foo> 

are equal in the sense that they have structural equality. This is handy if you have example files that differ in data, but not structure!

在结构平等的意义上是平等的。如果您有数据不同但结构不同的示例文件,这很方便!

回答by Archimedes Trajano

The following will check if the documents are equal using standard JDK libraries.

下面将使用标准 JDK 库检查文档是否相等。

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setCoalescing(true);
dbf.setIgnoringElementContentWhitespace(true);
dbf.setIgnoringComments(true);
DocumentBuilder db = dbf.newDocumentBuilder();

Document doc1 = db.parse(new File("file1.xml"));
doc1.normalizeDocument();

Document doc2 = db.parse(new File("file2.xml"));
doc2.normalizeDocument();

Assert.assertTrue(doc1.isEqualNode(doc2));

normalize() is there to make sure there are no cycles (there technically wouldn't be any)

normalize() 是为了确保没有循环(技术上不会有任何循环)

The above code will require the white spaces to be the same within the elements though, because it preserves and evaluates it. The standard XML parser that comes with Java does not allow you to set a feature to provide a canonical version or understand xml:spaceif that is going to be a problem then you may need a replacement XML parser such as xerces or use JDOM.

上面的代码将要求元素内的空格相同,因为它保留并评估它。Java 附带的标准 XML 解析器不允许您设置功能以提供规范版本或了解xml:space这是否会成为问题,那么您可能需要替换 XML 解析器,例如 xerces 或使用 JDOM。

回答by Javelin

Thanks, I extended this, try this ...

谢谢,我扩展了这个,试试这个......

import java.io.ByteArrayInputStream;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class XmlDiff 
{
    private boolean nodeTypeDiff = true;
    private boolean nodeValueDiff = true;

    public boolean diff( String xml1, String xml2, List<String> diffs ) throws Exception
    {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setCoalescing(true);
        dbf.setIgnoringElementContentWhitespace(true);
        dbf.setIgnoringComments(true);
        DocumentBuilder db = dbf.newDocumentBuilder();


        Document doc1 = db.parse(new ByteArrayInputStream(xml1.getBytes()));
        Document doc2 = db.parse(new ByteArrayInputStream(xml2.getBytes()));

        doc1.normalizeDocument();
        doc2.normalizeDocument();

        return diff( doc1, doc2, diffs );

    }

    /**
     * Diff 2 nodes and put the diffs in the list 
     */
    public boolean diff( Node node1, Node node2, List<String> diffs ) throws Exception
    {
        if( diffNodeExists( node1, node2, diffs ) )
        {
            return true;
        }

        if( nodeTypeDiff )
        {
            diffNodeType(node1, node2, diffs );
        }

        if( nodeValueDiff )
        {
            diffNodeValue(node1, node2, diffs );
        }


        System.out.println(node1.getNodeName() + "/" + node2.getNodeName());

        diffAttributes( node1, node2, diffs );
        diffNodes( node1, node2, diffs );

        return diffs.size() > 0;
    }

    /**
     * Diff the nodes
     */
    public boolean diffNodes( Node node1, Node node2, List<String> diffs ) throws Exception
    {
        //Sort by Name
        Map<String,Node> children1 = new LinkedHashMap<String,Node>();      
        for( Node child1 = node1.getFirstChild(); child1 != null; child1 = child1.getNextSibling() )
        {
            children1.put( child1.getNodeName(), child1 );
        }

        //Sort by Name
        Map<String,Node> children2 = new LinkedHashMap<String,Node>();      
        for( Node child2 = node2.getFirstChild(); child2!= null; child2 = child2.getNextSibling() )
        {
            children2.put( child2.getNodeName(), child2 );
        }

        //Diff all the children1
        for( Node child1 : children1.values() )
        {
            Node child2 = children2.remove( child1.getNodeName() );
            diff( child1, child2, diffs );
        }

        //Diff all the children2 left over
        for( Node child2 : children2.values() )
        {
            Node child1 = children1.get( child2.getNodeName() );
            diff( child1, child2, diffs );
        }

        return diffs.size() > 0;
    }


    /**
     * Diff the nodes
     */
    public boolean diffAttributes( Node node1, Node node2, List<String> diffs ) throws Exception
    {        
        //Sort by Name
        NamedNodeMap nodeMap1 = node1.getAttributes();
        Map<String,Node> attributes1 = new LinkedHashMap<String,Node>();        
        for( int index = 0; nodeMap1 != null && index < nodeMap1.getLength(); index++ )
        {
            attributes1.put( nodeMap1.item(index).getNodeName(), nodeMap1.item(index) );
        }

        //Sort by Name
        NamedNodeMap nodeMap2 = node2.getAttributes();
        Map<String,Node> attributes2 = new LinkedHashMap<String,Node>();        
        for( int index = 0; nodeMap2 != null && index < nodeMap2.getLength(); index++ )
        {
            attributes2.put( nodeMap2.item(index).getNodeName(), nodeMap2.item(index) );

        }

        //Diff all the attributes1
        for( Node attribute1 : attributes1.values() )
        {
            Node attribute2 = attributes2.remove( attribute1.getNodeName() );
            diff( attribute1, attribute2, diffs );
        }

        //Diff all the attributes2 left over
        for( Node attribute2 : attributes2.values() )
        {
            Node attribute1 = attributes1.get( attribute2.getNodeName() );
            diff( attribute1, attribute2, diffs );
        }

        return diffs.size() > 0;
    }
    /**
     * Check that the nodes exist
     */
    public boolean diffNodeExists( Node node1, Node node2, List<String> diffs ) throws Exception
    {
        if( node1 == null && node2 == null )
        {
            diffs.add( getPath(node2) + ":node " + node1 + "!=" + node2 + "\n" );
            return true;
        }

        if( node1 == null && node2 != null )
        {
            diffs.add( getPath(node2) + ":node " + node1 + "!=" + node2.getNodeName() );
            return true;
        }

        if( node1 != null && node2 == null )
        {
            diffs.add( getPath(node1) + ":node " + node1.getNodeName() + "!=" + node2 );
            return true;
        }

        return false;
    }

    /**
     * Diff the Node Type
     */
    public boolean diffNodeType( Node node1, Node node2, List<String> diffs ) throws Exception
    {       
        if( node1.getNodeType() != node2.getNodeType() ) 
        {
            diffs.add( getPath(node1) + ":type " + node1.getNodeType() + "!=" + node2.getNodeType() );
            return true;
        }

        return false;
    }

    /**
     * Diff the Node Value
     */
    public boolean diffNodeValue( Node node1, Node node2, List<String> diffs ) throws Exception
    {       
        if( node1.getNodeValue() == null && node2.getNodeValue() == null )
        {
            return false;
        }

        if( node1.getNodeValue() == null && node2.getNodeValue() != null )
        {
            diffs.add( getPath(node1) + ":type " + node1 + "!=" + node2.getNodeValue() );
            return true;
        }

        if( node1.getNodeValue() != null && node2.getNodeValue() == null )
        {
            diffs.add( getPath(node1) + ":type " + node1.getNodeValue() + "!=" + node2 );
            return true;
        }

        if( !node1.getNodeValue().equals( node2.getNodeValue() ) )
        {
            diffs.add( getPath(node1) + ":type " + node1.getNodeValue() + "!=" + node2.getNodeValue() );
            return true;
        }

        return false;
    }


    /**
     * Get the node path
     */
    public String getPath( Node node )
    {
        StringBuilder path = new StringBuilder();

        do
        {           
            path.insert(0, node.getNodeName() );
            path.insert( 0, "/" );
        }
        while( ( node = node.getParentNode() ) != null );

        return path.toString();
    }
}

回答by sreehari

Using JExamXML with java application

在 Java 应用程序中使用 JExamXML

    import com.a7soft.examxml.ExamXML;
    import com.a7soft.examxml.Options;

       .................

       // Reads two XML files into two strings
       String s1 = readFile("orders1.xml");
       String s2 = readFile("orders.xml");

       // Loads options saved in a property file
       Options.loadOptions("options");

       // Compares two Strings representing XML entities
       System.out.println( ExamXML.compareXMLString( s1, s2 ) );

回答by acdcjunior

The latest version of XMLUnitcan help the job of asserting two XML are equal. Also XMLUnit.setIgnoreWhitespace()and XMLUnit.setIgnoreAttributeOrder()may be necessary to the case in question.

最新版本的XMLUnit可以帮助断言两个 XML 相等。也XMLUnit.setIgnoreWhitespace()XMLUnit.setIgnoreAttributeOrder()可能有必要在问题的情况下。

See working code of a simple example of XML Unit use below.

请参阅下面使用 XML 单元的简单示例的工作代码。

import org.custommonkey.xmlunit.DetailedDiff;
import org.custommonkey.xmlunit.XMLUnit;
import org.junit.Assert;

public class TestXml {

    public static void main(String[] args) throws Exception {
        String result = "<abc             attr=\"value1\"                title=\"something\">            </abc>";
        // will be ok
        assertXMLEquals("<abc attr=\"value1\" title=\"something\"></abc>", result);
    }

    public static void assertXMLEquals(String expectedXML, String actualXML) throws Exception {
        XMLUnit.setIgnoreWhitespace(true);
        XMLUnit.setIgnoreAttributeOrder(true);

        DetailedDiff diff = new DetailedDiff(XMLUnit.compareXML(expectedXML, actualXML));

        List<?> allDifferences = diff.getAllDifferences();
        Assert.assertEquals("Differences found: "+ diff.toString(), 0, allDifferences.size());
    }

}

If using Maven, add this to your pom.xml:

如果使用 Maven,请将其添加到您的pom.xml

<dependency>
    <groupId>xmlunit</groupId>
    <artifactId>xmlunit</artifactId>
    <version>1.4</version>
</dependency>

回答by Wojtek

This will compare full string XMLs (reformatting them on the way). It makes it easy to work with your IDE (IntelliJ, Eclipse), cos you just click and visually see the difference in the XML files.

这将比较完整的字符串 XML(在途中重新格式化它们)。它使您可以轻松地使用您的 IDE(IntelliJ、Eclipse),因为您只需单击即可直观地查看 XML 文件中的差异。

import org.apache.xml.security.c14n.CanonicalizationException;
import org.apache.xml.security.c14n.Canonicalizer;
import org.apache.xml.security.c14n.InvalidCanonicalizerException;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import java.io.IOException;
import java.io.StringReader;

import static org.apache.xml.security.Init.init;
import static org.junit.Assert.assertEquals;

public class XmlUtils {
    static {
        init();
    }

    public static String toCanonicalXml(String xml) throws InvalidCanonicalizerException, ParserConfigurationException, SAXException, CanonicalizationException, IOException {
        Canonicalizer canon = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_OMIT_COMMENTS);
        byte canonXmlBytes[] = canon.canonicalize(xml.getBytes());
        return new String(canonXmlBytes);
    }

    public static String prettyFormat(String input) throws TransformerException, ParserConfigurationException, IOException, SAXException, InstantiationException, IllegalAccessException, ClassNotFoundException {
        InputSource src = new InputSource(new StringReader(input));
        Element document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src).getDocumentElement();
        Boolean keepDeclaration = input.startsWith("<?xml");
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
        LSSerializer writer = impl.createLSSerializer();
        writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
        writer.getDomConfig().setParameter("xml-declaration", keepDeclaration);
        return writer.writeToString(document);
    }

    public static void assertXMLEqual(String expected, String actual) throws ParserConfigurationException, IOException, SAXException, CanonicalizationException, InvalidCanonicalizerException, TransformerException, IllegalAccessException, ClassNotFoundException, InstantiationException {
        String canonicalExpected = prettyFormat(toCanonicalXml(expected));
        String canonicalActual = prettyFormat(toCanonicalXml(actual));
        assertEquals(canonicalExpected, canonicalActual);
    }
}

I prefer this to XmlUnit because the client code (test code) is cleaner.

我更喜欢这个而不是 XmlUnit,因为客户端代码(测试代码)更干净。