Note: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/10311563/

Date: 2020-10-31 00:30:16  Source: igfitidea

Comparing two XML files using Java

Tags: java, xml

Asked by Sangram Anand

I have two XML files, say abc.xml and 123.xml, which are almost the same: they share the same content, but the second one (123.xml) has more content than the first. I want to read both files using Java and check whether the content of each tag in abc.xml is the same as in 123.xml, something like an object comparison. Please suggest how to read the XML files in Java and start comparing them.

Thanks.

Accepted answer by aviad

I would go for XMLUnit. It lets you check:

  • The differences between two pieces of XML
  • The outcome of transforming a piece of XML using XSLT
  • The evaluation of an XPath expression on a piece of XML
  • The validity of a piece of XML
  • Individual nodes in a piece of XML that are exposed by DOM Traversal

Good Luck!

Answer by Zaz Gmy

If you just want to compare the two documents, use this:

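import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.junit.Assert;
import org.w3c.dom.Document;

// Configure the parser so that insignificant differences do not affect the comparison.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setCoalescing(true);                       // convert CDATA sections to text nodes
dbf.setIgnoringElementContentWhitespace(true); // requires a validating parser to take effect
dbf.setIgnoringComments(true);
DocumentBuilder db = dbf.newDocumentBuilder();

Document doc1 = db.parse(new File("file1.xml"));
doc1.normalizeDocument();

Document doc2 = db.parse(new File("file2.xml"));
doc2.normalizeDocument();

// isEqualNode compares the two trees structurally: names, attributes, values, children.
Assert.assertTrue(doc1.isEqualNode(doc2));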

Otherwise, see XMLUnit: http://xmlunit.sourceforge.net/

Answer by Kai

I would use JAXB to generate Java objects from the XML files and then compare the Java objects. That would make the handling much easier.

Answer by A_A

In general, if you know that you have two files with identical structure but slightly different and unordered content, you will have to "read" the files to compare their contents.

If you have the XML Schema for your XML files, you could use JAXB to create a set of classes that represent the specific DOM defined by your XML schema. The benefit of this approach is that you do not have to work with the XML through generic element and attribute functions; instead, you work with the actual fields that make sense for your problem.

Of course, to be able to detect the presence of the same entry across both files you are going to have to "match" them through some common field (for example, some ID).

To help you with the duplicate discovery process, you could use a suitable data structure from Java's collections, such as Set (or one of its derivatives).
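
The matching step described above can be sketched with the JDK's DOM parser and a Set. This is only an illustration, not the answerer's code: the class name IdMatcher, the element name and the ID attribute name are hypothetical and would need to match your own files. Entries are indexed by ID, and set operations then show which IDs are missing from the second file and which entries differ.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: matches entries of two XML documents through a common ID attribute.
final class IdMatcher {

    /** Indexes every <tagName> element that carries the given ID attribute. */
    static Map<String, Element> indexById(String xml, String tagName, String idAttribute) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        Map<String, Element> index = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName(tagName);
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            if (element.hasAttribute(idAttribute)) {
                index.put(element.getAttribute(idAttribute), element);
            }
        }
        return index;
    }

    /** IDs that occur in the first index but not in the second. */
    static Set<String> missingFromSecond(Map<String, Element> first, Map<String, Element> second) {
        Set<String> missing = new TreeSet<>(first.keySet());
        missing.removeAll(second.keySet());
        return missing;
    }

    /** IDs that occur in both indexes but whose elements are not structurally equal. */
    static Set<String> changed(Map<String, Element> first, Map<String, Element> second) {
        Set<String> common = new TreeSet<>(first.keySet());
        common.retainAll(second.keySet());
        Set<String> changed = new TreeSet<>();
        for (String id : common) {
            if (!first.get(id).isEqualNode(second.get(id))) {
                changed.add(id);
            }
        }
        return changed;
    }
}
```

For the files in the question, you would build one index from the content of abc.xml and one from 123.xml, then inspect missingFromSecond and changed. Note that isEqualNode treats whitespace text nodes as significant, so differently indented entries are reported as changed.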

I hope this helps.

Answer by Michael Kay

The right approach depends on two factors:

(a) how much control do you want over how the comparison is done? For example, do you need to control whether whitespace is significant, whether comments should be ignored, whether namespace prefixes should be ignored, whether redundant namespace declarations should be ignored, whether the XML declaration should be ignored?

(b) what answer do you want? (i) a boolean: same/different, (ii) a list of differences suitable for a human to process, (iii) a list of differences suitable for an application to process.

The two techniques I use are: (a) convert both files to Canonical XML and then compare strings. This gives very little control and only gives a boolean result. (b) compare the two trees using the XPath 2.0 deep-equal() function or the extended Saxon version saxon:deep-equal(). The Saxon version gives more control over how the comparison is done, and a more detailed report of the differences found (for human reading, not for application use).

If you want to write Java code, you could of course implement your own comparison logic. For example, you could find an open source implementation of XPath deep-equal and modify it to meet your requirements. It's only a hundred or so lines of code.
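
As a rough idea of what such hand-written comparison logic might look like, here is a simplified sketch over the JDK's DOM API. It is not Saxon's implementation and does not cover the full deep-equal specification: the class name SimpleDeepEqual and the ignoreWhitespace switch are assumptions made for illustration. It compares elements by namespace URI and local name (so prefixes are ignored), ignores attribute order, namespace declarations, comments and processing instructions, and returns only a boolean.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical, simplified deep-equal check; not a complete XPath deep-equal().
final class SimpleDeepEqual {

    /** Parses both strings and compares their root elements. */
    static boolean deepEqualXml(String xmlA, String xmlB, boolean ignoreWhitespace) {
        return deepEqual(parse(xmlA).getDocumentElement(),
                parse(xmlB).getDocumentElement(), ignoreWhitespace);
    }

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setCoalescing(true);       // CDATA sections become plain text
            dbf.setIgnoringComments(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    static boolean deepEqual(Node a, Node b, boolean ignoreWhitespace) {
        boolean aIsElement = a.getNodeType() == Node.ELEMENT_NODE;
        boolean bIsElement = b.getNodeType() == Node.ELEMENT_NODE;
        if (aIsElement != bIsElement) return false;
        if (!aIsElement) { // both are text nodes
            String x = a.getNodeValue();
            String y = b.getNodeValue();
            return ignoreWhitespace ? x.trim().equals(y.trim()) : x.equals(y);
        }
        // Compare expanded names, so namespace prefixes do not matter.
        if (!Objects.equals(a.getNamespaceURI(), b.getNamespaceURI())
                || !a.getLocalName().equals(b.getLocalName())) return false;
        if (!attributesEqual(a, b)) return false;
        List<Node> childrenA = significantChildren(a, ignoreWhitespace);
        List<Node> childrenB = significantChildren(b, ignoreWhitespace);
        if (childrenA.size() != childrenB.size()) return false;
        for (int i = 0; i < childrenA.size(); i++) {
            if (!deepEqual(childrenA.get(i), childrenB.get(i), ignoreWhitespace)) return false;
        }
        return true;
    }

    private static boolean isNamespaceDeclaration(Node attr) {
        String name = attr.getNodeName();
        return name.equals("xmlns") || name.startsWith("xmlns:");
    }

    /** Attribute order is irrelevant; namespace declarations are skipped. */
    private static boolean attributesEqual(Node a, Node b) {
        NamedNodeMap attrsA = a.getAttributes();
        NamedNodeMap attrsB = b.getAttributes();
        int countA = 0;
        int countB = 0;
        for (int i = 0; i < attrsB.getLength(); i++) {
            if (!isNamespaceDeclaration(attrsB.item(i))) countB++;
        }
        for (int i = 0; i < attrsA.getLength(); i++) {
            Node attr = attrsA.item(i);
            if (isNamespaceDeclaration(attr)) continue;
            countA++;
            Node other = attrsB.getNamedItemNS(attr.getNamespaceURI(), attr.getLocalName());
            if (other == null || !attr.getNodeValue().equals(other.getNodeValue())) return false;
        }
        return countA == countB;
    }

    /** Keeps elements and text; drops comments, processing instructions and, optionally, blank text. */
    private static List<Node> significantChildren(Node parent, boolean ignoreWhitespace) {
        List<Node> out = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            short type = c.getNodeType();
            if (type == Node.ELEMENT_NODE) {
                out.add(c);
            } else if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                if (!(ignoreWhitespace && c.getNodeValue().trim().isEmpty())) out.add(c);
            }
        }
        return out;
    }
}
```

Calling SimpleDeepEqual.deepEqualXml(first, second, true) with the contents of the two files gives the same/different answer from (b)(i). Each of the controls listed under (a) corresponds to one small decision in this code, which is where you would adapt it to your own requirements.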

Answer by Dheeraj Joshi

Well, if you just want to compare and display the differences, you can use Guiffy.

It is a good tool. If you want to do the processing in the backend, you must use a DOM parser to load both files into two DOM objects and compare them attribute by attribute.
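
A minimal sketch of that backend approach, under some assumptions of my own: the class name DomSubsetDiff is hypothetical, elements are matched by tag name and position among same-named siblings, and only content present in the first (smaller) document is checked, so extra elements and attributes in the second document are ignored, as in the abc.xml / 123.xml scenario from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: reports content of the first document that is missing or different in the second.
final class DomSubsetDiff {

    static List<String> differences(String smallerXml, String largerXml) {
        List<String> out = new ArrayList<>();
        Element a = parse(smallerXml).getDocumentElement();
        Element b = parse(largerXml).getDocumentElement();
        if (!a.getTagName().equals(b.getTagName())) {
            out.add("/" + a.getTagName() + ": root element differs");
        } else {
            compare(a, b, "/" + a.getTagName(), out);
        }
        return out;
    }

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    private static void compare(Element a, Element b, String path, List<String> out) {
        // Attribute by attribute: every attribute of a must exist in b with the same value.
        NamedNodeMap attrs = a.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            String name = attr.getNodeName();
            if (!b.hasAttribute(name)) {
                out.add(path + "/@" + name + ": missing from second document");
            } else if (!attr.getNodeValue().equals(b.getAttribute(name))) {
                out.add(path + "/@" + name + ": expected '" + attr.getNodeValue()
                        + "' but found '" + b.getAttribute(name) + "'");
            }
        }
        // Direct text content, with surrounding whitespace trimmed.
        String textA = ownText(a);
        String textB = ownText(b);
        if (!textA.equals(textB)) {
            out.add(path + ": expected text '" + textA + "' but found '" + textB + "'");
        }
        // The n-th <tag> child of a is matched with the n-th <tag> child of b.
        Map<String, Integer> seen = new HashMap<>();
        for (Element child : childElements(a)) {
            int n = seen.merge(child.getTagName(), 1, Integer::sum);
            String childPath = path + "/" + child.getTagName() + "[" + n + "]";
            Element match = nthChild(b, child.getTagName(), n);
            if (match == null) {
                out.add(childPath + ": missing from second document");
            } else {
                compare(child, match, childPath, out);
            }
        }
    }

    private static String ownText(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.TEXT_NODE || c.getNodeType() == Node.CDATA_SECTION_NODE) {
                sb.append(c.getNodeValue().trim());
            }
        }
        return sb.toString();
    }

    private static List<Element> childElements(Element parent) {
        List<Element> out = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) out.add((Element) c);
        }
        return out;
    }

    private static Element nthChild(Element parent, String tagName, int n) {
        int seen = 0;
        for (Element child : childElements(parent)) {
            if (child.getTagName().equals(tagName) && ++seen == n) return child;
        }
        return null;
    }
}
```

An empty list means that everything in the first document was found unchanged in the second. Matching by position is a deliberate simplification; if the entries can appear in a different order, match them through an ID or another key field instead.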

Answer by Nikolay Kasyanov

It's a bit overkill, but if your XML has a schema, you can convert it into an EMF metamodel and then use EMF Compare to compare the files.