Easiest way to compare two Excel files in Java?
Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverFlow
Original: http://stackoverflow.com/questions/866346/
Asked by Andrew Swan
I'm writing a JUnit test for some code that produces an Excel file (which is binary). I have another Excel file that contains my expected output. What's the easiest way to compare the actual file to the expected file?
Sure I could write the code myself, but I was wondering if there's an existing method in a trusted third-party library (e.g. Spring or Apache Commons) that already does this.
Accepted answer by Andrew Swan
Here's what I ended up doing (with the heavy lifting being done by DBUnit):
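import java.io.InputStream;

import org.apache.commons.io.IOUtils;
import org.dbunit.Assertion;
import org.dbunit.dataset.excel.XlsDataSet;

/**
 * Compares the data in the two Excel files represented by the given input
 * streams, closing them on completion
 *
 * @param expected can't be <code>null</code>
 * @param actual can't be <code>null</code>
 * @throws Exception if either stream cannot be read as an Excel data set
 */
private void compareExcelFiles(InputStream expected, InputStream actual)
        throws Exception
{
    try {
        // DBUnit reads each workbook as a data set and compares the data only
        Assertion.assertEquals(new XlsDataSet(expected), new XlsDataSet(actual));
    }
    finally {
        IOUtils.closeQuietly(expected);
        IOUtils.closeQuietly(actual);
    }
}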
This compares the data in the two files, with no risk of false negatives from any irrelevant metadata that might be different. Hope this helps someone.
Answer by CookieOfFortune
Maybe... compare MD5 digests of each file? I'm sure there are a lot of ways to do it. You could just open both files and compare each byte.
EDIT: James stated how the XLS format might have differences in the metadata. Perhaps you should use the same interface you used to generate the xls files to open them and compare the values from cell to cell?
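For reference, the digest and byte-for-byte ideas mentioned above could look roughly like this using only the JDK. This is a minimal sketch: the class name BinaryFileCompare and its method names are made up for illustration, and it reads both files fully into memory, which is fine for test fixtures but not for huge files. As the EDIT says, it will report a difference for two workbooks whose cell data match but whose metadata differ.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper class, not part of any library.
public class BinaryFileCompare {

    /** Returns true if both files have the same MD5 digest. */
    public static boolean sameDigest(Path a, Path b)
            throws IOException, NoSuchAlgorithmException {
        return Arrays.equals(md5(a), md5(b));
    }

    /** Returns true if both files contain exactly the same bytes. */
    public static boolean sameBytes(Path a, Path b) throws IOException {
        return Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b));
    }

    // Computes the MD5 digest of the whole file content.
    private static byte[] md5(Path file)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        return digest.digest(Files.readAllBytes(file));
    }
}
```

In a JUnit test you would wrap either call in assertTrue. The byte comparison is the stricter of the two, since different files can in principle share a digest.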
Answer by Jon
You could use javaxdelta to check whether the two files are the same. It's available from here:
Answer by Andrew Swan
Just found out there's something in commons-io's FileUtils. Thanks for the other answers.
Answer by sleske
A simple file comparison can easily be done using some checksumming (like MD5) or just reading both files.
However, as Excel files contain loads of metadata, the files will probably never be identical byte-for-byte, as James Burgess pointed out. So you'll need another kind of comparison for your test.
I'd recommend somehow generating a "canonical" form from the Excel file, i.e. reading the generated Excel file and converting it to a simpler format (CSV or something similar), which will only retain the information you want to check. Then you can use the "canonical form" to compare with your expected result (also in canonical form, of course).
Apache POI might be useful for reading the file.
BTW: Reading a whole file to check its correctness would generally not be considered a unit test. That's an integration test...
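To illustrate the "canonical form" idea, here is a rough sketch of the comparison step only. It assumes each sheet has already been read (for example with Apache POI) into rows of cell strings; the class CanonicalSheetDiff and its methods are hypothetical names, and the CSV-like output does no quoting or escaping, so it is only suitable for a quick test comparison.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: compares two sheets already read into rows of strings.
public class CanonicalSheetDiff {

    /** Turns each row into one comma-separated line of trimmed cell values. */
    public static List<String> canonicalize(List<List<String>> rows) {
        List<String> lines = new ArrayList<>();
        for (List<String> row : rows) {
            List<String> cells = new ArrayList<>();
            for (String cell : row) {
                // Treat missing cells as empty and ignore surrounding whitespace
                cells.add(cell == null ? "" : cell.trim());
            }
            lines.add(String.join(",", cells));
        }
        return lines;
    }

    /** Returns one message per differing row; an empty list means equal. */
    public static List<String> diff(List<List<String>> expected,
                                    List<List<String>> actual) {
        List<String> e = canonicalize(expected);
        List<String> a = canonicalize(actual);
        List<String> differences = new ArrayList<>();
        int max = Math.max(e.size(), a.size());
        for (int i = 0; i < max; i++) {
            String el = i < e.size() ? e.get(i) : "<missing row>";
            String al = i < a.size() ? a.get(i) : "<missing row>";
            if (!el.equals(al)) {
                differences.add("row " + (i + 1) + ": expected [" + el
                        + "] but was [" + al + "]");
            }
        }
        return differences;
    }
}
```

The test can then assert that the returned list is empty and print it on failure. Only the information kept in the canonical form (here, trimmed cell text) takes part in the comparison.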
Answer by Tiger
Please take a look at this site on comparing binary files: http://www.velocityreviews.com/forums/t123770-re-java-code-for-determining-binary-file-equality.html
Tiger
Answer by Wernight
You may use Beyond Compare 3, which can be started from the command line and supports different ways to compare Excel files, including:
- Comparing Excel sheets as database tables
- Checking all textual content
- Checking textual content with some formatting
Answer by Toby
You might consider using my project simple-excel, which provides a bunch of Hamcrest Matchers to do the job.
When you do something like the following,
assertThat(actual, WorkbookMatcher.sameWorkbook(expected));
You'd see, for example,
java.lang.AssertionError:
Expected: entire workbook to be equal
but: cell at "C14" contained <"bananas"> expected <nothing>,
cell at "C15" contained <"1,850,000 EUR"> expected <"1,850,000.00 EUR">,
cell at "D16" contained <nothing> expected <"Tue Sep 04 06:30:00">
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
That way, you can run it from your automated tests and get meaningful feedback whilst you're developing.
You can read more about it in this article on my site
Answer by joshden
I needed to do something similar and was already using the Apache POI library in my project to create Excel files. So I opted to use the included ExcelExtractor interface to export both workbooks as a string of text and asserted that the strings were equal. There are implementations for both HSSF (for .xls) and XSSF (for .xlsx).
Dump to string:
XSSFWorkbook xssfWorkbookA = ...;
String workbookA = new XSSFExcelExtractor(xssfWorkbookA).getText();
ExcelExtractor has some options for what should be included in the string dump. I found its defaults useful: the dump includes the sheet names as well as the text contents of the cells.
Answer by BuckBazooka
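The easiest way I've found is to use Tika. I use it like this:
import java.io.File;
import java.io.IOException;
import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

// Extracts the text of both files with Tika and compares the results
private void compareXlsx(File expected, File result) throws IOException, TikaException {
    Tika tika = new Tika();
    String expectedText = tika.parseToString(expected);
    String resultText = tika.parseToString(result);
    assertEquals(expectedText, resultText);
}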
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>1.13</version>
    <scope>test</scope>
</dependency>