Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/7089039/

Time: 2020-10-30 18:34:11  Source: igfitidea

How to load old Microsoft Office XML file (Excel) using Java

Tags: java, xml, excel, ms-office

Asked by drewxmlss

I'm not able to load an Excel file in the older Office XML format (think Office 2002 or 2003) into Java. I tried JXL and Apache POI (version 3.7). POI doesn't work, since it appears to want the newer Office .xlsx format.

Here's an example of the older Office XML format.

One can generate a similar XML file from MS Excel 2010 by saving the workbook in the "XML Spreadsheet 2003" format.

Are there any open-source Java libraries that will load the XMLSS format? Otherwise I have no choice but to write a custom parser: read the XML file, then interpret the cell tags to build out the cell matrix. In this XML format, rows and cells with empty values are skipped, and the next cell with data is positioned with an Index attribute that acts like an offset in the columns, I assume to save space in the XML file.
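
As a rough illustration of the custom-parser route described above, here is a minimal sketch that uses only the JDK's DOM parser. It assumes the usual "XML Spreadsheet 2003" layout (Workbook > Worksheet > Table > Row > Cell > Data, all in the urn:schemas-microsoft-com:office:spreadsheet namespace) and treats ss:Index as a 1-based position on both rows and cells. The class and method names (XmlssGrid, readFirstTable) are hypothetical. Merged cells (ss:MergeAcross), styles, and data types are ignored.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XmlssGrid {
    // Namespace of the "XML Spreadsheet 2003" elements and of the ss: attributes.
    static final String SS = "urn:schemas-microsoft-com:office:spreadsheet";

    /** Reads the first Table into rows of strings; skipped rows and cells become empty entries. */
    static List<List<String>> readFirstTable(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups below
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input could not be parsed as XML", e);
        }
        List<List<String>> grid = new ArrayList<>();
        Element table = (Element) doc.getElementsByTagNameNS(SS, "Table").item(0);
        if (table == null) {
            return grid; // no worksheet table found
        }
        NodeList rows = table.getElementsByTagNameNS(SS, "Row");
        for (int r = 0; r < rows.getLength(); r++) {
            Element row = (Element) rows.item(r);
            int rowPos = position(row, grid.size());
            while (grid.size() < rowPos) {
                grid.add(new ArrayList<>()); // rows skipped via ss:Index
            }
            List<String> cells = new ArrayList<>();
            NodeList cellNodes = row.getElementsByTagNameNS(SS, "Cell");
            for (int c = 0; c < cellNodes.getLength(); c++) {
                Element cell = (Element) cellNodes.item(c);
                int colPos = position(cell, cells.size());
                while (cells.size() < colPos) {
                    cells.add(""); // cells skipped via ss:Index
                }
                NodeList data = cell.getElementsByTagNameNS(SS, "Data");
                cells.add(data.getLength() == 0 ? "" : data.item(0).getTextContent());
            }
            grid.add(cells);
        }
        return grid;
    }

    /** Zero-based position: the 1-based ss:Index if present, otherwise the next free slot. */
    private static int position(Element element, int nextFree) {
        String index = element.getAttributeNS(SS, "Index");
        return index.isEmpty() ? nextFree : Integer.parseInt(index) - 1;
    }

    public static void main(String[] args) {
        String xml = "<Workbook xmlns=\"" + SS + "\" xmlns:ss=\"" + SS + "\"><Worksheet><Table>"
                + "<Row><Cell><Data>a</Data></Cell><Cell ss:Index=\"3\"><Data>c</Data></Cell></Row>"
                + "<Row ss:Index=\"3\"><Cell ss:Index=\"2\"><Data>7</Data></Cell></Row>"
                + "</Table></Worksheet></Workbook>";
        System.out.println(readFirstTable(xml)); // prints [[a, , c], [], [, 7]]
    }
}
```

DOM loads the whole document into memory, which keeps the sketch short but may not suit very large exports.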

Answered by Ilya Kharlamov

The format is called SpreadsheetML (not to be confused with .xlsx, which is also XML-based). A library called Xelem can handle it:

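import nl.fountain.xelem.excel.Workbook;
import nl.fountain.xelem.lex.ExcelReader;
//...
ExcelReader reader = new ExcelReader();
// Backslashes must be escaped in a Java string literal ("\m" is not a valid escape sequence).
Workbook xlWorkbook = reader.getWorkbook("c:\\my\\spreadsheet.xml");
// Print the names of the worksheets found in the file.
System.out.println(xlWorkbook.getSheetNames());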

Answered by Andrey

Copying Mark Beardsley's answer from the POI team (http://apache-poi.1045710.n5.nabble.com/How-to-convert-xml-to-xls-td2306602.html):

You have got an Office 2003 xml file there, not an OpenXML file; it is an early attempt by Microsoft to create an xml based file format for Excel and it is in that sense a 'valid' Office file format.

Sadly, POI cannot interpret this file at all, and that is why you saw the exception when you tried to wrap it in an InputStream and pass it to WorkbookFactory. You do, however, have a number of options:

  • You could use Excel itself and manually open and save each file you wish to convert, as you already have done.
  • If you have access to Visual Studio and can write Visual Basic or C# code, then you could use a control that allows you to drive Excel programmatically. This way you could automate the file conversion using Excel itself. Once the file has been converted either to the binary or to the OpenXML format, POI can be used to process it.
  • If you are running on a standalone PC with the Windows operating system and a copy of Excel installed, then you could use OLE to do something very similar from Java code. As above, POI can be used to process the file following the conversion.
  • If you have access to OpenOffice, it has a rather good API that is accessible from Java code. You could use it to convert between the file types; it is simply a matter of discovering the correct filter to use in this case. OpenOffice is good for all except the most complex files, and you should be able to use POI to process the file following conversion. However, if you choose this route, it may be best to do all of the work using OpenOffice's UNO API.
  • Depending upon what you want to do with the file's contents, you could create your own parser using core Java code and either the SAX or Xerces parsers (consider using XMLBeans (http://xmlbeans.apache.org/)). If you open the original XML file in a simple text editor, you can see that the structure is not complex and, if all you wish to get at is the raw data it contains, this could be your best option.
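
The last option in the list (a hand-written parser on top of SAX) might look roughly like the sketch below. It is not taken from any of the answers: the class name XmlssSaxReader and the "R<row>C<col>=value" record format are invented for illustration, and it assumes the Row, Cell, and Data elements and the ss:Index attribute all live in the urn:schemas-microsoft-com:office:spreadsheet namespace. Row numbering restarts at each Table, and worksheet names are not recorded.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class XmlssSaxReader extends DefaultHandler {
    static final String SS = "urn:schemas-microsoft-com:office:spreadsheet";

    /** Collected cells as "R<row>C<col>=value", using 1-based positions. */
    final List<String> cells = new ArrayList<>();
    private int row;
    private int col;
    private StringBuilder text; // non-null only while inside a Data element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!SS.equals(uri)) {
            return;
        }
        switch (localName) {
            case "Table":
                row = 0; // each worksheet table starts counting again
                break;
            case "Row":
                row = next(atts, row);
                col = 0;
                break;
            case "Cell":
                col = next(atts, col);
                break;
            case "Data":
                text = new StringBuilder();
                break;
            default:
                break;
        }
    }

    /** An explicit ss:Index wins; otherwise the position advances by one. */
    private static int next(Attributes atts, int previous) {
        String index = atts.getValue(SS, "Index");
        return index == null ? previous + 1 : Integer.parseInt(index);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length); // may be called several times per element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (SS.equals(uri) && "Data".equals(localName)) {
            cells.add("R" + row + "C" + col + "=" + text);
            text = null;
        }
    }

    /** Parses the given XML text and returns the collected cell records. */
    static List<String> read(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed for uri/localName matching
            XmlssSaxReader handler = new XmlssSaxReader();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.cells;
        } catch (Exception e) {
            throw new IllegalArgumentException("Input could not be parsed as XML", e);
        }
    }
}
```

Because SAX streams the document, only the current cell's text is buffered, which is the main reason to prefer it over DOM for large exports. For a real file, an InputSource built from a FileInputStream could be passed to the parser in place of the StringReader.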

Answered by michaelt

After a lot of pain I've found a solution to this. JODConverter uses the OpenOffice.org/LibreOffice API and can convert SpreadsheetML to whatever formats OpenOffice.org supports.

Answered by Gus

I had the same problem some time ago and ended up writing a SAX parser to read the XML file. I wrote a blog post about it here.

You can find the sample project that parses the file on GitHub.

Answered by drewxmlss

The answer today was to ask the vendor to change their Excel file format to an Excel binary rather than the old Office XML. Doing so allowed me to use Apache POI 3.7 to read the file with no issues. I appreciate the answers, as I had no idea there was no direct support in the Java-based open source libraries for this old Office XML format. Now I know next time to check earlier to see what format the Excel files are in before committing to a timeline.
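
Checking the format up front can be done without opening the file in Excel. The sketch below is only a heuristic and is not from the original answers; the class ExcelFormatSniffer and its enum are hypothetical names. It relies on three well-known markers: the OLE2 compound-file signature used by binary .xls files, the "PK" signature of the ZIP container used by .xlsx, and the SpreadsheetML 2003 namespace (or the mso-application processing instruction) near the start of an XML file. Other file types share the first two signatures, and XML saved as UTF-16 would not be recognised.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class ExcelFormatSniffer {
    enum Format { OLE2_BINARY, ZIP_CONTAINER, XML_SPREADSHEET_2003, UNKNOWN }

    /** Guesses the container format from the first bytes of a file. */
    static Format sniff(byte[] head) {
        // OLE2 compound file: binary .xls (also used by .doc, .ppt and others).
        if (startsWith(head, 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1)) {
            return Format.OLE2_BINARY;
        }
        // ZIP local file header: .xlsx (also any other ZIP archive).
        if (startsWith(head, 'P', 'K', 3, 4)) {
            return Format.ZIP_CONTAINER;
        }
        // Assumes a single-byte-compatible encoding such as UTF-8 or ASCII.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        if (text.contains("urn:schemas-microsoft-com:office:spreadsheet")
                || text.contains("progid=\"Excel.Sheet\"")) {
            return Format.XML_SPREADSHEET_2003;
        }
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ExcelFormatSniffer <file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            // 4 KiB is an arbitrary window that should cover the XML prolog and root element.
            System.out.println(sniff(in.readNBytes(4096)));
        }
    }
}
```

A check like this could run as soon as a vendor delivers a sample file, before deciding whether POI alone is enough.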

Answered by Wivani

You might get some result using the OpenOffice API. If not directly, you could probably convert to a 'supported' format. Otherwise, the schema for the Office 2003 'SpreadsheetML' isn't very complicated. I have successfully created an XSLT scenario to convert a result set (database query) to a (simple yet effective) Excel 2003 document (XML format). The other way around should not be very hard to achieve.

Cheers, Wim
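
To give an idea of "the other way around", here is a small sketch that runs an XSLT 1.0 stylesheet through the JDK's javax.xml.transform API to flatten SpreadsheetML into comma-separated lines. It is not Wim's stylesheet; the class name XmlssToDelimitedText and the stylesheet are assumptions made for illustration. It deliberately stays naive: ss:Index gaps are not padded, values containing commas are not quoted, and rows from all worksheets are emitted one after another.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XmlssToDelimitedText {
    // One output line per Row; the Data values of its Cells are joined with commas.
    private static final String STYLESHEET =
            "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
            + " xmlns:ss='urn:schemas-microsoft-com:office:spreadsheet'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/'>"
            + "<xsl:for-each select='//ss:Row'>"
            + "<xsl:for-each select='ss:Cell'>"
            + "<xsl:if test='position() &gt; 1'><xsl:text>,</xsl:text></xsl:if>"
            + "<xsl:value-of select='ss:Data'/>"
            + "</xsl:for-each>"
            + "<xsl:text>&#10;</xsl:text>"
            + "</xsl:for-each>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given SpreadsheetML text and returns the flattened result. */
    static String convert(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String ns = "urn:schemas-microsoft-com:office:spreadsheet";
        String xml = "<Workbook xmlns=\"" + ns + "\"><Worksheet><Table>"
                + "<Row><Cell><Data>id</Data></Cell><Cell><Data>name</Data></Cell></Row>"
                + "<Row><Cell><Data>1</Data></Cell><Cell><Data>Ann</Data></Cell></Row>"
                + "</Table></Worksheet></Workbook>";
        System.out.print(convert(xml)); // two lines: "id,name" and "1,Ann"
    }
}
```

Handling ss:Index in pure XSLT 1.0 takes noticeably more work, so for sparse sheets a SAX or DOM parser may be the simpler choice.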
