Reading MS Word 2007 using Java

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/5332813/

Time: 2020-10-30 10:41:44  Source: igfitidea

java

Asked by Wannabewoods

I am trying to read a Microsoft Word file from Java. I have added all the .jar files from Apache poi-3.8-beta1 to my classpath. However, when I run the program, I get the following exception:

org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
        at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:131)
        at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:104)
        at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:138)
        at readingmsword07.Main.main(Main.java:27)

Here is my code:

import java.io.FileInputStream;

import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class Main {

    public static void main(String[] args) {
        try {
            // Backslashes in Java string literals must be escaped.
            FileInputStream fis = new FileInputStream("C:\\TrialDoc.docx");
            POIFSFileSystem fileSystem = new POIFSFileSystem(fis);
            XWPFWordExtractor oleTextExtractor =
                    new XWPFWordExtractor(new XWPFDocument(fis));
            System.out.print(oleTextExtractor.getText());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}

I am using XWPFWordExtractor because I am trying to read a Word 2007 document, but I cannot figure out which part of POI handles this format.

Any help is much appreciated. Thanks in advance!

~ Woods

Answered by sbridges

Remove this line:

POIFSFileSystem fileSystem = new POIFSFileSystem(fis);
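
Some background on why this helps: the exception message says the file is in the Office 2007+ XML format, which is a ZIP-based package, while POIFSFileSystem only parses the older OLE2 container used by .doc files. XWPFDocument reads the .docx stream directly, so the POIFSFileSystem wrapper is not needed. As a rough sketch that does not depend on POI, the two container types can be told apart by their leading bytes. The class and method names below (WordFormatSniffer, classify, readHeader) are hypothetical and are not part of POI, and the check only identifies the container: a ZIP result does not prove the file is a valid .docx.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, not part of Apache POI.
class WordFormatSniffer {

    // First 8 bytes of an OLE2 compound file (.doc, .xls, .ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // First 4 bytes of a ZIP local file header ("PK" 03 04), as used by .docx.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /** Returns "OLE2", "ZIP", or "UNKNOWN" based on the leading bytes. */
    static String classify(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return "OLE2";
        if (startsWith(header, ZIP_MAGIC)) return "ZIP";
        return "UNKNOWN";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    /** Reads up to the first 8 bytes of the file at the given path. */
    static byte[] readHeader(String path) throws IOException {
        try (InputStream in = new FileInputStream(path)) {
            byte[] buf = new byte[8];
            int total = 0;
            while (total < buf.length) {
                int n = in.read(buf, total, buf.length - total);
                if (n < 0) break; // file is shorter than 8 bytes
                total += n;
            }
            return Arrays.copyOf(buf, total);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java WordFormatSniffer <file>");
            return;
        }
        System.out.println(classify(readHeader(args[0])));
    }
}
```

Running it against the file from the question should report a ZIP container if the file really is a .docx, which is the case where the XWPF classes apply and POIFSFileSystem does not.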