Java POI - Error: Unable to read entire header
Source: http://stackoverflow.com/questions/17144669/
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Asked by shanks_roux
I'm trying to read a .doc file in Java using the POI library. Here is my code:
FileInputStream fis = new FileInputStream(file.getAbsolutePath());
HWPFDocument document = new HWPFDocument(fis);
WordExtractor extractor = new WordExtractor(document);
String [] fileData = extractor.getParagraphText();
And I get this exception:
java.io.IOException: Unable to read entire header; 162 bytes read; expected 512 bytes
at org.apache.poi.poifs.storage.HeaderBlock.alertShortRead(HeaderBlock.java:226)
at org.apache.poi.poifs.storage.HeaderBlock.readFirst512(HeaderBlock.java:207)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:104)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:138)
at MicrosoftWordParser.getDocString(MicrosoftWordParser.java:277)
at MicrosoftWordParser.main(MicrosoftWordParser.java:86)
My file is not corrupted; I can open it with Microsoft Word.
I'm using POI 3.9 (the latest stable version).
Do you have any idea how to solve the problem?
Thank you.
Answered by NINCOMPOOP
readFirst512() reads the first 512 bytes of your InputStream and throws an exception if there are not enough bytes to read. I think your file is not big enough to be read by POI.
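A quick pre-check along these lines could confirm that diagnosis before the file ever reaches POI. This is only a minimal sketch using the JDK alone: the class name HeaderSizeCheck and the helper hasFullHeader are hypothetical, and the 512-byte figure is taken from the exception message in the question.

```java
import java.io.File;

// Hypothetical helper, not part of POI: checks that a file is at least
// as long as the 512-byte header mentioned in the exception message.
class HeaderSizeCheck {
    // Size taken from "expected 512 bytes" in the stack trace.
    static final long HEADER_SIZE = 512;

    static boolean hasFullHeader(File file) {
        // File.length() returns 0 when the file does not exist.
        return file.isFile() && file.length() >= HEADER_SIZE;
    }

    public static void main(String[] args) {
        // "myfile.doc" is a placeholder path.
        File file = new File(args.length > 0 ? args[0] : "myfile.doc");
        if (hasFullHeader(file)) {
            System.out.println(file + " is large enough to hold the header");
        } else {
            System.out.println(file + " is only " + file.length()
                    + " bytes, too short for the 512-byte header");
        }
    }
}
```

Passing this check does not prove the file is a valid Word document; it only rules out the short-read case described above.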
Answered by Calabacin
It is probably not a valid Word file. Is it really only 162 bytes long? Check in your filesystem.
I'd recommend creating a new Word file using Word or LibreOffice, and then trying to read it with your program.
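One way to test whether a file is plausibly a binary .doc is to look at its first eight bytes: OLE2 compound documents, the container format used by binary .doc files, begin with the signature D0 CF 11 E0 A1 B1 1A E1. The sketch below is an illustration only, not something from the original answer; the class name Ole2SignatureCheck and the method looksLikeOle2 are hypothetical.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical helper, not part of POI: compares the first 8 bytes of a
// file with the OLE2 compound document signature.
class Ole2SignatureCheck {
    static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    static boolean looksLikeOle2(File file) throws IOException {
        byte[] head = new byte[OLE2_SIGNATURE.length];
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            // readFully throws EOFException if fewer than 8 bytes exist.
            in.readFully(head);
        } catch (EOFException tooShort) {
            return false;
        }
        return Arrays.equals(head, OLE2_SIGNATURE);
    }

    public static void main(String[] args) throws IOException {
        // "myfile.doc" is a placeholder path.
        File file = new File(args.length > 0 ? args[0] : "myfile.doc");
        System.out.println(file + " has OLE2 signature: " + looksLikeOle2(file));
    }
}
```

A matching signature only shows that the file is an OLE2 container; it could still be some other OLE2-based format rather than a Word document.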
Answered by milan.s
You should try this program.
package file_opration;

import java.io.File;
import java.io.FileInputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

public class ReadDocFile {
    public static void main(String[] args) {
        File file = new File("filepath location");
        // try-with-resources closes the stream even if parsing fails
        try (FileInputStream fis = new FileInputStream(file)) {
            HWPFDocument document = new HWPFDocument(fis);
            WordExtractor extractor = new WordExtractor(document);
            String[] fileData = extractor.getParagraphText();
            // Print each extracted paragraph
            for (String paragraph : fileData) {
                if (paragraph != null) {
                    System.out.println(paragraph);
                }
            }
        } catch (Exception exep) {
            // Report the failure instead of silently swallowing it
            exep.printStackTrace();
        }
    }
}
Answered by Gagravarr
Ahh, you've got a File, but you're spending loads of memory buffering the whole thing by hiding your file behind an InputStream... Don't! If you have a File, give that to POI. Only give POI an InputStream if that's all you have.
Your code should be something like:
NPOIFSFileSystem fs = new NPOIFSFileSystem(new File("myfile.doc"));
HWPFDocument document = new HWPFDocument(fs.getRoot());
That'll be quicker and use less memory than reading it through an InputStream, and if there are problems with the file you should normally get slightly more helpful error messages too.
Answered by babelchips
A 162-byte MS Word .doc is probably an "owner file", a temporary file that Word uses to signify that the document is locked/owned.
Owner files have a .doc file extension, but they are not MS Word documents.
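If the program picks up .doc files from a directory, one hedged way to avoid these owner files is to skip them by name. Word normally gives them a name starting with "~$", so the sketch below filters on that prefix; this naming detail is not stated in the answer above, and the class name OwnerFileFilter and the method looksLikeOwnerFile are hypothetical.

```java
import java.io.File;

// Hypothetical helper, not part of POI: flags files whose name starts
// with "~$", the prefix Word normally uses for its owner (lock) files.
class OwnerFileFilter {
    static boolean looksLikeOwnerFile(File file) {
        return file.getName().startsWith("~$");
    }

    public static void main(String[] args) {
        // Scan the given directory, or the current one by default.
        File dir = new File(args.length > 0 ? args[0] : ".");
        File[] candidates = dir.listFiles((d, name) -> name.endsWith(".doc"));
        if (candidates == null) {
            System.out.println(dir + " is not a readable directory");
            return;
        }
        for (File f : candidates) {
            String verdict = looksLikeOwnerFile(f) ? "skip (owner file)" : "parse";
            System.out.println(f.getName() + ": " + verdict);
        }
    }
}
```

Combining the name check with a size check is more robust than either alone, since a name filter cannot catch a truncated file with an ordinary name.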