apache - Open Microsoft Word in Java
Original URL: http://stackoverflow.com/questions/846157/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Question by Saeed
I'm trying to open an MS Word 2003 document in Java, search for a specified string, and replace it with a new string. I use Apache POI to do that. My code looks like this:
```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public void searchAndReplace(String inputFilename, String outputFilename,
        Map<String, String> replacements) {
    // try-with-resources closes both streams even if an exception is thrown.
    // Closing the output stream also flushes the BufferedOutputStream; the
    // original code never closed it, so buffered bytes could be lost.
    try (InputStream in = new BufferedInputStream(new FileInputStream(inputFilename));
         OutputStream out = new BufferedOutputStream(new FileOutputStream(outputFilename))) {
        // Attach a POIFSFileSystem to the Word 97-2003 (.doc) file
        // and load it as an HWPFDocument.
        POIFSFileSystem fileSystem = new POIFSFileSystem(in);
        HWPFDocument document = new HWPFDocument(fileSystem);
        Range docRange = document.getRange();
        int numParagraphs = docRange.numParagraphs();
        for (int i = 0; i < numParagraphs; i++) {
            Paragraph paragraph = docRange.getParagraph(i);
            int numCharRuns = paragraph.numCharacterRuns();
            for (int j = 0; j < numCharRuns; j++) {
                CharacterRun charRun = paragraph.getCharacterRun(j);
                String text = charRun.text();
                System.out.println("Character Run text: " + text);
                for (Map.Entry<String, String> entry : replacements.entrySet()) {
                    if (text.contains(entry.getKey())) {
                        charRun.replaceText(entry.getKey(), entry.getValue());
                        // A replacement can change text offsets, so re-fetch
                        // the range, paragraph and run before continuing.
                        docRange = document.getRange();
                        paragraph = docRange.getParagraph(i);
                        charRun = paragraph.getCharacterRun(j);
                        text = charRun.text();
                    }
                }
            }
        }
        document.write(out);
    } catch (Exception ex) {
        System.out.println("Caught an: " + ex.getClass().getName());
        System.out.println("Message: " + ex.getMessage());
        System.out.println("Stacktrace follows.............");
        ex.printStackTrace(System.out);
    }
}
```
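The code above re-fetches `docRange`, `paragraph` and `charRun` after each `replaceText` call, presumably because a replacement of a different length moves every later offset. The sketch below is only an analogy on a plain `StringBuilder`, not POI's internal bookkeeping; the class and method names are hypothetical. It resumes scanning just past the inserted value, which also keeps a value that contains its own key (for example `"A"` to `"AA"`) from looping forever.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReplaceAllSketch {
    // Replaces every occurrence of each key. After each edit the scan resumes
    // right after the inserted value, because the value may be longer or
    // shorter than the key and all later offsets shift by the difference.
    static String replaceAll(String input, Map<String, String> replacements) {
        StringBuilder sb = new StringBuilder(input);
        for (Map.Entry<String, String> e : replacements.entrySet()) {
            String key = e.getKey();
            String value = e.getValue();
            if (key.isEmpty()) {
                continue; // an empty key would match everywhere
            }
            int from = 0;
            int idx;
            while ((idx = sb.indexOf(key, from)) >= 0) {
                sb.replace(idx, idx + key.length(), value);
                from = idx + value.length(); // skip over the new text
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the replacement order deterministic.
        Map<String, String> replacements = new LinkedHashMap<>();
        replacements.put("AAA", "BBBB");
        System.out.println(replaceAll("AAA EEE AAA", replacements)); // BBBB EEE BBBB
    }
}
```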
I call this function with the following arguments:
```java
HashMap<String, String> replacements = new HashMap<String, String>();
replacements.put("AAA", "BBB");
searchAndReplace("C:/Test.doc", "C:/Test1.doc", replacements);
```
When Test.doc contains a simple line such as "AAA EEE", it works. With a more complicated file, however, it reads the content and generates Test1.doc, but when I try to open that file, Word gives me the following error:
Word unable to read this document. It may be corrupt.
Try one or more of the following:
* Open and repair the file.
* Open the file with Text Recovery converter.
(C:\Test1.doc)
Please tell me what to do; I'm a beginner with POI and I haven't found a good tutorial for it.
Accepted answer by IAdapter
You could try the OpenOffice API, but there aren't many resources out there that explain how to use it.
Answer by AlbertoPL
First of all you should be closing your document.
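One concrete reason closing matters, visible in the question's code: `bufOStream` is never closed or flushed, so a `BufferedOutputStream` can keep the tail of the written document in memory and the file on disk ends up truncated. Whether that is what corrupted this particular Test1.doc is an assumption. The stdlib-only sketch below (hypothetical class and method names) writes the same 1000 bytes twice, once without closing and once with try-with-resources, and compares the file sizes.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class CloseStreamSketch {
    // Mirrors the question: the stream is never flushed or closed, so data
    // smaller than the buffer never reaches the file.
    static void writeWithoutClosing(File file, byte[] data) throws IOException {
        OutputStream out = new BufferedOutputStream(new FileOutputStream(file));
        out.write(data);
        // no flush(), no close()
    }

    // try-with-resources calls close(), which flushes the buffer first.
    static void writeAndClose(File file, byte[] data) throws IOException {
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(file))) {
            out.write(data);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1000]; // smaller than the default 8 KB buffer
        File leaky = File.createTempFile("leaky", ".bin");
        File closed = File.createTempFile("closed", ".bin");
        leaky.deleteOnExit();
        closed.deleteOnExit();
        writeWithoutClosing(leaky, data);
        writeAndClose(closed, data);
        System.out.println("without close: " + leaky.length() + " bytes");
        System.out.println("with close:    " + closed.length() + " bytes");
    }
}
```

On a typical JVM the unclosed file stays empty while the closed one has all 1000 bytes; with a real .doc larger than the buffer, the result would be a file missing only its last few kilobytes.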
Besides that, I suggest resaving your original Word document as a Word XML document and then manually changing the extension from .xml to .doc. Then look at the XML of the actual document you're working with and trace the content to make sure you're not accidentally editing hexadecimal values (AAA and EEE could be hex values in other fields).
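To illustrate the hex-value risk on the XML form of a document: a naive `String.replace` also rewrites a hex-like token such as `AAAEEE`, whereas a whole-word regex only touches `AAA` where it stands alone. This is a minimal sketch, assuming you are editing the raw XML as a string; the `w:color` element and its value are illustrative, and the class and method names are hypothetical. It is one possible mitigation, not a complete fix.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeWordReplaceSketch {
    // Replaces key only where it is a whole word, so "AAA" inside a token
    // like "AAAEEE" is left alone. Pattern.quote treats the key literally;
    // Matcher.quoteReplacement does the same for '$' or '\' in the value.
    static String replaceWholeWord(String input, String key, String value) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(key) + "\\b");
        return p.matcher(input).replaceAll(Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        String xml = "<w:color w:val=\"AAAEEE\"/><w:t>AAA EEE</w:t>";
        // Naive: also changes the colour value to "BBBEEE".
        System.out.println(xml.replace("AAA", "BBB"));
        // Whole word: only the text "AAA EEE" becomes "BBB EEE".
        System.out.println(replaceWholeWord(xml, "AAA", "BBB"));
    }
}
```

A whole-word match still hits an attribute whose value is exactly `AAA`, so inspecting the XML as suggested remains necessary.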
Without seeing the actual Word document it's hard to say what's going on.
Unfortunately, there is not much documentation about POI at all, especially for Word documents.
Answer by Saeed
I don't know whether it's OK to answer my own question, but to share the knowledge, I'll do it anyway.
After searching the web, the final solution I found is a library called docx4j, which is very good for dealing with MS .docx files. Its documentation is still incomplete and its forum is still in its early stages, but overall it helped me do what I needed.
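Note that docx4j works on the .docx (Office Open XML) format, not the binary Word 2003 .doc from the question. A .docx is a ZIP package whose main text lives in the `word/document.xml` part. The stdlib-only sketch below reads that part directly, just to show what such a library operates on; it is not docx4j's API, and the class and method names are hypothetical. It needs Java 9+ for `readAllBytes`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class DocxPartSketch {
    // Returns the main document part of a .docx package as a string,
    // or null if there is no word/document.xml entry.
    static String readDocumentXml(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    // read() stops at the end of the current entry
                    return new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Tiny in-memory stand-in; a real .docx also has [Content_Types].xml,
        // relationship parts, styles and more.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write("<w:document><w:body><w:p><w:r><w:t>AAA EEE</w:t></w:r></w:p></w:body></w:document>"
                    .getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        System.out.println(readDocumentXml(new ByteArrayInputStream(bytes.toByteArray())));
    }
}
```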
Thanks to everyone who helped me.
Answer by gusti
You can also try this one: http://www.dancrintea.ro/doc-to-pdf/

