Convert DOC file to DOCX with Java

Warning: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow, http://stackoverflow.com/questions/6664728/

Date: 2020-10-30 16:51:50, Source: igfitidea

Tags: java, ms-office, docx, doc

Question by 3rgo

I need to use DOCX files (actually the XML contained in them) in a Java application I'm currently developing, but some people in my company still use the DOC format.

Do you know if there is a way to convert a DOC file to the DOCX format using Java? I know it's possible using C#, but that's not an option.

I googled it, but nothing came up...

Thanks
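
Since the application has to cope with both formats, a first step is telling them apart. The following is a minimal sketch, assuming the file signature is a good enough hint: legacy .doc files are OLE2 compound files that begin with the bytes D0 CF 11 E0 A1 B1 1A E1, while .docx files are ZIP packages that begin with "PK\3\4". The class and method names are hypothetical, and Java 9+ is assumed for readNBytes and the ranged Arrays.equals.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper: guesses whether a file is a legacy .doc or a .docx from its first bytes. */
final class WordFormatSniffer {

    enum Kind { OLE2_DOC, ZIP_DOCX, UNKNOWN }

    // OLE2 compound file signature: used by .doc, but also by .xls, .ppt, .msg, ...
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header "PK\3\4": used by .docx, but also by .xlsx, .odt, .jar, ...
    private static final byte[] ZIP = { 0x50, 0x4B, 0x03, 0x04 };

    private WordFormatSniffer() { }

    static Kind sniff(Path file) throws IOException {
        byte[] head = new byte[OLE2.length];
        int n;
        try (InputStream in = Files.newInputStream(file)) {
            n = in.readNBytes(head, 0, head.length); // may be fewer than 8 for tiny files
        }
        if (startsWith(head, n, OLE2)) {
            return Kind.OLE2_DOC;
        }
        if (startsWith(head, n, ZIP)) {
            return Kind.ZIP_DOCX;
        }
        return Kind.UNKNOWN;
    }

    // True if the first `length` bytes read are long enough and begin with `magic`.
    private static boolean startsWith(byte[] head, int length, byte[] magic) {
        return length >= magic.length
                && Arrays.equals(head, 0, magic.length, magic, 0, magic.length);
    }
}
```

Both signatures are shared with other formats, so a match only identifies the container type, not that the file is actually a Word document.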

Answer by Shahzad Latif

You may try Aspose.Words for Java. It lets you load a DOC file and save it in DOCX format. The code is very simple, as shown below:

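import com.aspose.words.Document;

// Open the binary .doc document.
Document doc = new Document("input.doc");
// Save it; Aspose.Words picks the DOCX format from the ".docx" extension.
doc.save("output.docx");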

Please see if this helps in your scenario.

Disclosure: I work as developer evangelist at Aspose.

Answer by helios

Check out JODConverter to see if it fits the bill. I haven't used it personally.

Answer by Cin Sb Sangpi

To convert a DOC file to HTML, see the question "Convert Word doc to HTML programmatically in Java".

Use Apache POI: http://poi.apache.org/

Or use this (note that it reads the text of an existing .docx file; it does not convert a .doc):

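import java.io.File;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
// Open an existing .docx (open() instead of openOrCreate(), which would create a missing file) and print its text.
try (XWPFWordExtractor wx = new XWPFWordExtractor(new XWPFDocument(OPCPackage.open(new File("hello.docx"))))) {
    System.out.println("text = " + wx.getText());
}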
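
A .docx file is a ZIP package whose main text lives in word/document.xml, which is the XML the question refers to. As a rough, JDK-only illustration of reading it without POI, the sketch below concatenates the w:t text runs of every w:p paragraph. It is a simplification under stated assumptions, not what XWPFWordExtractor does: tabs, breaks, headers, footnotes and nested paragraphs (for example in text boxes) are not handled correctly, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: reads paragraph text straight from word/document.xml of a .docx. */
final class DocxXmlText {

    // Namespace of the WordprocessingML "w:" elements.
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    private DocxXmlText() { }

    static List<String> paragraphs(Path docx) throws Exception {
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("No word/document.xml in " + docx);
            }
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            // Reject DOCTYPE declarations as basic XXE hardening.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

            Document xml;
            try (InputStream in = zip.getInputStream(entry)) {
                xml = factory.newDocumentBuilder().parse(in);
            }

            List<String> result = new ArrayList<>();
            NodeList ps = xml.getElementsByTagNameNS(W_NS, "p");
            for (int i = 0; i < ps.getLength(); i++) {
                // Concatenate all w:t text elements inside this w:p.
                NodeList ts = ((Element) ps.item(i)).getElementsByTagNameNS(W_NS, "t");
                StringBuilder text = new StringBuilder();
                for (int j = 0; j < ts.getLength(); j++) {
                    text.append(ts.item(j).getTextContent());
                }
                result.add(text.toString());
            }
            return result;
        }
    }
}
```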

Answer by Ulay

Use the newer JODConverter jars, jodconverter-core-4.2.2.jar and jodconverter-local-4.2.2.jar:

// Use real file names; wildcards such as "*.doc" are not expanded.
String inputFile = "input.doc";
String outputFile = "output.docx";

LocalOfficeManager localOfficeManager = LocalOfficeManager.builder()
        .install()                          // make this the manager used by LocalConverter.make()
        .officeHome(getDefaultOfficeHome()) // your path to the OpenOffice/LibreOffice installation
        .build();

try {
    localOfficeManager.start();

    LocalConverter
            .make()
            .convert(new File(inputFile))
            .as(DefaultDocumentFormatRegistry.DOC)  // source: binary Word 97-2003
            .to(new File(outputFile))
            .as(DefaultDocumentFormatRegistry.DOCX) // target: Office Open XML
            .execute();

} catch (OfficeException ex) {
    Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
} finally {
    // Always shut down the office process started by the manager
    OfficeUtils.stopQuietly(localOfficeManager);
}
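
JODConverter converts one source file per call, so converting a whole folder of .doc files means listing them first. A minimal sketch, assuming all inputs sit in one directory: it matches the glob "*.doc" with Files.newDirectoryStream and derives each .docx target path; each pair could then be passed to LocalConverter.make().convert(...).to(...). The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that plans a batch DOC to DOCX conversion. */
final class DocBatch {

    private DocBatch() { }

    /** Maps every *.doc file in dir to the .docx path (inside outDir) it would be converted to. */
    static Map<Path, Path> conversionPlan(Path dir, Path outDir) throws IOException {
        Map<Path, Path> plan = new TreeMap<>(); // sorted, so files are processed in a stable order
        // The glob must match the whole file name, so "*.doc" does not match "report.docx".
        try (DirectoryStream<Path> docs = Files.newDirectoryStream(dir, "*.doc")) {
            for (Path doc : docs) {
                if (!Files.isRegularFile(doc)) {
                    continue; // skip directories that happen to end in .doc
                }
                String name = doc.getFileName().toString();
                String base = name.substring(0, name.length() - ".doc".length());
                plan.put(doc, outDir.resolve(base + ".docx"));
            }
        }
        return plan;
    }
}
```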

Answer by JFK

JODConverter calls OpenOffice/LibreOffice via a network protocol, so it can "do anything you can do in OpenOffice", including converting formats. However, the result is only as good as the version of OpenOffice you are running: one of my documents contains some artwork, and it was not converted as I had hoped.
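
Because JODConverter talks to OpenOffice/LibreOffice over a socket, a quick sanity check before converting is whether anything accepts TCP connections on the expected port. A minimal sketch using only java.net; the class name, host and timeout are assumptions, and port 8100 simply matches the setPortNumber(8100) call in this answer. A successful connection only shows that some process is listening, not that it is LibreOffice speaking UNO.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether an office listener accepts connections. */
final class OfficeListenerCheck {

    private OfficeListenerCheck() { }

    /** Returns true if something accepts a TCP connection on host:port within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable, or timed out
        }
    }

    public static void main(String[] args) {
        System.out.println("Listener on 127.0.0.1:8100: " + isListening("127.0.0.1", 8100, 1000));
    }
}
```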

According to the Google Code site for v3, JODConverter is no longer supported.

To get JODConverter to do the job, you need something like this:

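import java.io.File;
import java.io.IOException;
import java.util.Collections;

import org.artofsolving.jodconverter.OfficeDocumentConverter;
import org.artofsolving.jodconverter.document.DocumentFamily;
import org.artofsolving.jodconverter.document.DocumentFormat;
import org.artofsolving.jodconverter.office.ExternalOfficeManagerConfiguration;
import org.artofsolving.jodconverter.office.OfficeManager;
import org.junit.AfterClass;
import org.junit.BeforeClass;

// Converts a binary Word .doc file to .docx using LibreOffice's "MS Word 2007 XML" export filter.
private static void transformBinaryWordDocToDocX(File in, File out)
{
    OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);
    DocumentFormat docx = converter.getFormatRegistry().getFormatByExtension("docx");
    docx.setStoreProperties(DocumentFamily.TEXT,
            Collections.singletonMap("FilterName", "MS Word 2007 XML"));

    converter.convert(in, out, docx);
}

// Converts a binary Word .doc file to the older Word 2003 XML format.
private static void transformBinaryWordDocToW2003Xml(File in, File out)
{
    OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);
    DocumentFormat w2003xml = new DocumentFormat("Microsoft Word 2003 XML", "xml", "text/xml");
    w2003xml.setInputFamily(DocumentFamily.TEXT);
    w2003xml.setStoreProperties(DocumentFamily.TEXT,
            Collections.singletonMap("FilterName", "MS Word 2003 XML"));
    converter.convert(in, out, w2003xml);
}

private static OfficeManager officeManager;

@BeforeClass
public static void setupStatic() throws IOException {
    // Alternative: let JODConverter launch LibreOffice itself (this did not work well for me on Windows):
    // officeManager = new DefaultOfficeManagerConfiguration()
    //         .setOfficeHome("C:/Program Files/LibreOffice 3.6")
    //         .buildOfficeManager();

    // Connect to a LibreOffice instance that is already listening on port 8100.
    officeManager = new ExternalOfficeManagerConfiguration()
            .setConnectOnStart(true)
            .setPortNumber(8100)
            .buildOfficeManager();
    officeManager.start();
}

@AfterClass
public static void shutdownStatic() throws IOException {
    officeManager.stop();
}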

For this to work, LibreOffice must be running as a network server. (I could not get JODConverter's "run on demand" mode to work well on Windows with LibreOffice 3.6.)
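
One way to have LibreOffice running as that network server is to start it headless with an --accept connection string matching the port JODConverter connects to. A minimal sketch using ProcessBuilder, assuming a LibreOffice version that accepts the double-dash options shown and that you supply the path to the soffice executable; the class and method names are hypothetical, and Java 9+ is assumed for List.of and Redirect.DISCARD.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical helper that starts LibreOffice as a headless UNO listener for JODConverter. */
final class SofficeListener {

    private SofficeListener() { }

    /** Command line for a headless LibreOffice accepting UNO connections on 127.0.0.1:port. */
    static List<String> command(String sofficePath, int port) {
        return List.of(
                sofficePath,
                "--headless",   // no user interface
                "--nologo",
                "--norestore",  // do not offer to restore documents after a crash
                // No shell is involved, so the ';' characters need no quoting.
                "--accept=socket,host=127.0.0.1,port=" + port + ";urp;");
    }

    /** Starts the listener; the caller is responsible for destroying the returned process. */
    static Process start(String sofficePath, int port) throws IOException {
        return new ProcessBuilder(command(sofficePath, port))
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
    }
}
```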

Answer by Dinesh Parmar

I needed the same conversion. After a lot of research, I found that JODConverter can do it. You can download the jar from https://code.google.com/p/jodconverter/downloads/list

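Add the jodconverter-core-3.0-beta-4 jar to your project's lib folder (the compiled jar, not jodconverter-core-3.0-beta-4-sources.jar, which contains only source files).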

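import java.io.File;
import java.util.Collections;

import org.artofsolving.jodconverter.OfficeDocumentConverter;
import org.artofsolving.jodconverter.document.DocumentFamily;
import org.artofsolving.jodconverter.document.DocumentFormat;
import org.artofsolving.jodconverter.office.DefaultOfficeManagerConfiguration;
import org.artofsolving.jodconverter.office.OfficeManager;

// 1) Create the OfficeManager and start LibreOffice from its installation directory
OfficeManager officeManager = new DefaultOfficeManagerConfiguration()
        .setOfficeHome(new File("/opt/libreoffice4.4"))
        .buildOfficeManager();
officeManager.start();
try {
    // 2) Create the JODConverter converter
    OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);
    // 3) Get the DOCX format and force LibreOffice's "MS Word 2007 XML" export filter
    DocumentFormat docx = converter.getFormatRegistry().getFormatByExtension("docx");
    docx.setStoreProperties(DocumentFamily.TEXT,
            Collections.singletonMap("FilterName", "MS Word 2007 XML"));
    // 4) Call convert() on the converter object
    converter.convert(new File("doc/AdvancedTable.doc"),
            new File("docx/AdvancedTable.docx"), docx);
} finally {
    // Stop the manager so the LibreOffice process it started is shut down
    officeManager.stop();
}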

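If adding the JODConverter jar is not an option, the same LibreOffice export filter can be reached through soffice's command-line conversion instead. This swaps in a different technique from the answer above (no JODConverter, one soffice process per call); it is a sketch assuming your soffice supports --convert-to and --outdir, with hypothetical class and method names. The argument "docx:MS Word 2007 XML" is the extension:filter form, passed without shell quoting because ProcessBuilder does not go through a shell.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that converts .doc to .docx by running soffice directly. */
final class SofficeConvert {

    private SofficeConvert() { }

    /** Command that writes outDir/<name>.docx using LibreOffice's "MS Word 2007 XML" filter. */
    static List<String> command(String sofficePath, Path doc, Path outDir) {
        return List.of(
                sofficePath,
                "--headless",
                "--convert-to", "docx:MS Word 2007 XML", // extension:filter
                "--outdir", outDir.toString(),
                doc.toString());
    }

    /** Runs the conversion and returns soffice's exit code; kills it after timeoutSeconds. */
    static int convert(String sofficePath, Path doc, Path outDir, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command(sofficePath, doc, outDir))
                .inheritIO() // show soffice's own messages on this console
                .start();
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            throw new IOException("soffice timed out converting " + doc);
        }
        return p.exitValue();
    }
}
```

Note that --convert-to is known to silently do nothing when another LibreOffice instance is already running with the same user profile (for example, a listener started for JODConverter), so check that the output file actually exists.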
Answer by Abhishek Mishra

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

/**
 * Copies the paragraph text of a .doc file into a new .docx file with Apache POI.
 * Only plain text survives: formatting, tables and images are not carried over.
 */
public class TestCon {

    public static void main(String[] args) {
        System.out.println("Starting the test");

        try (InputStream in = new FileInputStream("C:/Users/312845/Desktop/a.doc");
             HWPFDocument doc = new HWPFDocument(in);
             XWPFDocument docx = new XWPFDocument();
             OutputStream out = new FileOutputStream("C:/Users/312845/Desktop/test.docx")) {

            WordExtractor we = new WordExtractor(doc);
            for (String paragraph : we.getParagraphText()) {
                // Drop the trailing paragraph mark / line break before writing the run
                String text = paragraph.replaceAll("[\\r\\n]+$", "");
                docx.createParagraph().createRun().setText(text);
            }
            // Without this write, the output file would stay empty
            docx.write(out);

            System.out.println("Document testing completed");
        } catch (Exception e) {
            System.out.println("Exception during test");
            e.printStackTrace();
        }
    }
}
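
A .docx built from text extracted out of a .doc needs only a few package parts. To show what they are, here is a JDK-only sketch that writes [Content_Types].xml, _rels/.rels and word/document.xml from a list of plain-text paragraphs. It assumes plain text is all you need (no styles, numbering, tables or images), strips the control characters that XML 1.0 forbids, and uses a hypothetical class name; it is an illustration rather than a substitute for POI's XWPFDocument.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper that writes a bare-bones .docx containing plain-text paragraphs. */
final class MinimalDocxWriter {

    private static final String XML_DECL =
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>";

    // Declares the content type of every part in the package.
    private static final String CONTENT_TYPES = XML_DECL
            + "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">"
            + "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>"
            + "<Default Extension=\"xml\" ContentType=\"application/xml\"/>"
            + "<Override PartName=\"/word/document.xml\" ContentType="
            + "\"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml\"/>"
            + "</Types>";

    // Points the package at its main document part.
    private static final String ROOT_RELS = XML_DECL
            + "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">"
            + "<Relationship Id=\"rId1\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/"
            + "relationships/officeDocument\" Target=\"word/document.xml\"/>"
            + "</Relationships>";

    private MinimalDocxWriter() { }

    static void write(Path out, List<String> paragraphs) throws IOException {
        StringBuilder body = new StringBuilder();
        for (String text : paragraphs) {
            body.append("<w:p><w:r><w:t xml:space=\"preserve\">")
                .append(escape(text))
                .append("</w:t></w:r></w:p>");
        }
        String document = XML_DECL
                + "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">"
                + "<w:body>" + body + "</w:body></w:document>";

        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(out))) {
            put(zip, "[Content_Types].xml", CONTENT_TYPES);
            put(zip, "_rels/.rels", ROOT_RELS);
            put(zip, "word/document.xml", document);
        }
    }

    private static void put(ZipOutputStream zip, String name, String xml) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(xml.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    // Removes control characters not allowed in XML 1.0, then escapes markup characters.
    private static String escape(String s) {
        String cleaned = s.replaceAll("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]", "");
        return cleaned.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```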