Convert PDF to DOC (Python/Bash)

Question

Asked by AlvaroAV

I've seen some pages that let a user upload a PDF and return a DOC file, like PdfToWord.

Is there any way to convert a PDF file to a DOC/DOCX file using Python or any Unix command?

Thanks in advance

Answer 1

Accepted answer by ham-sandwich

If you have LibreOffice installed:

lowriter --invisible --convert-to doc '/your/file.pdf'

If you want to use Python for this:

import os
import subprocess

# Walk the folder tree and convert every PDF with LibreOffice Writer.
for top, dirs, files in os.walk('/my/pdf/folder'):
    for filename in files:
        if filename.endswith('.pdf'):
            abspath = os.path.join(top, filename)
            # Pass the arguments as a list so the path needs no shell quoting.
            subprocess.call(['lowriter', '--invisible', '--convert-to', 'doc', abspath])
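
A slightly more defensive variant of the loop can be sketched as follows. It assumes `lowriter` is on the PATH and uses LibreOffice's `--outdir` option to collect the results in one folder. The helper names `find_pdfs`, `build_command` and `convert_all` are made up for this illustration, and treating a nonzero exit code as a failure is an assumption, not something LibreOffice is guaranteed to report for every bad conversion.

```python
import os
import shutil
import subprocess


def find_pdfs(root):
    """Yield absolute paths of all .pdf files under root (case-insensitive)."""
    for top, _dirs, files in os.walk(root):
        for filename in sorted(files):
            if filename.lower().endswith('.pdf'):
                yield os.path.abspath(os.path.join(top, filename))


def build_command(pdf_path, outdir, fmt='doc'):
    """Return the lowriter argument list for one file (no shell involved)."""
    return ['lowriter', '--invisible', '--convert-to', fmt,
            '--outdir', outdir, pdf_path]


def convert_all(root, outdir):
    """Convert every PDF under root; return the paths whose command failed."""
    if shutil.which('lowriter') is None:
        raise RuntimeError('lowriter not found on PATH')
    os.makedirs(outdir, exist_ok=True)
    failed = []
    for pdf in find_pdfs(root):
        # Assumption: a nonzero exit code means the conversion failed.
        if subprocess.call(build_command(pdf, outdir)) != 0:
            failed.append(pdf)
    return failed
```

Calling `convert_all('/my/pdf/folder', '/my/doc/folder')` would then return the list of PDFs that could not be converted.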

Answer 2

Answer by ham-sandwich

This is difficult because PDFs are presentation-oriented and Word documents are content-oriented. I have tested both and can recommend the following projects.

However, you are most definitely going to lose presentational aspects in the conversion.
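
To make the presentation-versus-content point concrete, here is a toy sketch. It assumes a PDF page has already been reduced to text fragments with page coordinates; the `Fragment` tuples, the thresholds and the sample data are invented for illustration and are not output from any real PDF library. It shows the kind of guesswork needed to rebuild lines and paragraphs from positions alone.

```python
from collections import namedtuple

# Hypothetical positioned text run: x/y in points, y grows downward.
Fragment = namedtuple('Fragment', 'x y text')


def rebuild_paragraphs(fragments, line_tol=2.0, para_gap=18.0):
    """Group fragments into lines by y, then lines into paragraphs by gap."""
    lines = []  # list of (y, [fragments on that line])
    for frag in sorted(fragments, key=lambda f: (f.y, f.x)):
        if lines and abs(frag.y - lines[-1][0]) <= line_tol:
            lines[-1][1].append(frag)
        else:
            lines.append((frag.y, [frag]))

    paragraphs, current, prev_y = [], [], None
    for y, frags in lines:
        text = ' '.join(f.text for f in sorted(frags, key=lambda f: f.x))
        # A vertical gap larger than para_gap is guessed to be a paragraph break.
        if prev_y is not None and y - prev_y > para_gap:
            paragraphs.append(' '.join(current))
            current = []
        current.append(text)
        prev_y = y
    if current:
        paragraphs.append(' '.join(current))
    return paragraphs


page = [
    Fragment(72, 100, 'PDFs store'), Fragment(140, 100, 'positioned text,'),
    Fragment(72, 114, 'not paragraphs.'),
    Fragment(72, 150, 'A second block.'),
]
print(rebuild_paragraphs(page))
```

Fonts, columns, tables and exact positions are simply discarded in this sketch, which is the kind of loss described above.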

Answer 3

Answer by Tilal Ahmad

You can use the GroupDocs.Conversion Cloud SDK for Python without installing any third-party tool or software.

Sample Python code:

# Import module
import groupdocs_conversion_cloud

# Get your app_sid and app_key at https://dashboard.groupdocs.cloud (free registration is required).
app_sid = "xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
app_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Create instance of the API
convert_api = groupdocs_conversion_cloud.ConvertApi.from_keys(app_sid, app_key)
file_api = groupdocs_conversion_cloud.FileApi.from_keys(app_sid, app_key)

try:
    # Upload the source file to storage
    filename = 'Sample.pdf'
    remote_name = 'Sample.pdf'
    output_name = 'sample.docx'
    strformat = 'docx'

    request_upload = groupdocs_conversion_cloud.UploadFileRequest(remote_name, filename)
    response_upload = file_api.upload_file(request_upload)

    # Convert the PDF to a Word document
    settings = groupdocs_conversion_cloud.ConvertSettings()
    settings.file_path = remote_name
    settings.format = strformat
    settings.output_path = output_name

    loadOptions = groupdocs_conversion_cloud.PdfLoadOptions()
    loadOptions.hide_pdf_annotations = True
    loadOptions.remove_embedded_files = False
    loadOptions.flatten_all_fields = True

    settings.load_options = loadOptions

    convertOptions = groupdocs_conversion_cloud.DocxConvertOptions()
    convertOptions.from_page = 1    # start at page 1
    convertOptions.pages_count = 1  # convert a single page

    settings.convert_options = convertOptions

    request = groupdocs_conversion_cloud.ConvertDocumentRequest(settings)
    response = convert_api.convert_document(request)

    print("Document converted successfully: " + str(response))
except groupdocs_conversion_cloud.ApiException as e:
    print("Exception when calling convert_document: {0}".format(e.message))

I'm a developer evangelist at Aspose.

Answer 4

Answer by eleks007

If you want to convert a PDF to an MS Word file type such as docx, I came across this.

Ahsin Shabbir wrote:

import glob
import os
import win32com.client

word = win32com.client.Dispatch("Word.Application")
word.Visible = False  # keep the Word window hidden

pdfs_path = ""  # folder where the .pdf files are stored
reqs_path = ""  # folder for the .docx output (used but never defined in the original snippet)
for doc in glob.iglob(os.path.join(pdfs_path, "*.pdf")):
    print(doc)
    filename = os.path.basename(doc)
    in_file = os.path.abspath(doc)
    print(in_file)
    wb = word.Documents.Open(in_file)
    # Drop the ".pdf" suffix and write a .docx with the same base name.
    out_file = os.path.abspath(os.path.join(reqs_path, filename[:-4] + ".docx"))
    print("outfile\n", out_file)
    wb.SaveAs2(out_file, FileFormat=16)  # 16 = wdFormatDocumentDefault (.docx)
    print("success...")
    wb.Close()

word.Quit()

This worked like a charm for me; it converted a 500-page PDF with formatting and images.
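
The file-name handling in the script can be isolated into a small helper. This is only a sketch: `docx_path_for` is a made-up name, and `PureWindowsPath` is used so the example behaves the same on any platform; in the script itself, running on Windows, plain `pathlib.Path` would presumably do.

```python
from pathlib import PureWindowsPath


def docx_path_for(pdf_path, out_dir):
    """Return out_dir joined with the PDF's base name, with a .docx suffix."""
    # stem is the file name without its folder and without the last extension
    stem = PureWindowsPath(pdf_path).stem
    return str(PureWindowsPath(out_dir) / (stem + '.docx'))


print(docx_path_for(r'C:\pdfs\report.v2.pdf', r'C:\out'))
```

Only the final `.pdf` extension is replaced, so a name like `report.v2.pdf` keeps its inner dot.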
