Convert PDF to DOC (Python/Bash)
Original URL: http://stackoverflow.com/questions/26358281/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Question by AlvaroAV
Accepted answer by ham-sandwich
If you have LibreOffice installed:
lowriter --invisible --convert-to doc '/your/file.pdf'
If you want to use Python for this:
import os
import subprocess

for top, dirs, files in os.walk('/my/pdf/folder'):
    for filename in files:
        if filename.endswith('.pdf'):
            abspath = os.path.join(top, filename)
            # Pass the arguments as a list (no shell) so paths containing
            # spaces or quotes are handled safely
            subprocess.call(['lowriter', '--invisible', '--convert-to', 'doc', abspath])
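A variant of the loop above, written as a sketch under a few assumptions: lowriter is on your PATH, and you want each .doc written next to its source PDF (via LibreOffice's --outdir option; without it, output goes to the current working directory). The helper names build_convert_command and convert_folder are hypothetical. Building the argument list in its own function keeps it testable without LibreOffice, and subprocess.run(..., check=True) raises an error instead of silently ignoring a failed conversion.

```python
import os
import subprocess
from typing import List


def build_convert_command(pdf_path: str, out_dir: str, fmt: str = "doc") -> List[str]:
    """Build the lowriter argument list for converting a single PDF (hypothetical helper)."""
    return [
        "lowriter",
        "--invisible",
        "--convert-to", fmt,
        "--outdir", out_dir,  # write the result here instead of the current directory
        pdf_path,
    ]


def convert_folder(root: str, fmt: str = "doc") -> None:
    """Walk `root` and convert every .pdf, writing each result beside its source."""
    for top, _dirs, files in os.walk(root):
        for filename in files:
            if filename.lower().endswith(".pdf"):
                pdf_path = os.path.join(top, filename)
                # check=True raises CalledProcessError if LibreOffice exits non-zero
                subprocess.run(build_convert_command(pdf_path, top, fmt), check=True)


if __name__ == "__main__":
    convert_folder("/my/pdf/folder")
```

Passing fmt="docx" should request Word 2007+ output instead of the older .doc format.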
Answer by ham-sandwich
This is difficult because PDFs are presentation-oriented and Word documents are content-oriented. I have tested both and can recommend the following projects.
However, you are almost certainly going to lose some of the presentation (layout and formatting) in the conversion.
Answer by Tilal Ahmad
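You can use the GroupDocs.Conversion Cloud SDK for Python without installing any third-party tool or software. Note that it is a cloud service: the PDF is uploaded to GroupDocs storage and converted there.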
Sample Python code:
# Import module
import groupdocs_conversion_cloud

# Get your app_sid and app_key at https://dashboard.groupdocs.cloud (free registration is required).
app_sid = "xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
app_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Create instances of the API
convert_api = groupdocs_conversion_cloud.ConvertApi.from_keys(app_sid, app_key)
file_api = groupdocs_conversion_cloud.FileApi.from_keys(app_sid, app_key)

try:
    # Upload the source file to cloud storage
    filename = 'Sample.pdf'
    remote_name = 'Sample.pdf'
    output_name = 'sample.docx'
    strformat = 'docx'
    request_upload = groupdocs_conversion_cloud.UploadFileRequest(remote_name, filename)
    response_upload = file_api.upload_file(request_upload)

    # Convert the PDF to a Word document
    settings = groupdocs_conversion_cloud.ConvertSettings()
    settings.file_path = remote_name
    settings.format = strformat
    settings.output_path = output_name

    loadOptions = groupdocs_conversion_cloud.PdfLoadOptions()
    loadOptions.hide_pdf_annotations = True
    loadOptions.remove_embedded_files = False
    loadOptions.flatten_all_fields = True
    settings.load_options = loadOptions

    convertOptions = groupdocs_conversion_cloud.DocxConvertOptions()
    # These two settings limit the conversion to page 1 only; adjust them for more pages
    convertOptions.from_page = 1
    convertOptions.pages_count = 1
    settings.convert_options = convertOptions

    request = groupdocs_conversion_cloud.ConvertDocumentRequest(settings)
    response = convert_api.convert_document(request)
    print("Document converted successfully: " + str(response))
except groupdocs_conversion_cloud.ApiException as e:
    print("Exception when calling convert_document: {0}".format(e.message))
I'm a developer evangelist at Aspose.
Answer by eleks007
If you want to convert a PDF to an MS Word file such as .docx, I came across this solution (Windows only, since it drives an installed copy of Microsoft Word).
Ahsin Shabbir wrote:
import glob
import os
import win32com.client  # pywin32; requires Windows with Microsoft Word installed

word = win32com.client.Dispatch("Word.Application")
word.Visible = False

pdfs_path = ""  # folder where the .pdf files are stored
try:
    for doc in glob.iglob(os.path.join(pdfs_path, "*.pdf")):
        in_file = os.path.abspath(doc)
        print(in_file)
        wb = word.Documents.Open(in_file)
        # Save next to the PDF, swapping the .pdf extension for .docx
        out_file = os.path.splitext(in_file)[0] + ".docx"
        print("outfile\n", out_file)
        wb.SaveAs2(out_file, FileFormat=16)  # 16 = wdFormatDocumentDefault (.docx)
        print("success...")
        wb.Close()
finally:
    word.Quit()  # always close Word, even if a conversion fails
This worked like a charm for me: it converted a 500-page PDF, keeping the formatting and images.
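If you want the .docx files in a separate output folder rather than next to the PDFs, a small helper can compute each output path with os.path instead of splitting the string on backslashes by hand. This is a sketch: docx_output_path and out_dir are hypothetical names, and the helper does not touch Word, so it can be checked on any OS.

```python
import os
from typing import Optional


def docx_output_path(pdf_path: str, out_dir: Optional[str] = None) -> str:
    """Return the .docx path for `pdf_path` (hypothetical helper).

    If `out_dir` is None the .docx goes next to the PDF; otherwise it goes
    into `out_dir` with the same base name.
    """
    pdf_path = os.path.abspath(pdf_path)
    # splitext only strips the final extension, so "my.file.pdf" -> "my.file"
    base = os.path.splitext(os.path.basename(pdf_path))[0]
    folder = os.path.dirname(pdf_path) if out_dir is None else os.path.abspath(out_dir)
    return os.path.join(folder, base + ".docx")
```

For example, inside the Word loop you could save with wb.SaveAs2(docx_output_path(in_file, r"C:\converted"), FileFormat=16), assuming that folder already exists.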

