How can I read a PDF in Python?

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/45795089/

Date: 2020-08-19 17:17:24  Source: igfitidea

How can I read a PDF in Python?

Tags: python, python-2.7, pdf, text-extraction

Asked by sg1994

How can I read a PDF in Python? I know one way, which is to convert it to text, but I want to read the content directly from the PDF.

Can anyone explain which Python module is best for PDF extraction?

Answered by shankarj67

You can use the PyPDF2 package.

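# Install PyPDF2 (run this in a shell, not in Python)
pip install PyPDF2

# Import the required module
import PyPDF2

# Open the PDF in binary mode; the with-block closes the file afterwards
with open('example.pdf', 'rb') as pdf_file:
    # Create a PDF reader object
    # (PyPDF2 3.x removed PdfFileReader; use PyPDF2.PdfReader there)
    file_reader = PyPDF2.PdfFileReader(pdf_file)

    # Print the number of pages in the PDF file
    # (with PdfReader, use len(reader.pages) instead of numPages)
    print(file_reader.numPages)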

See the documentation: http://pythonhosted.org/PyPDF2/
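The snippet above only prints the page count, while the question asks about reading the content. Here is a rough sketch of pulling the text out page by page. It assumes PyPDF2 2.0 or later (the `PdfReader` / `extract_text()` names); the helper names `join_page_text` and `read_pdf_text` are made up for this illustration and are not part of PyPDF2.

```python
def join_page_text(pages):
    """Concatenate the text of every page, separated by blank lines."""
    # extract_text() can come back empty for scanned or image-only pages,
    # so fall back to "" instead of failing on a missing value.
    return "\n\n".join((page.extract_text() or "") for page in pages)


def read_pdf_text(path):
    """Return all extractable text from the PDF at `path` (hypothetical helper)."""
    # Imported inside the function so join_page_text() stays usable
    # even where PyPDF2 is not installed. Assumes PyPDF2 >= 2.0.
    from PyPDF2 import PdfReader

    with open(path, "rb") as pdf_file:
        reader = PdfReader(pdf_file)
        # Pages are read while the file is still open.
        return join_page_text(reader.pages)
```

Calling `read_pdf_text('example.pdf')` should give you one string with the text of all pages. How good the result is depends a lot on the PDF itself; scanned documents usually yield nothing without OCR.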

Answered by wanderweeer

Try PyPDF2.

There is a good tutorial here: https://automatetheboringstuff.com/chapter13/

Answered by Kallz

You can use the textract module in Python.

Textract

To install:

pip install textract

To read a PDF:

import textract
text = textract.process('path/to/pdf/file', method='pdfminer')

For details, see Textract.
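One thing to watch with the textract call above: as far as I know, `textract.process` hands back bytes rather than a `str`, so on Python 3 you will likely want to decode the result. A minimal sketch, assuming UTF-8 output (textract's default as I understand it); `decode_extracted` and `pdf_to_text` are hypothetical helper names, not textract functions.

```python
def decode_extracted(raw, encoding="utf-8"):
    """Turn the bytes returned by an extractor into a str."""
    # errors="replace" substitutes a placeholder character for any byte
    # sequence that is not valid in `encoding`, instead of raising.
    return raw.decode(encoding, errors="replace")


def pdf_to_text(path):
    """Extract text from the PDF at `path` via textract (hypothetical helper)."""
    # Imported inside the function so decode_extracted() stays usable
    # even where textract is not installed.
    import textract

    # Same call as in the answer above.
    raw = textract.process(path, method="pdfminer")
    return decode_extracted(raw)
```

With this, `pdf_to_text('path/to/pdf/file')` would return a regular string you can search or split.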