How can I read a PDF in Python?

Question

Asked by sg1994

How can I read a PDF in Python? I know one way of converting it to text, but I want to read the content directly from the PDF.

Can anyone explain which Python module is best for PDF extraction?

Answer 1

Answered by shankarj67

You can use the PyPDF2 package.

# install PyPDF2
pip install PyPDF2

# import the required module
import PyPDF2

# open the PDF in binary mode; the with block closes the file afterwards
with open('example.pdf', 'rb') as file:
    # create a PDF reader object
    # (PdfReader replaces PdfFileReader, which was removed in PyPDF2 3.0)
    fileReader = PyPDF2.PdfReader(file)

    # print the number of pages in the PDF file
    print(len(fileReader.pages))

See the documentation: http://pythonhosted.org/PyPDF2/
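Since the question asks about reading the content and not just counting pages, here is a minimal sketch of pulling text out of every page. It assumes a recent PyPDF2 release where `PdfReader` and `page.extract_text()` are available; the helper names `pages_to_text` and `read_pdf_text` are made up for illustration, and `example.pdf` is a placeholder path.

```python
def pages_to_text(pages):
    """Join the extracted text of each page, one page per line group.

    Pages for which extract_text() returns nothing contribute an empty string.
    """
    return "\n".join((page.extract_text() or "") for page in pages)


def read_pdf_text(path):
    """Open a PDF with PyPDF2 and return the text of all its pages.

    Hypothetical helper: assumes PyPDF2 is installed (pip install PyPDF2).
    """
    # Imported inside the function so pages_to_text works without PyPDF2.
    from PyPDF2 import PdfReader

    with open(path, "rb") as f:
        reader = PdfReader(f)
        # Extract while the file is still open; pages are read lazily.
        return pages_to_text(reader.pages)
```

Calling `read_pdf_text("example.pdf")` should return the document's text as one string. Scanned PDFs that contain only images have no text layer, so the result may be empty for them.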

Answer 2

Answered by wanderweeer

Try PyPDF2.

There is a good tutorial here: https://automatetheboringstuff.com/chapter13/

Answer 3

Answered by Kallz

You can use the textract module in Python.

Textract

To install it:

pip install textract

To read a PDF:

import textract
text = textract.process('path/to/pdf/file', method='pdfminer')

For details, see the Textract documentation.
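One thing to watch for: `textract.process` returns a byte string, not a `str`, so the result usually needs decoding before use. A minimal sketch, assuming the output is UTF-8 (textract's default output encoding); `decode_extracted` and `read_with_textract` are hypothetical helper names:

```python
def decode_extracted(raw, encoding="utf-8"):
    """Turn the bytes returned by textract into a str.

    Undecodable bytes are replaced instead of raising an error.
    """
    return raw.decode(encoding, errors="replace")


def read_with_textract(path):
    """Extract text from a PDF with textract and return it as a str.

    Hypothetical helper: assumes textract is installed (pip install textract).
    """
    # Imported inside the function so decode_extracted works without textract.
    import textract

    raw = textract.process(path, method="pdfminer")
    return decode_extracted(raw)
```

For example, `read_with_textract('path/to/pdf/file')` should give the same content as the snippet above, already decoded to a string.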
