PHP: How to extract text from a PDF document?

Note: this page is a Chinese-English side-by-side translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/6999889/

Date: 2020-08-26 01:47:02  Source: igfitidea

How to extract text from the PDF document?

php pdf text unicode

Asked by Sfisioza

How to extract text from the PDF document using PHP?

(I can't use other tools, I don't have root access)

I've found some functions that work for plain text, but they don't handle Unicode characters well:

http://www.hashbangcode.com/blog/zend-lucene-and-pdf-documents-part-2-pdf-data-extraction-437.html

Accepted answer by CONvid19

Download class.pdf2text.php from https://pastebin.com/dvwySU1a or http://www.phpclasses.org/browse/file/31030.html (registration required).

Code:

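// Load the PDF2Text class downloaded above
include('class.pdf2text.php');
$a = new PDF2Text();
$a->setFilename('filename.pdf'); // path to the PDF to read
$a->decodePDF();                 // parse the file and extract its text
echo $a->output();               // print the extracted text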


  • class.pdf2text.php Project Home

  • The pdf2text class doesn't work with all the PDFs I've tested. If it doesn't work for you, try PDF Parser.



Answer by Sebastien Malot

I know this topic is quite old, but the need is still there. I read many documents, forum posts, and scripts, and built a new, more advanced script that supports both compressed and uncompressed PDFs:

https://gist.github.com/smalot/6183152

Hope it helps everyone.
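
To illustrate what "compressed and uncompressed" means in practice: PDF content streams are often stored with the /FlateDecode filter (zlib), so an extractor has to inflate such a stream before it can look for text-showing operators like Tj. The following is only a minimal sketch of that idea, not the code from the gist or from class.pdf2text.php. The function name extractStreamText and the tiny in-memory PDF fragment are hypothetical, and the sketch assumes PHP's zlib extension is available, that streams end at the first "endstream" line (real parsers should honour /Length), and that text is drawn only with simple "(literal) Tj" operators, so fonts, escape sequences, and Unicode encodings are ignored.

```php
<?php
// Hypothetical helper: collect the text of every stream in a raw PDF fragment.
// Assumption: text is drawn only with simple "(literal) Tj" operators.
function extractStreamText(string $pdf): string
{
    $text = '';
    // Capture each stream's dictionary (group 1) and its body (group 2).
    $pattern = '/<<(.*?)>>\s*stream\n(.*?)\nendstream/s';
    if (!preg_match_all($pattern, $pdf, $matches, PREG_SET_ORDER)) {
        return $text;
    }
    foreach ($matches as $match) {
        $dict = $match[1];
        $body = $match[2];
        // Compressed streams declare /FlateDecode and contain zlib data.
        if (strpos($dict, '/FlateDecode') !== false) {
            $inflated = @gzuncompress($body);
            if ($inflated === false) {
                continue; // skip streams that cannot be inflated
            }
            $body = $inflated;
        }
        // Collect the literal strings shown with the Tj operator.
        if (preg_match_all('/\((.*?)\)\s*Tj/s', $body, $shown)) {
            $text .= implode("\n", $shown[1]) . "\n";
        }
    }
    return $text;
}

// Build a tiny fake fragment: one uncompressed and one zlib-compressed stream.
$plain  = "BT (Hello) Tj ET";
$packed = gzcompress("BT (World) Tj ET");
$pdf = "1 0 obj\n<< /Length " . strlen($plain) . " >>\nstream\n"
     . $plain . "\nendstream\nendobj\n"
     . "2 0 obj\n<< /Length " . strlen($packed) . " /Filter /FlateDecode >>\nstream\n"
     . $packed . "\nendstream\nendobj\n";

$result = extractStreamText($pdf);
echo $result; // prints "Hello" and "World" on separate lines
```

Real-world PDFs add many complications (other filters, hex strings, TJ arrays, font-specific character maps), which is why a maintained library is usually a better choice than a hand-rolled extractor.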