PHP: How to extract text from a PDF document?
Original question: http://stackoverflow.com/questions/6999889/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
How to extract text from the PDF document?
Asked by Sfisioza
How can I extract text from a PDF document using PHP?
(I can't use other tools because I don't have root access.)
I've found some functions that work for plain text, but they don't handle Unicode characters well:
http://www.hashbangcode.com/blog/zend-lucene-and-pdf-documents-part-2-pdf-data-extraction-437.html
Accepted answer by CONvid19
Download class.pdf2text.php from https://pastebin.com/dvwySU1a or http://www.phpclasses.org/browse/file/31030.html (registration required).
Code:
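<?php
// Load the PDF2Text class downloaded above (expected in the same directory).
require_once 'class.pdf2text.php';

$a = new PDF2Text();
$a->setFilename('filename.pdf'); // path to the PDF to read
$a->decodePDF();                 // parse the file and decode its text
echo $a->output();               // print the extracted text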
class.pdf2text.php (Project Home)
The pdf2text class doesn't work with all the PDFs I've tested. If it doesn't work for you, try PDF Parser.
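For the PDF Parser route, here is a minimal sketch. It assumes the library meant is the Composer package smalot/pdfparser (the answer gives no link, so this is an assumption) and that it has been installed locally with Composer, which does not need root access. The file name filename.pdf is a placeholder.

```php
<?php
// Assumption: `composer require smalot/pdfparser` was run in this directory.
require 'vendor/autoload.php';

$parser = new \Smalot\PdfParser\Parser();

// 'filename.pdf' is a placeholder path, not a file from the original answer.
$pdf = $parser->parseFile('filename.pdf');

// Print the text of all pages.
echo $pdf->getText();
```

How well Unicode text comes out still depends on how the fonts are embedded in each PDF, so it is worth testing with your own documents.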
Answer by Sebastien Malot
I know that this topic is quite old, but the need is still alive. I read many documents, forum threads, and scripts, and built a new, more advanced one that supports both compressed and uncompressed PDFs:
https://gist.github.com/smalot/6183152
Hope it helps everyone.
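To illustrate what "compressed and uncompressed" means here, the following is a rough sketch, not the code from the gist. It assumes PHP's zlib extension is available. The three function names are hypothetical, and the input is a hand-built stand-in for a single PDF stream object rather than a real file. A stream that declares /FlateDecode is inflated with gzuncompress(); otherwise its bytes are used as-is.

```php
<?php
// Illustrative only: builds a stand-in for one PDF stream object, not a real PDF.
function makeStreamObject(string $content, bool $compress): string
{
    $data = $compress ? gzcompress($content) : $content;
    $filter = $compress ? ' /Filter /FlateDecode' : '';
    return '<< /Length ' . strlen($data) . $filter . " >>\nstream\n" . $data . "\nendstream";
}

// Return the decoded bytes of a stream object, inflating it when /FlateDecode is declared.
function decodeStream(string $object): string
{
    $start = strpos($object, "stream\n");
    if ($start === false || !preg_match('/\/Length\s+(\d+)/', $object, $m)) {
        return '';
    }
    $dictionary = substr($object, 0, $start);
    // Read exactly /Length bytes so binary data is never cut short.
    $raw = substr($object, $start + strlen("stream\n"), (int) $m[1]);
    if (strpos($dictionary, '/FlateDecode') === false) {
        return $raw; // uncompressed stream
    }
    $inflated = @gzuncompress($raw);
    return $inflated === false ? '' : $inflated;
}

// Collect literal strings shown by the Tj operator, e.g. "(Hello) Tj".
function extractShownText(string $contentStream): string
{
    preg_match_all('/\((.*?)\)\s*Tj/s', $contentStream, $matches);
    return implode(' ', $matches[1]);
}

$page = 'BT /F1 12 Tf (Hello, PDF) Tj ET';
foreach ([true, false] as $compress) {
    echo extractShownText(decodeStream(makeStreamObject($page, $compress))), "\n";
}
```

This sketch ignores escaped parentheses, TJ arrays, hex strings, and font encodings. Those are the parts a real extractor has to handle, and they are where Unicode problems usually come from.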