PHP: Converting PDF to string

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/4780697/

Time: 2020-08-25 14:20:49  Source: igfitidea

Converting PDF to string

php pdf file-conversion

Asked by lolalola

How can I read a PDF file and put its content into a string, using PHP?

Answered by Matthew Smith

You could use something like pdftotext, which comes with the Xpdf package on Linux. The popen function can then be used to pipe the output of pdftotext into a string:

$mystring = "";
// The trailing "-" makes pdftotext write to stdout instead of blah.txt
$fd = popen("/usr/bin/pdftotext blah.pdf -", "r");
if ($fd) {
    while (($myline = fgets($fd)) !== false) {
        $mystring .= $myline;
    }
    pclose($fd);
}
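
If the file name comes from user input, it is worth escaping it before it reaches the shell. The following is a minimal sketch, assuming pdftotext is on the PATH of a Unix-like system; the helper names build_pdftotext_command and pdf_to_string are hypothetical and not part of the original answer. Note that pdftotext writes to a .txt file next to the PDF unless an output file is given, so "-" is passed to send the text to standard output:

```php
<?php
// Hypothetical helper: builds a shell-safe pdftotext command line.
function build_pdftotext_command(string $pdfPath, string $binary = 'pdftotext'): string
{
    // escapeshellarg() quotes the path so spaces and quotes are safe.
    // The trailing "-" asks pdftotext to write the text to stdout.
    return escapeshellcmd($binary) . ' ' . escapeshellarg($pdfPath) . ' -';
}

// Hypothetical helper: returns the extracted text, or null on failure
// (shell_exec also yields null when the command prints nothing).
function pdf_to_string(string $pdfPath): ?string
{
    $output = shell_exec(build_pdftotext_command($pdfPath));
    return is_string($output) ? $output : null;
}
```

Something like pdf_to_string('blah.pdf') would then replace the popen loop; a null result means either an error or an empty document.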

Answered by advanced_noob

Found this really nice class! Further, you can add functionality to fit your needs.

Probably these will help you to add functionality:

If it doesn't work, check whether you can highlight/mark the text when the file is opened in Adobe Reader (if you can't, the text in your file is probably saved as geometric curves). Also check the encoding.
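
The two checks above can be partly automated once some text has been extracted. This is a rough sketch, assuming the extracted text is expected to be UTF-8; the function name diagnose_extracted_text and its return labels are made up for illustration:

```php
<?php
// Hypothetical helper: classifies the result of a text extraction.
function diagnose_extracted_text(string $text): string
{
    // Nothing but whitespace often means the PDF holds curves or
    // scanned images rather than selectable text.
    if (trim($text) === '') {
        return 'empty';
    }
    // With the /u modifier, preg_match() fails on malformed UTF-8.
    if (preg_match('//u', $text) !== 1) {
        return 'bad-encoding';
    }
    return 'ok';
}
```

An 'empty' result points to the geometric-curves case, while 'bad-encoding' suggests the text needs converting from another character set.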

Answered by kentusaq

Install Apache Tika on your server. Apache Tika supports more than just PDF files. Install guide: http://www.acquia.com/blog/use-apache-solr-search-files

The final code is simple:

$string = "";
$fd = popen("java -jar yourpathtotika/tika-app-1.3.jar -t yourpathtopdf/sample.pdf", "r");
if ($fd) {
    // fgets() returns false at the end of the output, which ends the loop
    while (($buffer = fgets($fd, 4096)) !== false) {
        $string .= $buffer;
    }
    pclose($fd);
}
echo $string;
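
Reading the pipe alone does not show whether the Java process succeeded. Below is a sketch of a small wrapper that also returns the status reported by pclose, where zero conventionally means success; run_and_capture is a hypothetical name, and the command in the usage comment reuses the placeholder paths from the answer:

```php
<?php
// Hypothetical helper: runs a command, returns its stdout and exit status.
function run_and_capture(string $command): array
{
    $handle = popen($command, 'r');
    if ($handle === false) {
        return ['output' => '', 'status' => -1];
    }
    // Read everything the process writes to stdout until the pipe closes.
    $output = stream_get_contents($handle);
    // pclose() waits for the process and returns its termination status.
    $status = pclose($handle);
    return ['output' => $output === false ? '' : $output, 'status' => $status];
}

// Usage (placeholder paths as in the answer above):
// $result = run_and_capture('java -jar yourpathtotika/tika-app-1.3.jar -t yourpathtopdf/sample.pdf');
```

A non-zero status with empty output usually means the jar or the PDF path was not found.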

Answered by Christian Vigh

You can use the PHP class that is available here:

http://www.pdftotext.eu

This is a public domain PDF text extractor written entirely in pure PHP, meaning that you do not need to rely on external commands. It provides a simple interface to retrieve text:

include ( 'PdfToText.phpclass' ) ;
$pdf = new PdfToText ( 'mysample.pdf' ) ;
echo "PDF contents are : " . $pdf -> Text . "\n" ;