Count the number of pages in a PDF in only PHP
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/1143841/
Asked by UnkwnTech
I need a way to count the number of pages of a PDF in PHP. I've done a bit of Googling, and everything I've found relies on shell/bash scripts, Perl, or other languages, but I need something in native PHP. Are there any libraries or examples of how to do this?
Accepted answer by Travis Beale
You can use the ImageMagick extension for PHP. ImageMagick understands PDFs, and you can use the identify command to extract the number of pages. The PHP function is Imagick::identifyImage().
Answer by stephangroen
If using Linux, this is much faster than using identify to get the page count (especially with a high number of pages):
exec('/usr/bin/pdfinfo ' . escapeshellarg($tmpfname) . ' | awk \'/Pages/ {print $2}\'', $output);
You do need pdfinfo installed.
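The awk step can also be done in PHP itself, which removes the dependency on awk and makes it easy to escape the file name. The following is a minimal sketch, assuming pdfinfo is on the PATH and prints a "Pages:" line; the function names parsePdfinfoPages and pdfinfoPageCount are hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: pull the page count out of pdfinfo's text output.
// Assumes the output contains a line of the form "Pages:   12".
function parsePdfinfoPages(string $output): ?int
{
    if (preg_match('/^Pages:\s+(\d+)/m', $output, $m)) {
        return (int) $m[1];
    }
    return null; // no "Pages:" line found
}

// Hypothetical wrapper: run pdfinfo on a file and return the page count,
// or null if the command fails or prints no "Pages:" line.
function pdfinfoPageCount(string $path): ?int
{
    $lines = [];
    $status = 0;
    // escapeshellarg() keeps spaces and shell metacharacters in $path
    // from being interpreted by the shell.
    exec('pdfinfo ' . escapeshellarg($path) . ' 2>/dev/null', $lines, $status);
    if ($status !== 0) {
        return null;
    }
    return parsePdfinfoPages(implode("\n", $lines));
}
```

Keeping the parsing separate from the exec call means the parsing can be checked against sample text without pdfinfo being present.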
Answer by user678415
I know this is pretty old... but if it's relevant to me now, it can be relevant to others too.
I just worked out this method of getting the page count, as the methods listed here are inefficient and extremely slow for large PDFs.
$im = new Imagick();
$im->pingImage('name_of_pdf_file.pdf');
echo $im->getNumberImages();
Seems to be working great for me!
Answer by adrianbj
I actually went with a combined approach. Since I have exec disabled on my server, I wanted to stick with a PHP-based solution, so I ended up with this:
Code:
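function getNumPagesPdf($filepath) {
    // Strip any bracketed suffix (presumably an ImageMagick-style "[0]" page selector) from the path.
    $fp = @fopen(preg_replace("/\[(.*?)\]/i", "", $filepath), "r");
    $max = 0;
    if ($fp) {
        // Scan the raw file for "/Count N" entries and keep the largest N.
        while (!feof($fp)) {
            $line = fgets($fp, 255);
            if (preg_match('/\/Count [0-9]+/', $line, $matches)) {
                preg_match('/[0-9]+/', $matches[0], $matches2);
                if ($max < (int)$matches2[0]) $max = (int)$matches2[0];
            }
        }
        fclose($fp);
    }
    if ($max == 0) {
        // No usable Count entry found: fall back to the slower Imagick extension.
        $im = new Imagick($filepath);
        $max = $im->getNumberImages();
    }
    return $max;
}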
If it can't figure things out because there are no Count tags, it falls back to the imagick PHP extension. The reason I use a two-step approach is that the latter is quite slow.
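The two-step idea (try the fast method first, fall back to the slow one only when it finds nothing) can be written as a small generic helper. This is only a sketch; firstPositiveCount is a hypothetical name, and the two closures below are stand-ins rather than real counters.

```php
<?php
// Hypothetical helper: try each counting strategy in order and return the
// first positive result, or 0 if none of them produced a count.
function firstPositiveCount(array $strategies, string $path): int
{
    foreach ($strategies as $strategy) {
        $count = (int) $strategy($path);
        if ($count > 0) {
            return $count;
        }
    }
    return 0;
}

// Usage with stand-in strategies (real ones would scan the file or call Imagick):
$fast = function (string $path): int { return 0; }; // pretend the fast scan found nothing
$slow = function (string $path): int { return 7; }; // pretend the slow fallback found 7 pages
echo firstPositiveCount([$fast, $slow], 'name_of_pdf_file.pdf'); // prints 7
```

Ordering the strategies from cheapest to most expensive keeps the slow path off the common case.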
Answer by lothar42
Answer by Baboum
Try this:
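<?php
// Note: opening a path taken straight from the request is only acceptable for a quick test.
$file = $_REQUEST['file'];
if (!$fp = @fopen($file, "r")) {
    echo 'failed opening file ' . htmlspecialchars($file);
}
else {
    $max = 0;
    // Scan the raw file for "/Count N" entries and keep the largest N.
    while (!feof($fp)) {
        $line = fgets($fp, 255);
        if (preg_match('/\/Count [0-9]+/', $line, $matches)) {
            preg_match('/[0-9]+/', $matches[0], $matches2);
            if ($max < (int)$matches2[0]) $max = (int)$matches2[0];
        }
    }
    fclose($fp);
    echo 'There ' . ($max < 2 ? 'is ' : 'are ') . $max . ' page' . ($max < 2 ? '' : 's') . ' in ' . htmlspecialchars($file) . '.';
}
?>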
The Count tag gives the number of pages under each node of the page tree. The parent node's Count is the sum of its children's, so this script just looks for the maximum value, which is the total number of pages.
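The "take the maximum Count" idea can also be applied to the whole file content at once, which avoids missing a Count entry that happens to straddle a 255-byte read. The following is a minimal sketch on a synthetic page tree; maxCountInPdfData is a hypothetical name, and the sketch assumes the page tree is stored uncompressed (a PDF that keeps it inside compressed object streams would yield 0). A Count entry can also appear in other dictionaries, such as outlines, so the result is a heuristic rather than a guarantee.

```php
<?php
// Hypothetical helper: return the largest "/Count N" value found in raw PDF bytes.
// Assumption: the page tree is stored uncompressed in the file.
function maxCountInPdfData(string $data): int
{
    // \s+ tolerates any run of whitespace between the key and the number.
    if (!preg_match_all('/\/Count\s+(\d+)/', $data, $m)) {
        return 0;
    }
    return max(array_map('intval', $m[1]));
}

// Synthetic page tree: a root node with two child nodes holding 2 and 3 pages.
$sample = "<< /Type /Pages /Count 5 /Kids [2 0 R 3 0 R] >>\n"
        . "<< /Type /Pages /Count 2 >>\n"
        . "<< /Type /Pages /Count 3 >>\n";
echo maxCountInPdfData($sample); // prints 5
```

On a real file this could be called as maxCountInPdfData(file_get_contents($file)), at the cost of loading the whole file into memory.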
Answer by Baboum
This one does not use imagick:
function getNumPagesInPDF($file)
{
    //http://www.hotscripts.com/forums/php/23533-how-now-get-number-pages-one-document-pdf.html
    if (!file_exists($file)) return null;
    if (!$fp = @fopen($file, "r")) return null;
    $max = 0;
    // Scan the raw file for "/Count N" entries and keep the largest N.
    while (!feof($fp)) {
        $line = fgets($fp, 255);
        if (preg_match('/\/Count [0-9]+/', $line, $matches)) {
            preg_match('/[0-9]+/', $matches[0], $matches2);
            if ($max < (int)$matches2[0]) $max = (int)$matches2[0];
        }
    }
    fclose($fp);
    return (int)$max;
}
Answer by stev
function getNumPagesPdf($filepath) {
    $fp = @fopen(preg_replace("/\[(.*?)\]/i", "", $filepath), "r");
    $max = 0;
    if (!$fp) {
        return "Could not open file: $filepath";
    } else {
        while (!@feof($fp)) {
            $line = @fgets($fp, 255);
            if (preg_match('/\/Count [0-9]+/', $line, $matches)) {
                preg_match('/[0-9]+/', $matches[0], $matches2);
                if ($max < $matches2[0]) {
                    $max = trim($matches2[0]);
                    break;
                }
            }
        }
        @fclose($fp);
    }
    return $max;
}
This does exactly what I want:
I just worked out this method of getting the PDF page count. After finding the page count, I break out of the while loop so that it does not end up in an infinite loop.
Answer by kenorb
On a *nix environment you can use:
exec('pdftops ' . escapeshellarg($filename) . ' - | grep showpage | wc -l', $output);
Here, pdftops should be installed by default.
Or, as Xethron suggested:
pdfinfo filename.pdf | grep Pages: | awk '{print $2}'
Answer by Richard de Wit
Using only PHP can mean installing complicated libraries, restarting Apache, and so on, and many pure-PHP approaches (like opening streams and using regex) are inaccurate.
The included answer is the only fast and reliable way I can think of. It uses a single executable, which doesn't have to be installed (on either *nix or Windows), and a simple PHP script extracts the output. Best of all, I haven't seen a wrong page count yet!
It can be found here, including why the other approaches "don't work":

