Linux: Optimize PDF files (with Ghostscript or other)
Original question: http://stackoverflow.com/questions/10450120/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): Stack Overflow
Optimize PDF files (with Ghostscript or other)
Asked by clarkk
Is Ghostscript the best option if you want to optimize a PDF file and reduce the file size?
I need to store a lot of PDF files, so I need to optimize them and reduce the file size as much as possible.
Does anyone have any experience with Ghostscript and/or other tools?
command line
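// Build the command on one line (a newline inside the string would end the shell command early)
// and quote both paths so that spaces or shell metacharacters cannot break it.
exec('gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -sOutputFile='.escapeshellarg($file_new).' '.escapeshellarg($file));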
Accepted answer by Kurt Pfeifle
If you are looking for Free (as in 'libre') software, Ghostscript is surely your best choice. However, it is not always easy to use -- documentation for some of its (very powerful) processing options is hard to find.
Have a look at this answer, which explains how to exercise more detailed control over image resolution downsampling than the generic -dPDFSETTINGS=/screen does (that switch defines a few overall defaults, which you may want to override).
Basically, it tells you how to make Ghostscript downsample all images to a resolution of 72dpi (this is the value -dPDFSETTINGS=/screen uses -- you may want to go even lower):
-dDownsampleColorImages=true \
-dDownsampleGrayImages=true \
-dDownsampleMonoImages=true \
-dColorImageResolution=72 \
-dGrayImageResolution=72 \
-dMonoImageResolution=72 \
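For illustration, the options above could be wrapped into a complete command roughly as follows. This is only a sketch: the function name downsample_pdf and its arguments are made up for this example, and combining the switches with -dPDFSETTINGS=/screen from the question's command line is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: downsample all images in a PDF to a chosen resolution.
# downsample_pdf is a hypothetical helper name, not part of Ghostscript.
# Usage: downsample_pdf INPUT.pdf OUTPUT.pdf [DPI]   (DPI defaults to 72)
downsample_pdf() {
    local input=$1 output=$2 dpi=${3:-72}
    # The preset comes first so the explicit resolution switches override it.
    gs -o "$output" \
       -sDEVICE=pdfwrite \
       -dPDFSETTINGS=/screen \
       -dDownsampleColorImages=true \
       -dDownsampleGrayImages=true \
       -dDownsampleMonoImages=true \
       -dColorImageResolution="$dpi" \
       -dGrayImageResolution="$dpi" \
       -dMonoImageResolution="$dpi" \
       "$input"
}

# Run only when the script is called with at least two arguments.
if [ "$#" -ge 2 ]; then
    downsample_pdf "$@"
fi
```

Saved as a script, it could be called with an input file, an output file and an optional resolution, for example 50. Always compare the result visually with the original, since downsampling cannot be undone.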
If you want to see whether Ghostscript can also 'un-embed' the fonts used (sometimes it works, sometimes not -- depending on the complexity of the embedded font, and also on the font type used), you can try adding the following to your gs command:
gs \
-o output.pdf \
[...other options...] \
-dEmbedAllFonts=false \
-dSubsetFonts=true \
-dConvertCMYKImagesToRGB=true \
-dCompressFonts=true \
-c ".setpdfwrite <</AlwaysEmbed [ ]>> setdistillerparams" \
-c ".setpdfwrite <</NeverEmbed [/Courier /Courier-Bold /Courier-Oblique /Courier-BoldOblique /Helvetica /Helvetica-Bold /Helvetica-Oblique /Helvetica-BoldOblique /Times-Roman /Times-Bold /Times-Italic /Times-BoldItalic /Symbol /ZapfDingbats /Arial]>> setdistillerparams" \
-f input.pdf
Note: Be aware that downsampling image resolution will surely reduce quality (irreversibly), and un-embedding fonts will make it difficult or impossible to display and print the PDFs unless the same fonts are installed on the machine.
Update
One option which I had overlooked in my original answer is to add
-dDetectDuplicateImages=true
to the command line. This parameter makes Ghostscript try to detect any images which are embedded in the PDF multiple times. This can happen if you use an image as a logo or page background, and if the PDF-generating software is not optimized for this situation. This used to be the case with older versions of OpenOffice/LibreOffice (I tested the latest release of LibreOffice, v4.3.5.2, and it no longer does such stupid things).
It also happens if you concatenate PDF files with the help of pdftk. To show you the effect, and how you can discover it, let's look at a sample PDF file:
pdfinfo p1.pdf
Producer: libtiff / tiff2pdf - 20120922
CreationDate: Tue Jan 6 19:36:34 2015
ModDate: Tue Jan 6 19:36:34 2015
Tagged: no
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 1
Encrypted: no
Page size: 595 x 842 pts (A4)
Page rot: 0
File size: 20983 bytes
Optimized: no
PDF version: 1.1
Recent versions of Poppler's pdfimages utility have added support for a -list parameter, which can list all images included in a PDF file:
pdfimages -list p1.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image     423   600  rgb     3   8  jpeg   no         7  0    52    52 19.2K 2.6%
This sample PDF is a 1-page document, containing an image, which is compressed with JPEG-compression, has a width of 423 pixels and a height of 600 pixels and renders at a resolution of 52 PPI on the page.
If we concatenate 3 copies of this file with the help of pdftk like so:
pdftk p1.pdf p1.pdf p1.pdf cat output p3.pdf
then pdfimages -list shows these image properties for the result:
pdfimages -list p3.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image     423   600  rgb     3   8  jpeg   no         4  0    52    52 19.2K 2.6%
   2     1 image     423   600  rgb     3   8  jpeg   no         8  0    52    52 19.2K 2.6%
   3     2 image     423   600  rgb     3   8  jpeg   no        12  0    52    52 19.2K 2.6%
This shows that there are now 3 identical PDF objects (with the IDs 4, 8 and 12) embedded in p3.pdf. p3.pdf consists of 3 pages:
pdfinfo p3.pdf | grep Pages:
Pages: 3
Optimize PDF by replacing duplicate images with references
Now we can apply the above-mentioned optimization with the help of Ghostscript:
gs -o p3-optim.pdf -sDEVICE=pdfwrite -dDetectDuplicateImages=true p3.pdf
Checking:
pdfimages -list p3-optim.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image     423   600  rgb     3   8  jpeg   no        10  0    52    52 19.2K 2.6%
   2     1 image     423   600  rgb     3   8  jpeg   no        10  0    52    52 19.2K 2.6%
   3     2 image     423   600  rgb     3   8  jpeg   no        10  0    52    52 19.2K 2.6%
There is still one image listed per page -- but the PDF object ID is always the same now: 10.
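For longer documents, the same check can be done without reading the table by eye. The following is a rough sketch, not part of the original answer: summarize_image_objects is a hypothetical helper name, and it assumes the column layout shown above, where the 11th and 12th whitespace-separated fields hold the object number and generation.

```shell
#!/usr/bin/env bash
# Sketch: summarize `pdfimages -list` output read from stdin.
# Assumes fields 11 and 12 of each data row are the object number and generation.
# summarize_image_objects is a hypothetical helper name.
summarize_image_objects() {
    awk '
        # Data rows start with a numeric page number; header and ruler lines do not.
        $1 ~ /^[0-9]+$/ {
            total++
            if (!seen[$11 " " $12]++) distinct++
        }
        END { printf "%d images, %d distinct objects\n", total, distinct }
    '
}
```

Piping the output of pdfimages -list p3.pdf into this function should report 3 distinct objects, while the same pipeline on p3-optim.pdf should report only 1, matching the tables above.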
ls -ltrh p1.pdf p3.pdf p3-optim.pdf
-rw-r--r--@ 1 kp staff 20K Jan 6 19:36 p1.pdf
-rw-r--r-- 1 kp staff 60K Jan 6 19:37 p3.pdf
-rw-r--r-- 1 kp staff 16K Jan 6 19:40 p3-optim.pdf
As you can see, the "dumb" concatenation made with pdftk tripled the file size (from 20K to 60K). The optimization by Ghostscript brought it down to 16K, which is even smaller than the single-page original.
The most recent versions of Ghostscript may even apply -dDetectDuplicateImages by default. (AFAIR, v9.02, which introduced it for the first time, didn't use it by default.)
Answer by Martijn de Milliano
You can obtain good results by converting from PDF to Postscript, then back to PDF:
pdf2ps file.pdf file.ps
ps2pdf -dPDFSETTINGS=/ebook file.ps file-optimized.pdf
The value of the -dPDFSETTINGS argument defines the quality of the images in the resulting PDF. Options are, from low to high quality: /screen, /default, /ebook, /printer, /prepress; see http://milan.kupcevic.net/ghostscript-ps-pdf/ for a reference.
The Postscript file can become quite large, but the results are worth it. I went from a 60 MB PDF to a 140 MB Postscript file, but ended up with a 1.1 MB optimized PDF.
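Because the intermediate Postscript file can be that large, it may be worth writing it to a temporary location and deleting it afterwards. A minimal sketch of the same two-step conversion, assuming pdf2ps, ps2pdf and mktemp are available; roundtrip_pdf is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: PDF -> Postscript -> PDF round trip with a temporary intermediate file.
# roundtrip_pdf is a hypothetical helper name.
# Usage: roundtrip_pdf INPUT.pdf OUTPUT.pdf [PRESET]   (PRESET defaults to /ebook)
roundtrip_pdf() {
    local input=$1 output=$2 preset=${3:-/ebook}
    local tmp_ps status=0
    tmp_ps=$(mktemp "${TMPDIR:-/tmp}/roundtrip.XXXXXX") || return 1
    # Run both conversion steps; remember the exit status if either one fails.
    pdf2ps "$input" "$tmp_ps" \
        && ps2pdf "-dPDFSETTINGS=$preset" "$tmp_ps" "$output" \
        || status=$?
    # Remove the (possibly very large) Postscript file in every case.
    rm -f "$tmp_ps"
    return "$status"
}

# Run only when the script is called with at least two arguments.
if [ "$#" -ge 2 ]; then
    roundtrip_pdf "$@"
fi
```

The temporary file still needs enough free disk space while the conversion runs; the sketch only makes sure it does not stay around afterwards.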
Answer by Anon
Answer by Onlyjob
Ghostscript comes with two useful utilities: pdfopt and ps2pdf14. Both can be used to optimise PDF file(s), but on some occasions the "optimised" file may be bigger than the original.
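Since an "optimised" file can end up larger, a batch job may want to compare sizes before replacing anything. A small sketch of such a check; pick_smaller is a hypothetical helper name, and it only prints which file to keep without deleting or moving anything:

```shell
#!/usr/bin/env bash
# Sketch: print the path of the smaller of two files (the original wins ties).
# pick_smaller is a hypothetical helper name.
# Usage: pick_smaller ORIGINAL.pdf CANDIDATE.pdf
pick_smaller() {
    local original=$1 candidate=$2
    local orig_size cand_size
    # Arithmetic expansion strips the padding some wc implementations add.
    orig_size=$(( $(wc -c < "$original") ))
    cand_size=$(( $(wc -c < "$candidate") ))
    if [ "$cand_size" -lt "$orig_size" ]; then
        printf '%s\n' "$candidate"
    else
        printf '%s\n' "$original"
    fi
}
```

For example, after running ps2pdf14 on file.pdf to produce file-opt.pdf, calling pick_smaller file.pdf file-opt.pdf prints the name of the file worth storing.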
Answer by Primoz Rome
Answer by Skippy le Grand Gourou
You will lose quality, but if that's not an issue then ImageMagick's convert may prove helpful:
convert original.pdf reduced.pdf
Note that it doesn't always work: I once converted a 126 MB file into a 14 MB one using this command, but another time it doubled the size of a 350 KB file.
Anyway it's worth giving it a try…
As mentioned in the comments, there is of course no point in applying this command to a vector-based PDF; it is only useful for rasterized images.
See also this post for related options.
Answer by Lukas Hillebrand
This worked for me:
Convert your PDF to PS (this creates a large file):
pdf2ps large.pdf very_large.ps
Convert the new PS back to a PDF:
ps2pdf very_large.ps small.pdf
Source: https://pandemoniumillusion.wordpress.com/2008/05/07/compress-a-pdf-with-pdftk/