Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/10450120/

Time: 2020-08-06 06:09:51  Source: igfitidea

Optimize PDF files (with Ghostscript or other)

Tags: linux, pdf, debian, ghostscript

Asked by clarkk

Is Ghostscript the best option if you want to optimize a PDF file and reduce the file size?

I need to store a lot of PDF files, and therefore I need to optimize them and reduce the file size as much as possible.

Does anyone have any experience with Ghostscript and/or other tools?

command line

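// Quote both paths so spaces or shell metacharacters cannot break the command.
exec('gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4'
    .' -dPDFSETTINGS=/screen -sOutputFile='.escapeshellarg($file_new)
    .' '.escapeshellarg($file));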

Accepted answer by Kurt Pfeifle

If you are looking for Free (as in 'libre') Software, Ghostscript is surely your best choice. However, it is not always easy to use -- some of its (very powerful) processing options are poorly documented and hard to find.

Have a look at this answer, which explains how to exercise more detailed control over image resolution downsampling than the generic -dPDFSETTINGS=/screen does (that setting defines a few overall defaults, which you may want to override):

Basically, it tells you how to make Ghostscript downsample all images to a resolution of 72dpi (this value is what -dPDFSETTINGS=/screen uses -- you may want to go even lower):

-dDownsampleColorImages=true \
-dDownsampleGrayImages=true \
-dDownsampleMonoImages=true \
-dColorImageResolution=72 \
-dGrayImageResolution=72 \
-dMonoImageResolution=72 \
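
Put together with the rest of a pdfwrite invocation, the options above could be wrapped roughly as follows. This is only a sketch: the function name downsample_pdf, its argument order and the 72 dpi default are assumptions made for illustration, and only flags that already appear on this page are used.

```shell
# Sketch: downsample every image in a PDF to one target resolution.
# downsample_pdf and its arguments are hypothetical names, not part of Ghostscript.
downsample_pdf() {
  local src=$1 dst=$2 dpi=${3:-72}   # dpi defaults to 72, the /screen value
  gs -dNOPAUSE -dBATCH -dQUIET -sDEVICE=pdfwrite \
     -dPDFSETTINGS=/screen \
     -dDownsampleColorImages=true \
     -dDownsampleGrayImages=true \
     -dDownsampleMonoImages=true \
     -dColorImageResolution="$dpi" \
     -dGrayImageResolution="$dpi" \
     -dMonoImageResolution="$dpi" \
     -sOutputFile="$dst" "$src"
}
```

Called as downsample_pdf input.pdf output.pdf 50, this would request 50 dpi instead of the default; compare the result visually before discarding the original, since the loss is irreversible.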

If you want to see whether Ghostscript is also able to 'un-embed' the fonts used (sometimes it works, sometimes not -- depending on the complexity of the embedded font, and also on the font type used), you can try adding the following to your gs command:

gs \
  -o output.pdf \
   [...other options...] \
  -dEmbedAllFonts=false \
  -dSubsetFonts=true \
  -dConvertCMYKImagesToRGB=true \
  -dCompressFonts=true \
  -c ".setpdfwrite <</AlwaysEmbed [ ]>> setdistillerparams" \
  -c ".setpdfwrite <</NeverEmbed [/Courier /Courier-Bold /Courier-Oblique /Courier-BoldOblique /Helvetica /Helvetica-Bold /Helvetica-Oblique /Helvetica-BoldOblique /Times-Roman /Times-Bold /Times-Italic /Times-BoldItalic /Symbol /ZapfDingbats /Arial]>> setdistillerparams" \
  -f input.pdf

Note: Be aware that downsampling image resolution will surely reduce quality (irreversibly), and un-embedding fonts will make it difficult or impossible to display and print the PDFs unless the same fonts are installed on the machine.
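
One way to see which fonts ended up un-embedded is Poppler's pdffonts utility, which prints one row per font. The helper below is a sketch that assumes the usual pdffonts column layout (name, type, encoding, emb, sub, uni, object ID), so that the emb column is the fifth field counted from the end of each row; list_unembedded_fonts is a made-up name.

```shell
# Sketch: read `pdffonts` output on stdin and print the names of fonts
# whose "emb" column says "no". Assumes two header lines and the column
# order name/type/encoding/emb/sub/uni/object ID (an assumption, see above).
list_unembedded_fonts() {
  awk 'NR > 2 && NF >= 6 && $(NF-4) == "no" { print $1 }'
}
```

Usage would look like pdffonts output.pdf | list_unembedded_fonts; an empty result suggests every font is still embedded.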



Update

One option which I had overlooked in my original answer is to add

-dDetectDuplicateImages=true

to the command line. This parameter leads Ghostscript to try to detect any images which are embedded in the PDF multiple times. This can happen if you use an image as a logo or page background, and if the PDF-generating software is not optimized for this situation. This used to be the case with older versions of OpenOffice/LibreOffice (I tested the latest release of LibreOffice, v4.3.5.2, and it no longer does such stupid things).

It also happens if you concatenate PDF files with the help of pdftk. To show you the effect, and how you can discover it, let's look at a sample PDF file:

pdfinfo p1.pdf

 Producer:       libtiff / tiff2pdf - 20120922
 CreationDate:   Tue Jan  6 19:36:34 2015
 ModDate:        Tue Jan  6 19:36:34 2015
 Tagged:         no
 UserProperties: no
 Suspects:       no
 Form:           none
 JavaScript:     no
 Pages:          1
 Encrypted:      no
 Page size:      595 x 842 pts (A4)
 Page rot:       0
 File size:      20983 bytes
 Optimized:      no
 PDF version:    1.1

Recent versions of Poppler's pdfimages utility have added support for a -list parameter, which can list all images included in a PDF file:

pdfimages -list p1.pdf

 page num  type width height color comp bpc  enc interp objectID x-ppi y-ppi size ratio
 --------------------------------------------------------------------------------------
    1   0 image    423   600   rgb    3   8 jpeg     no     7  0    52    52 19.2K 2.6%

This sample PDF is a 1-page document, containing an image, which is compressed with JPEG-compression, has a width of 423 pixels and a height of 600 pixels and renders at a resolution of 52 PPI on the page.

If we concatenate 3 copies of this file with the help of pdftk like so:

pdftk p1.pdf p1.pdf p1.pdf cat output p3.pdf

then the result shows these image properties via pdfimages -list:

pdfimages -list p3.pdf

 page num  type width height color comp bpc  enc interp objectID x-ppi y-ppi size ratio
 --------------------------------------------------------------------------------------
    1   0 image   423    600   rgb    3   8 jpeg     no     4  0    52    52 19.2K 2.6%
    2   1 image   423    600   rgb    3   8 jpeg     no     8  0    52    52 19.2K 2.6%
    3   2 image   423    600   rgb    3   8 jpeg     no    12  0    52    52 19.2K 2.6%

This shows that there are now 3 identical PDF objects (with the IDs 4, 8 and 12) embedded in p3.pdf. p3.pdf consists of 3 pages:

pdfinfo p3.pdf | grep Pages:

 Pages:          3

Optimize PDF by replacing duplicate images with references

Now we can apply the above-mentioned optimization with the help of Ghostscript:

 gs -o p3-optim.pdf -sDEVICE=pdfwrite -dDetectDuplicateImages=true p3.pdf

Checking:

 pdfimages -list p3-optim.pdf

 page num  type width height color comp bpc  enc interp objectID x-ppi y-ppi size ratio
 --------------------------------------------------------------------------------------
    1   0 image   423    600   rgb    3   8 jpeg     no    10  0    52    52 19.2K 2.6%
    2   1 image   423    600   rgb    3   8 jpeg     no    10  0    52    52 19.2K 2.6%
    3   2 image   423    600   rgb    3   8 jpeg     no    10  0    52    52 19.2K 2.6%

There is still one image listed per page -- but the PDF object ID is always the same now: 10.
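
To avoid comparing object IDs by eye on larger files, the listing can be reduced to a count of distinct image objects. The helper below is a sketch that assumes the pdfimages -list column layout shown above, where the object number and generation are the 11th and 12th fields of each data row; count_image_objects is a made-up name.

```shell
# Sketch: read `pdfimages -list` output on stdin and print how many
# distinct image objects (object number + generation) it references.
# Data rows are recognised by a numeric first field (the page number).
count_image_objects() {
  awk '$1 ~ /^[0-9]+$/ { seen[$11 " " $12] = 1 }
       END { n = 0; for (k in seen) n++; print n }'
}
```

With the listings above, pdfimages -list p3.pdf | count_image_objects would report 3 distinct objects, and the same pipeline on p3-optim.pdf would report 1.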

 ls -ltrh p1.pdf p3.pdf p3-optim.pdf

   -rw-r--r--@ 1 kp  staff    20K Jan  6 19:36 p1.pdf
   -rw-r--r--  1 kp  staff    60K Jan  6 19:37 p3.pdf
   -rw-r--r--  1 kp  staff    16K Jan  6 19:40 p3-optim.pdf

As you can see, the "dumb" concatenation made with pdftk tripled the file size, from 20K to 60K. The optimization by Ghostscript brought it down to 16K, which is even smaller than the single-page original.

The most recent versions of Ghostscript may even apply -dDetectDuplicateImages by default. (AFAIR, v9.02, which introduced it for the first time, didn't use it by default.)

Answer by Martijn de Milliano

You can obtain good results by converting from PDF to Postscript, then back to PDF, using:

pdf2ps file.pdf file.ps
ps2pdf -dPDFSETTINGS=/ebook file.ps file-optimized.pdf

The value of the -dPDFSETTINGS argument defines the quality of the images in the resulting PDF. Options are, from low to high quality: /screen, /default, /ebook, /printer, /prepress; see http://milan.kupcevic.net/ghostscript-ps-pdf/ for a reference.

The Postscript file can become quite large, but the results are worth it. I went from a 60 MB PDF to a 140 MB Postscript file, but ended up with a 1.1 MB optimized PDF.
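
Because the intermediate Postscript file can be much larger than either PDF, it may be worth writing it to a temporary location and deleting it afterwards. The following is a sketch of that idea using the same two commands; the function name roundtrip_pdf, its arguments and the /ebook default are assumptions made for illustration.

```shell
# Sketch: PDF -> Postscript -> PDF round trip with a throwaway
# intermediate file. roundtrip_pdf is a hypothetical helper name.
roundtrip_pdf() {
  local src=$1 dst=$2 preset=${3:-/ebook}
  local tmp rc
  tmp=$(mktemp "${TMPDIR:-/tmp}/roundtrip.XXXXXX") || return 1
  # Only run ps2pdf if pdf2ps succeeded.
  pdf2ps "$src" "$tmp" && ps2pdf -dPDFSETTINGS="$preset" "$tmp" "$dst"
  rc=$?
  rm -f "$tmp"          # remove the (possibly very large) Postscript file
  return "$rc"
}
```

For example, roundtrip_pdf file.pdf file-optimized.pdf /screen would use the lowest-quality preset.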

Answer by Anon

You may find that pdftocairo (from Poppler) can make smaller PDFs, but beware that it will strip some features (such as hyperlinks) away.

Answer by Onlyjob

Ghostscript comes with two useful utilities: pdfopt and ps2pdf14. Both can be used to optimise PDF file(s), but on some occasions the "optimised" file may be bigger than the original.
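
Since an "optimised" file can come out larger, it makes sense to compare sizes before replacing anything. The helper below is a minimal sketch that relies only on wc; keep_smaller is a made-up name, and it prints a path rather than deleting or moving any file.

```shell
# Sketch: print whichever of two files is smaller in bytes.
# On a tie the original is preferred. keep_smaller is a hypothetical helper.
keep_smaller() {
  local original=$1 candidate=$2
  local orig_size cand_size
  # Arithmetic expansion strips the padding some wc implementations add.
  orig_size=$(( $(wc -c < "$original") ))
  cand_size=$(( $(wc -c < "$candidate") ))
  if [ "$cand_size" -lt "$orig_size" ]; then
    printf '%s\n' "$candidate"
  else
    printf '%s\n' "$original"
  fi
}
```

For instance, best=$(keep_smaller input.pdf output.pdf) leaves the path of the smaller file in best, so that a batch job can store that one.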

Answer by Primoz Rome

I use Ghostscript with the following options, taken from here:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen \
 -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf

Answer by Skippy le Grand Gourou

You will lose some quality, but if that is not an issue then ImageMagick's convert may prove helpful:

convert original.pdf reduced.pdf

Note that it doesn't always work: I once converted a 126 MB file into a 14 MB one using this command, but another time it doubled the size of a 350 KB file.

Anyway it's worth giving it a try…

As mentioned in the comments, there is of course no point in applying this command to a vector-based PDF; it will only be useful on rasterized images.

See also this post for related options.

Answer by Lukas Hillebrand

This worked for me:

Convert your PDF to PS (this creates a large file):

pdf2ps large.pdf very_large.ps

Convert the new PS back to a PDF:

ps2pdf very_large.ps small.pdf

Source: https://pandemoniumillusion.wordpress.com/2008/05/07/compress-a-pdf-with-pdftk/
