Compress existing PDF using C# programming using freeware libraries
Original source: http://stackoverflow.com/questions/13719553/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Luv
I have been searching Google a lot for how to compress (reduce the size of) an existing PDF.
My constraints are:
- I can't use a standalone application, because the compression needs to be done by a C# program.
- I can't use any paid library, as my clients don't want to go over budget. So a paid library is certainly a no.
I did my homework over the last 2 days and came upon solutions using iTextSharp and BitMiracle, but to no avail: the former shrinks a file by only about 1%, and the latter is paid.
I also came across PDFcompressNET and pdftk, but I wasn't able to find their .dll files.
The PDF is actually an insurance policy with 2-3 black-and-white images and around 70 pages, amounting to about 5 MB.
I need the output to be a PDF (it can't be in any other format).
Accepted answer by plinth
Here's an approach to do this (and this should work without regard to the toolkit you use):
If you have a 24-bit RGB or 32-bit CMYK image, do the following:
- Determine whether the image really is what it claims to be. If it's CMYK, convert to RGB. If it's RGB and really gray, convert to gray. If it's gray or paletted and only has 2 real colors, convert to 1-bit. If it's gray and there is relatively little gray variation, consider converting to 1-bit with a suitable binarization technique.
- Measure the image dimensions in relation to how the image is placed on the page. If it's 300 dpi or greater, consider resampling the image to a smaller size depending on its bit depth. For example, you can probably go from 300 dpi gray or RGB to 200 dpi and not lose too much detail.
- If you have an RGB image that is really color, consider palettizing it.
- Examine the contents of the image to see if you can help make it more compressible. For example, if you run through a color/gray image and find a lot of colors that cluster, consider smoothing them. If it's gray or black and white and contains a number of specks, consider despeckling.
- Choose your final compression wisely. JPEG2000 can do better than JPEG. JBIG2 does much better than G4. Flate is probably the best non-destructive compression for gray. Most implementations of JPEG2000 and JBIG2 are not free.
- If you're a rock star, you want to try to segment the image and break it into areas that are really black and white and areas that are really color.
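The first step in the list above (checking whether an image really needs the color depth it is stored with) can be sketched without any PDF toolkit. This is only an illustrative sketch: the `ImageClassifier` class, its `Kind` enum, and the `tolerance` parameter are hypothetical names, and it assumes you have already decoded the image to packed 24-bit RGB bytes by some other means.

```csharp
using System;
using System.Collections.Generic;

public static class ImageClassifier
{
    public enum Kind { Bilevel, Gray, Color }

    // Classifies packed 24-bit RGB pixel data (R, G, B byte triples).
    // tolerance: largest channel difference still treated as "gray".
    public static Kind Classify(byte[] rgb, int tolerance = 0)
    {
        if (rgb == null || rgb.Length % 3 != 0)
            throw new ArgumentException("Expected packed RGB triples.", nameof(rgb));

        var grayLevels = new HashSet<byte>();
        for (int i = 0; i < rgb.Length; i += 3)
        {
            int r = rgb[i], g = rgb[i + 1], b = rgb[i + 2];
            int max = Math.Max(r, Math.Max(g, b));
            int min = Math.Min(r, Math.Min(g, b));
            if (max - min > tolerance)
                return Kind.Color; // at least one pixel is genuinely colored

            grayLevels.Add((byte)((r + g + b) / 3));
        }

        // Two or fewer distinct levels means a 1-bit image is sufficient.
        return grayLevels.Count <= 2 ? Kind.Bilevel : Kind.Gray;
    }
}
```

An image classified as `Bilevel` is a candidate for 1-bit storage with G4 or JBIG2, while `Gray` suggests 8-bit gray with Flate or JPEG. Scanned pages usually need a nonzero tolerance and a real binarization step, which this sketch does not attempt.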
That said, if you can do all of this well in an unsupervised manner, you have a commercial product in its own right.
I will say that you can do most of this with Atalasoft dotImage (disclaimers: it's not free; I work there; I've written nearly all the PDF tools; I used to work on Acrobat).
One particular way to do that with dotImage is to pull out all the pages that are image only, recompress them, and save them out to a new PDF. Then build a new PDF by taking all the pages from the original document, replacing the image-only ones with the recompressed pages, and saving again. It's not that hard.
    List<int> pagesToReplace = new List<int>();
    PdfImageCollection pagesToEncode = new PdfImageCollection();
    using (Document doc = new Document(sourceStream, password)) {
        for (int i = 0; i < doc.Pages.Count; i++) {
            Page page = doc.Pages[i];
            if (page.SingleImageOnly) {
                pagesToReplace.Add(i);
                // a PdfImage encapsulates an image and its compression parameters
                PdfImage image = ProcessImage(sourceStream, doc, page, i);
                pagesToEncode.Add(image);
            }
        }
    }
    // write the re-encoded pages to a temporary PDF
    PdfEncoder encoder = new PdfEncoder();
    encoder.Save(tempOutStream, pagesToEncode, null);
    tempOutStream.Seek(0, SeekOrigin.Begin);
    sourceStream.Seek(0, SeekOrigin.Begin);
    // swap the re-encoded pages into a copy of the original document
    PdfDocument finalDoc = new PdfDocument(sourceStream, password);
    PdfDocument replacementPages = new PdfDocument(tempOutStream);
    for (int i = 0; i < pagesToReplace.Count; i++) {
        finalDoc.Pages[pagesToReplace[i]] = replacementPages.Pages[i];
    }
    finalDoc.Save(finalOutputStream);
What's missing here is ProcessImage(). ProcessImage will either rasterize the page (in which case you don't need to know how the image was scaled to fit on the page) or extract the image (and track the transformation matrix applied to it), and then go through the steps listed above. This is non-trivial, but it's doable.
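To illustrate why the transformation matrix matters when you extract an image, here is a rough sketch of computing the effective resolution at which the image is placed. It is not part of dotImage; `ImageResolution`, `EffectiveDpi`, and `DownsampleFactor` are hypothetical names. It assumes the usual PDF model in which an image is painted into a unit square transformed by a matrix [a b c d e f], and that user space units are the default 1/72 inch with no extra page-level scaling.

```csharp
using System;

public static class ImageResolution
{
    // Effective horizontal and vertical DPI of an image with the given pixel
    // size, placed with matrix [a b c d e f]. The translation part (e, f)
    // does not affect resolution, so it is not a parameter.
    public static (double X, double Y) EffectiveDpi(
        int pixelWidth, int pixelHeight, double a, double b, double c, double d)
    {
        // Lengths of the transformed unit vectors give the placed size in points.
        double widthPoints = Math.Sqrt(a * a + b * b);
        double heightPoints = Math.Sqrt(c * c + d * d);
        if (widthPoints == 0 || heightPoints == 0)
            throw new ArgumentException("Degenerate transformation matrix.");

        return (pixelWidth * 72.0 / widthPoints, pixelHeight * 72.0 / heightPoints);
    }

    // Scale factor that brings an image down to targetDpi without ever upsampling.
    public static double DownsampleFactor(double currentDpi, double targetDpi)
    {
        return currentDpi <= targetDpi ? 1.0 : targetDpi / currentDpi;
    }
}
```

For example, a 2550 x 3300 pixel scan filling a 612 x 792 point page comes out at 300 dpi in both directions, and `DownsampleFactor(300, 200)` suggests scaling it to about two thirds of its pixel size.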
Answer by Bobrovsky
I think you might want to make your clients aware that none of the libraries you mentioned is completely free:
- iTextSharp is AGPL-licensed, so you must release the source code of your solution or buy a commercial license.
- PDFcompressNET is a commercial library.
- pdftk is GPL-licensed, so you must release the source code of your solution or buy a commercial license.
- Docotic.Pdf is a commercial library.
Given all of the above, I assume I can drop the freeware requirement.
Docotic.Pdf can reduce the size of compressed and uncompressed PDFs to different degrees without introducing any destructive changes.
Gains depend on the size and structure of a PDF: For small files or files that are mostly scanned images the reduction might not be that great, so you should try the library with your files and see for yourself.
If you are most concerned about size, and there are many images in your files, and you are fine with losing some of the quality of those images, then you can easily recompress existing images using Docotic.Pdf.
Here is the code that makes all images bilevel and compressed with fax compression:
    static void RecompressExistingImages(string fileName, string outputName)
    {
        using (PdfDocument doc = new PdfDocument(fileName))
        {
            foreach (PdfImage image in doc.Images)
                image.RecompressWithGroup4Fax();

            doc.Save(outputName);
        }
    }
There are also RecompressWithFlate, RecompressWithGroup3Fax and RecompressWithJpeg methods.
The library will convert color images to bilevel ones if needed. You can specify deflate compression level, JPEG quality etc.
Docotic.Pdf can also resize big images in a PDF (and recompress them at the same time). This might be useful if images in a document are actually bigger than needed or if the quality of images is not that important.
Below is code that scales all images whose width or height is greater than or equal to 256. Scaled images are then encoded using JPEG compression.
    public static void RecompressToJpeg(string path, string outputPath)
    {
        using (PdfDocument doc = new PdfDocument(path))
        {
            foreach (PdfImage image in doc.Images)
            {
                // image that is used as mask or image with attached mask are
                // not good candidates for recompression
                if (!image.IsMask && image.Mask == null && (image.Width >= 256 || image.Height >= 256))
                    image.Scale(0.5, PdfImageCompression.Jpeg, 65);
            }

            doc.Save(outputPath);
        }
    }
Images can be resized to a specified width and height using one of the ResizeTo methods. Please note that the ResizeTo methods won't try to preserve the aspect ratio of images. You should calculate the proper width and height yourself.
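Since the aspect ratio has to be computed by the caller, a small helper along these lines could be used to get the target dimensions before resizing. This is only a sketch: `ImageSizing.FitWithin` is a hypothetical helper, not part of Docotic.Pdf, and it does not call any library method itself.

```csharp
using System;

public static class ImageSizing
{
    // Returns the largest size that fits inside maxWidth x maxHeight while
    // keeping the original aspect ratio. Images that already fit are not enlarged.
    public static (int Width, int Height) FitWithin(
        int width, int height, int maxWidth, int maxHeight)
    {
        if (width <= 0 || height <= 0 || maxWidth <= 0 || maxHeight <= 0)
            throw new ArgumentOutOfRangeException();

        // Use the tighter of the two constraints, capped at 1.0 (no upscaling).
        double scale = Math.Min(1.0,
            Math.Min((double)maxWidth / width, (double)maxHeight / height));

        int newWidth = Math.Max(1, (int)Math.Round(width * scale));
        int newHeight = Math.Max(1, (int)Math.Round(height * scale));
        return (newWidth, newHeight);
    }
}
```

For a 2000 x 1000 image and a 500 x 500 bound this yields 500 x 250, which could then be passed to whichever resize method you use.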
Disclaimer: I work for Bit Miracle.
Answer by brismuth
Ghostscript is AGPL-licensed software that can compress PDFs. There is also an AGPL-licensed C# wrapper for it on GitHub.
You could use the GhostscriptProcessor class from that wrapper to pass custom commands to Ghostscript, like the ones found in this AskUbuntu answer describing PDF compression.
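As a rough illustration of the kind of command involved, the sketch below builds a typical set of switches for Ghostscript's pdfwrite device and runs the Ghostscript executable directly through System.Diagnostics.Process. Note that this swaps in a plain command-line call instead of the wrapper's GhostscriptProcessor class. The `GhostscriptCompressor` class is hypothetical, the executable name is something you must supply (for example `gs` on Linux or `gswin64c` on Windows), and whether these exact switches match the linked answer is an assumption. It also assumes a runtime where `ProcessStartInfo.ArgumentList` is available (.NET Core 2.1 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class GhostscriptCompressor
{
    // Builds switches for the pdfwrite device.
    // preset is one of Ghostscript's PDFSETTINGS presets:
    // screen, ebook, printer, prepress or default.
    public static List<string> BuildArguments(
        string inputPath, string outputPath, string preset = "ebook")
    {
        return new List<string>
        {
            "-sDEVICE=pdfwrite",
            "-dCompatibilityLevel=1.4",
            "-dPDFSETTINGS=/" + preset,
            "-dNOPAUSE",
            "-dQUIET",
            "-dBATCH",
            "-sOutputFile=" + outputPath,
            inputPath
        };
    }

    // Runs Ghostscript and returns its exit code (0 normally means success).
    public static int Run(
        string ghostscriptExe, string inputPath, string outputPath, string preset = "ebook")
    {
        var startInfo = new ProcessStartInfo(ghostscriptExe) { UseShellExecute = false };
        foreach (string argument in BuildArguments(inputPath, outputPath, preset))
            startInfo.ArgumentList.Add(argument);

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

The `screen` preset downsamples images most aggressively and `prepress` least, so for a mostly black-and-white policy document it is worth trying a couple of presets and comparing the output sizes.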
Answer by Simon
Using PdfSharp
    using System.IO;
    using PdfSharp.Pdf;
    using PdfSharp.Pdf.IO;

    public static void CompressPdf(string targetPath)
    {
        // read the whole file into memory so that targetPath can be overwritten below
        using (var stream = new MemoryStream(File.ReadAllBytes(targetPath)) {Position = 0})
        using (var source = PdfReader.Open(stream, PdfDocumentOpenMode.Import))
        using (var document = new PdfDocument())
        {
            // ask for the strongest stream compression PdfSharp offers
            var options = document.Options;
            options.FlateEncodeMode = PdfFlateEncodeMode.BestCompression;
            options.UseFlateDecoderForJpegImages = PdfUseFlateDecoderForJpegImages.Automatic;
            options.CompressContentStreams = true;
            options.NoCompression = false;

            // copy every page into the new document
            foreach (var page in source.Pages)
            {
                document.AddPage(page);
            }

            document.Save(targetPath);
        }
    }
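Because a rewrite like this is not guaranteed to shrink a file (the question mentions a gain of only 1% with another library), it may be safer to write to a temporary file and replace the original only when the result is actually smaller. The sketch below is one way to do that. `SafeCompression.CompressIfSmaller` and the `.tmp` suffix are hypothetical, and it assumes a compression routine that takes an input path and an output path, so the method above would need a small adaptation to save to a different file.

```csharp
using System;
using System.IO;

public static class SafeCompression
{
    // Runs compress(inputPath, outputPath) into a temporary file and replaces
    // the original only if the result is smaller. Returns true if replaced.
    public static bool CompressIfSmaller(string path, Action<string, string> compress)
    {
        string tempPath = path + ".tmp";
        try
        {
            compress(path, tempPath);

            long before = new FileInfo(path).Length;
            long after = new FileInfo(tempPath).Length;
            if (after >= before)
                return false; // no gain, keep the original untouched

            File.Copy(tempPath, path, overwrite: true);
            return true;
        }
        finally
        {
            // always clean up the temporary file
            if (File.Exists(tempPath))
                File.Delete(tempPath);
        }
    }
}
```

Any of the approaches in the answers above could be plugged in as the `compress` delegate, as long as it reads from the first path and writes to the second.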

