Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1405054/

Date: 2020-08-12 11:43:15  Source: igfitidea

How to reduce size of RTF with embedded images?

Tags: java, performance, image, rtf

Asked by A_M

We have some code which produces an RTF document from a RTF template. It is basically doing string search and replaces of special tags within the RTF file. This is accessible via a web page.
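The search-and-replace step described above could look roughly like the following. This is a minimal sketch, not the asker's code: it assumes tags are written as `[name]` in the template, and `RtfTemplate`, `escape` and `fill` are hypothetical names. Substituted values are escaped so that `\`, `{` and `}` cannot break the RTF group structure, and non-ASCII characters are written as RTF `\uN?` escapes.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: fills [tag] placeholders in an RTF template string. */
public final class RtfTemplate {

    /** Matches placeholders such as [name] or [total]. */
    private static final Pattern TAG = Pattern.compile("\\[(\\w+)\\]");

    /** Escapes plain text so it cannot break the RTF group structure. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            if (c == '\\' || c == '{' || c == '}') {
                sb.append('\\').append(c);
            } else if (c > 127) {
                // non-ASCII: RTF unicode escape with a signed 16-bit value,
                // followed by a '?' fallback character for older readers
                sb.append("\\u").append((short) c).append('?');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Replaces every known [key] in one pass; unknown tags are left as-is. */
    static String fill(String template, Map<String, String> values) {
        Matcher m = TAG.matcher(template);
        return m.replaceAll(r -> {
            String v = values.get(r.group(1));
            // quoteReplacement stops backslashes and '$' being read as group references
            return Matcher.quoteReplacement(v == null ? r.group() : escape(v));
        });
    }

    public static void main(String[] args) {
        String template = "{\\rtf1\\ansi Dear [name],\\par Total: [total]\\par}";
        System.out.println(fill(template, Map.of("name", "Zo\u00eb {Ltd}", "total", "42")));
    }
}
```

This assumes each tag is stored contiguously in the RTF source; Word can split typed text with formatting or revision codes, in which case the tag would not match.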

Typically, the processing time for this is really quick.

However, we need to embed an image within a template. We've been embedding these as JPEG images using Word's "Insert/Picture/From File..." functionality. But we've found that the resulting RTF file size depends heavily on the image.

For example, I've inserted a 20k JPEG logo (which is basically a solid background with some text). The RTF file increased in size from around 390k (without the image) to 510k (with the image).

Then we inserted a JPEG containing a screenshot, i.e. the image contains text, multiple colours, etc. The JPEG is around 150k. Using this image, the RTF file increased in size from 390k to 3.5MB.

So the size of the encoding Word uses to store images in an RTF doesn't scale linearly with the size of the JPEG. I'm guessing it depends on what is in the JPEG image.

I need to keep the size of the RTF templates to a minimum to try and keep our file processing times to a minimum.

  • Does anyone have any ideas on how to minimize the size of the RTF files with embedded images?
  • Is there any way of controlling the encoding that Word uses? I can't see any options anywhere.
  • Does anyone know what type of binary encoding Word/RTF uses?
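One way to see what Word actually stored is to scan the RTF for `\pict` groups and report each picture's format keyword and size. A minimal diagnostic sketch, assuming the picture data is hex-encoded (no `\binN` binary runs, whose raw bytes could contain braces); `RtfPictScan` is a hypothetical name and only a few common format keywords are checked.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic: lists the picture groups stored in an RTF file. */
public final class RtfPictScan {

    /** A few picture-format control words from the RTF spec (not exhaustive). */
    private static final String[] FORMATS = {
        "\\jpegblip", "\\pngblip", "\\emfblip", "\\wmetafile", "\\dibitmap", "\\wbitmap", "\\macpict"
    };

    /** Returns the index just past the '}' that closes the group opened at 'open'. */
    static int groupEnd(String rtf, int open) {
        int depth = 0;
        for (int i = open; i < rtf.length(); i++) {
            char c = rtf.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character, e.g. an escaped brace
            } else if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                return i + 1;
            }
        }
        throw new IllegalArgumentException("unbalanced group at index " + open);
    }

    /** Returns one "format: N chars" line per picture group. */
    static List<String> scan(String rtf) {
        List<String> report = new ArrayList<>();
        int from = 0;
        int start;
        while ((start = rtf.indexOf("{\\pict", from)) >= 0) {
            int end = groupEnd(rtf, start);
            String group = rtf.substring(start, end);
            String format = "unknown";
            for (String f : FORMATS) {
                if (group.contains(f)) {
                    format = f.substring(1); // drop the leading backslash
                    break;
                }
            }
            report.add(format + ": " + group.length() + " chars");
            from = end;
        }
        return report;
    }

    public static void main(String[] args) throws Exception {
        // ISO-8859-1 maps every byte to a char, so odd bytes cannot cause decoding errors
        String rtf = Files.readString(Path.of(args[0]), StandardCharsets.ISO_8859_1);
        scan(rtf).forEach(System.out::println);
    }
}
```

Because the data is hex, each picture occupies roughly twice its binary size. A `jpegblip` or `pngblip` entry followed by a much larger `wmetafile` entry would be consistent with the double-copy behaviour described in the Microsoft article quoted in the answers.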

Thanks in advance.

Accepted answer by DaveParillo

An image in an RTF file gets stored as a WMF, uncompressed. On Mac, it would be a macpict. Your best bet to keep the file size down is to link the image to the document rather than insert a copy in the document. The trade-off is that you have to keep the files together.
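Linking can also be done from code when the template is generated. A rough sketch, assuming Word's `INCLUDEPICTURE` field with its `\d` switch (which tells Word not to store the graphic data in the document); `LinkedPicture` and `linkedPicture` are hypothetical names. The field result is left empty, so Word may need to update the field (F9) before the picture appears, and readers that do not evaluate fields may show nothing.

```java
/** Hypothetical helper: builds RTF for a picture linked from an external file. */
public final class LinkedPicture {

    /** Escapes backslashes and braces for use in RTF text. */
    static String rtfEscape(String s) {
        return s.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}");
    }

    /**
     * Builds an INCLUDEPICTURE field for imagePath. The \d switch asks Word
     * not to store the graphic data in the document, so only the link is saved.
     */
    static String linkedPicture(String imagePath) {
        // Field codes need doubled backslashes in paths; the whole instruction
        // is then escaped once more because it sits inside RTF text.
        String fieldPath = imagePath.replace("\\", "\\\\");
        String instruction = "INCLUDEPICTURE \"" + fieldPath + "\" \\d";
        // Empty field result: Word should fill it in when the field is updated.
        return "{\\field{\\*\\fldinst {" + rtfEscape(instruction) + "}}{\\fldrslt }}";
    }

    public static void main(String[] args) {
        System.out.println(linkedPicture("C:\\images\\logo.jpg"));
    }
}
```

As the answer says, the image file then has to travel with the document.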

EDIT: Is compressing the RTF an option? Using zip/rar, you'll get your file size back, but obviously you'll have to uncompress it first. There are supposed to be tools that do RTF compression, but I have never used them.
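Compression can be done in Java without external tools. A minimal sketch using gzip from `java.util.zip` in place of zip/rar (a swapped-in choice); `RtfGzip` is a hypothetical name. Hex picture data uses only 16 distinct characters, so a general-purpose compressor typically wins back much of the hex overhead, but the actual ratio depends on the file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: gzip round trip for RTF bytes. */
public final class RtfGzip {

    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } // closing the stream writes the gzip trailer
        return bos.toByteArray();
    }

    static byte[] decompress(byte[] compressed) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] rtf = Files.readAllBytes(Path.of(args[0]));
        byte[] gz = compress(rtf);
        System.out.printf("RTF: %d bytes, gzipped: %d bytes%n", rtf.length, gz.length);
    }
}
```

Whatever opens the file (Word, or your own code) still needs the uncompressed RTF, so `decompress` has to run before the document is handed over.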

Answer by swartbees

Here is the best solution

http://support.microsoft.com/kb/224663

Excerpt:

SYMPTOMS

When you save a Microsoft Word document that contains an EMF, PNG, GIF, or JPEG graphic as a different file format (for example, Word 6.0/95 (.doc) or Rich Text Format (.rtf)), the file size of the document may dramatically increase.

For example, a Microsoft Word 2000 document that contains a JPEG graphic that is saved as a Word 2000 document may have a file size of 45,568 bytes (44.5KB). However, when you save this file as Word 6.0/95 (.doc) or as Rich Text Format (.rtf), the file size may grow to 1,289,728 bytes (1.22MB).

CAUSE

This functionality is by design in Microsoft Word. If an EMF, a PNG, a GIF, or a JPEG graphic is inserted into a Word document, when the document is saved, two copies of the graphic are saved in the document. Graphics are saved in the applicable EMF, PNG, GIF, or JPEG format and are also converted to WMF (Windows Metafile) format.

RESOLUTION

Warning: If you use Registry Editor incorrectly, you may cause serious problems that may require you to reinstall your operating system. Microsoft cannot guarantee that you can solve problems that result from using Registry Editor incorrectly. Use Registry Editor at your own risk.

To prevent Word from saving two copies of the graphic in the document, and to reduce the file size of the document, add the ExportPictureWithMetafile=0 string value to the Microsoft Windows registry.
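If the registry setting can't be applied on every machine that edits the templates, a similar effect can be approximated on an existing file by deleting the WMF fallback copies. This is a sketch under assumptions, not Microsoft's method: it assumes Word wrote each picture as `{\*\shppict{\pict ...}}{\nonshppict{\pict\wmetafile8 ...}}` and that there is no `\binN` data. Readers that only understand the `\nonshppict` copy would then show no picture. `RtfFallbackStripper` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: removes the WMF fallback copies Word adds to pictures. */
public final class RtfFallbackStripper {

    /** Returns the index just past the '}' that closes the group opened at 'open'. */
    static int groupEnd(String rtf, int open) {
        int depth = 0;
        for (int i = open; i < rtf.length(); i++) {
            char c = rtf.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character, e.g. an escaped brace
            } else if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                return i + 1;
            }
        }
        throw new IllegalArgumentException("unbalanced group at index " + open);
    }

    /** Drops every nonshppict group, keeping the shppict (original format) copy. */
    static String stripNonShpPict(String rtf) {
        StringBuilder out = new StringBuilder(rtf.length());
        int from = 0;
        int start;
        while ((start = rtf.indexOf("{\\nonshppict", from)) >= 0) {
            out.append(rtf, from, start); // keep everything before the group
            from = groupEnd(rtf, start);  // skip the group itself
        }
        out.append(rtf, from, rtf.length());
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // ISO-8859-1 round-trips every byte unchanged
        String rtf = Files.readString(Path.of(args[0]), StandardCharsets.ISO_8859_1);
        Files.writeString(Path.of(args[1]), stripNonShpPict(rtf), StandardCharsets.ISO_8859_1);
    }
}
```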

Answer by Brian

Yes, by removing the redundant characters, which means you must insert them back into your stream before the RTF is read. For instance, if a line contains a run of twenty f characters, you can replace it with f[20] in your stream. It is a start.
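This is run-length encoding. A minimal sketch using the `f[20]` notation, assuming the encoded text is hex picture data (0-9, a-f), so a `[` can never occur in the input. The encoded result is no longer valid RTF and has to be expanded with `decode` before anything reads it. `HexRle` and the threshold `MIN_RUN = 4` are hypothetical choices.

```java
/** Hypothetical run-length codec using the c[n] notation for hex text. */
public final class HexRle {

    /** Runs at least this long are collapsed to c[n]; shorter runs are copied. */
    static final int MIN_RUN = 4;

    static String encode(String s) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int j = i;
            while (j < s.length() && s.charAt(j) == c) j++; // find the end of the run
            int run = j - i;
            if (run >= MIN_RUN) {
                sb.append(c).append('[').append(run).append(']');
            } else {
                for (int k = 0; k < run; k++) sb.append(c);
            }
            i = j;
        }
        return sb.toString();
    }

    static String decode(String s) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (i + 1 < s.length() && s.charAt(i + 1) == '[') {
                int close = s.indexOf(']', i + 2);
                int run = Integer.parseInt(s.substring(i + 2, close));
                sb.append(String.valueOf(c).repeat(run)); // expand c[n] back to n copies
                i = close + 1;
            } else {
                sb.append(c);
                i++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String hex = "f".repeat(20) + "0a0b";
        System.out.println(encode(hex));
    }
}
```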

-Best of luck.

Answer by Ricardo Appleton

We have done a similar project over at work. Only we're not using that "Insert/Picture/From File..." functionality. Our template has a tag named [photos], as I presume your own does also. When we process the document we replace the tag with the RTF codes needed to display images. We're putting them within a table and we're displaying two images on each row, plus a row on top for the title.

So, you might place a tag [photos] in your template. Then you replace the tag with the RTF codes. You can find some good references to these codes on the web, for example here.

Now, my code looks something like this:

\par {\rtf1\ansi\deff0{\trowd\cellx8810 {title}\intbl\qc\cell\row}{\trowd\cellx4405\cellx8810{\pict\jpegblip\picwgoal4000\pichgoal3000\piccropl-50\piccropr-50\piccropt-50\piccropb-50 Your image as an array of bytes in hexadecimal}\intbl\cell{\pict\jpegblip\picwgoal4000\pichgoal3000\piccropl-50\piccropr-50\piccropt-50\piccropb-50 Your other image}\intbl\cell\row}}

If you get your image into a byte array, you can use BitConverter.ToString(array) (a .NET method) to get your hex code; you'll just need to replace the dashes "-" with "".
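Since the question is about Java, here is a rough Java counterpart of the `BitConverter.ToString` step, assuming Java 17+ for `java.util.HexFormat`. It hex-encodes the JPEG bytes and wraps them in a `\pict\jpegblip` group like the ones in the answer; `RtfPicture` and `jpegPict` are hypothetical names, and the goal sizes (in twips) are values you choose.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HexFormat;

/** Hypothetical helper: turns JPEG bytes into an RTF picture group. */
public final class RtfPicture {

    /** Builds a jpegblip picture group; goal sizes are in twips (1/1440 inch). */
    static String jpegPict(byte[] jpeg, int widthTwips, int heightTwips) {
        // HexFormat writes each byte as two lowercase hex digits with no separators,
        // so there are no dashes to strip (unlike BitConverter.ToString)
        String hex = HexFormat.of().formatHex(jpeg);
        return "{\\pict\\jpegblip\\picwgoal" + widthTwips + "\\pichgoal" + heightTwips + " " + hex + "}";
    }

    public static void main(String[] args) throws Exception {
        byte[] jpeg = Files.readAllBytes(Path.of(args[0]));
        String pict = jpegPict(jpeg, 4000, 3000);
        System.out.println("image bytes: " + jpeg.length + ", RTF chars: " + pict.length());
    }
}
```

Each image byte becomes two characters, so the group is a little over twice the JPEG's size, but the image is written only once.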

Our files take up less than a tenth of the space a "normal" RTF does. If we open the document's source in an editor such as Notepad++, we can see the RTF codes, but if we open the document and save it as RTF (under a new name), it goes from 1.5 MB to 50 MB! I'm guessing DaveParillo's reply explains it: I'm only writing each image once.

Hope it helps. Cheers mate

Answer by Anthony

swartbees' answer worked perfectly for me. I first reduced the image quality to "0" using GIMP's Save as JPEG functionality. After following the Microsoft solution suggested by swartbees above, I reinserted the picture into the file, and the size increase was negligible: from 229k to 279k (as opposed to 29,000 KB).
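If the template images are produced by Java code rather than edited by hand, the same quality reduction can be sketched with `javax.imageio`. The quality scale here runs from 0.0 to 1.0 (not GIMP's 0 to 100), and `JpegRecompress` and the 0.3 setting are assumptions; very low settings will visibly blur text in screenshots.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

/** Hypothetical helper: re-encodes an image as a smaller, lower-quality JPEG. */
public final class JpegRecompress {

    /** quality runs from 0.0f (smallest file) to 1.0f (best quality). */
    static byte[] toJpeg(BufferedImage source, float quality) throws IOException {
        // JPEG has no alpha channel, so draw onto a plain RGB canvas first
        // (transparent areas end up black)
        BufferedImage rgb = new BufferedImage(source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        g.drawImage(source, 0, 0, null);
        g.dispose();

        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality);

        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ImageOutputStream out = ImageIO.createImageOutputStream(bos)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(rgb, null, null), param);
        } finally {
            writer.dispose();
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        BufferedImage image = ImageIO.read(new File(args[0]));
        byte[] jpeg = toJpeg(image, 0.3f);
        Files.write(Path.of(args[1]), jpeg);
        System.out.println("wrote " + jpeg.length + " bytes");
    }
}
```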

Thanks for your suggestions guys.

Answer by joseluisbz

First, keep in mind that each byte is stored as two hexadecimal characters (two bytes), which means the increase is at least double the size of the original picture.

Another thing to note is that Word and WordPad insert different flavors (formats) of the same image, plus other fields that the RTF does not need in order to be displayed.

Here are some scripts used to insert images in RTF (https://joseluisbz.wordpress.com/2011/06/22/script-de-clases-rtf-para-jsp-y-php/), and an example of their use (https://joseluisbz.wordpress.com/2011/07/16/subiendo-imagenes-png-y-jpg-y-archivos-a-mysql-con-php-y-jsp-y-mostrarlos-en-rtf-usando-clases/).

You may also need to replace the original image with another one (http://joseluisbz.wordpress.com/2013/07/26/exploring-a-wmf-file-0x000900/).
