Original source: http://stackoverflow.com/questions/21608598/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
How to get a byte array from an iText PdfReader
Asked by Subbu
How can I get a byte array from an iText PdfReader?
float width = 8.5f * 72;
float height = 11f * 72;
float tolerance = 1f;
PdfReader reader = new PdfReader("source.pdf");
for (int i = 1; i <= reader.getNumberOfPages(); i++)
{
    Rectangle cropBox = reader.getCropBox(i);
    float widthToAdd = width - cropBox.getWidth();
    float heightToAdd = height - cropBox.getHeight();
    if (Math.abs(widthToAdd) > tolerance || Math.abs(heightToAdd) > tolerance)
    {
        float[] newBoxValues = new float[] {
            cropBox.getLeft() - widthToAdd / 2,
            cropBox.getBottom() - heightToAdd / 2,
            cropBox.getRight() + widthToAdd / 2,
            cropBox.getTop() + heightToAdd / 2
        };
        PdfArray newBox = new PdfArray(newBoxValues);
        PdfDictionary pageDict = reader.getPageN(i);
        pageDict.put(PdfName.CROPBOX, newBox);
        pageDict.put(PdfName.MEDIABOX, newBox);
    }
}
After running the code above, I need to get a byte array from the reader object. How can I do that?
1) Not working: I get an empty byte array.
OutputStream out = new ByteArrayOutputStream();
PdfStamper stamper = new PdfStamper(reader, out);
stamper.close();
byte byteArray[] = (((ByteArrayOutputStream)out).toByteArray());
2) Not working, getting java.io.IOException: Error: Header doesn't contain versioninfo
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
for (int i = 1; i <= reader.getNumberOfPages(); i++)
{
    outputStream.write(reader.getPageContent(i));
}
PDDocument pdDocument = new PDDocument().load(outputStream.toByteArray());
Is there any other way to get a byte array from a PdfReader?
Answered by AmitG
Answered by Bruno Lowagie
Let's look at the question from a different angle. It seems to me that you want to render a PDF page by page. If so, then your question is all wrong. Extracting the page content stream will not be sufficient, as I already indicated: not a single renderer will be able to render such a stream, because you don't pass any resources such as fonts, Form XObjects, and Image XObjects.
If you want to render separate pages from a PDF, you need to burst the document into separate single-page, full-blown PDF documents. These single-page documents need to contain all the information necessary to render the page. This isn't memory friendly: suppose that you have a 100 KByte document of 10 pages where every page shows an 80 KByte logo. You'll end up with 10 documents that are each at least 80 KByte. Times 10, that already makes 800 KByte, which is much more than the 10-page document, where a single Image XObject is shared by the 10 pages.
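The arithmetic in that example can be written out as a minimal sketch. The class and method names here are hypothetical, and the figures are the ones used in the example; the result is only a lower bound, since each single-page file also carries its own page content and file structure.

```java
final class BurstSizeEstimate {

    /**
     * Lower bound on the combined size (in KByte) of the single-page files
     * when every one of them embeds its own copy of a resource that the
     * original document stored only once.
     */
    static long minTotalKBytes(int pages, long sharedResourceKBytes) {
        return (long) pages * sharedResourceKBytes;
    }

    public static void main(String[] args) {
        long originalKBytes = 100;                 // size of the 10-page source document
        long burstKBytes = minTotalKBytes(10, 80); // 10 pages, 80 KByte logo on each
        System.out.println(burstKBytes + " KByte vs " + originalKBytes + " KByte");
        // prints "800 KByte vs 100 KByte"
    }
}
```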
You'd need to do something like this:
PdfReader reader = new PdfReader("source.pdf");
int n = reader.getNumberOfPages();
reader.close();
ByteArrayOutputStream baos;
PdfStamper stamper;
for (int i = 0; i < n; ) {
    // Re-open the source for every page, because selectPages() alters the reader.
    reader = new PdfReader("source.pdf");
    reader.selectPages(String.valueOf(++i)); // keep only page i (1-based)
    baos = new ByteArrayOutputStream();
    stamper = new PdfStamper(reader, baos);
    stamper.close(); // writes the complete single-page PDF to baos
    doSomethingWithBytes(baos.toByteArray());
}
In this case, baos.toByteArray() will contain the bytes of a valid PDF file. This wasn't the case in any of your attempts.
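As a quick sanity check on whether a byte array looks like a PDF file at all, one can test for the "%PDF-" marker that a PDF file normally begins with. Concatenated page content streams, as in the second attempt in the question, do not start with this marker, which fits the "Header doesn't contain versioninfo" error. The sketch below uses only the Java standard library; the class and method names are hypothetical, and passing the check is necessary but by no means sufficient for a file to be valid.

```java
import java.nio.charset.StandardCharsets;

final class PdfHeaderCheck {

    // A PDF file normally begins with "%PDF-" followed by a version such as 1.4.
    private static final byte[] PDF_MARKER =
            "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the array starts with the "%PDF-" marker. */
    static boolean startsWithPdfHeader(byte[] bytes) {
        if (bytes == null || bytes.length < PDF_MARKER.length) {
            return false;
        }
        for (int i = 0; i < PDF_MARKER.length; i++) {
            if (bytes[i] != PDF_MARKER[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative inputs only: the start of a file, and a bare content stream.
        byte[] fileLike = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        byte[] contentStreamLike =
                "BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII);
        System.out.println(startsWithPdfHeader(fileLike));
        System.out.println(startsWithPdfHeader(contentStreamLike));
    }
}
```

A check like this could be called on baos.toByteArray() before handing the bytes to another library. It will not catch a truncated or otherwise damaged file.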