Java: Convert a PDF file to image
Original question: http://stackoverflow.com/questions/18189314/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Convert a PDF file to image
Question by grep
I wanted to convert a PDF document into an image. I was using Ghost4J.
Problem: Ghost4J needs the gsdll32.dll file at runtime, and I do not want to use the DLL file.
Question 1: Is there any way in Ghost4J to convert a PDF to an image without the DLL?
Question 2: I found a solution in the PDFBox API: org.apache.pdfbox.pdmodel.PDPage has a method convertToImage() which converts a PDF page to an image.
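import java.awt.image.BufferedImage;
import java.io.File;
import java.util.List;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

PDDocument doc = PDDocument.load(new File("/document.pdf"));
List<PDPage> pages = doc.getDocumentCatalog().getAllPages();
PDPage page = pages.get(0);
BufferedImage image = page.convertToImage();
File outputfile = new File("/image.png");
ImageIO.write(image, "png", outputfile);
doc.close();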
The PDF document contains only text, and I get this exception when I run the code:
Aug 12, 2013 6:00:24 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getawtFont(PDTrueTypeFont.java:481)
at org.apache.pdfbox.pdmodel.font.PDSimpleFont.drawString(PDSimpleFont.java:109)
at org.apache.pdfbox.pdfviewer.PageDrawer.processTextPosition(PageDrawer.java:235)
at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:496)
at org.apache.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:62)
at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:125)
at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:781)
at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:712)
at ge.eid.esignature.adessa.pades.sign.PDFtoImage.main(PDFtoImage.java:25)
Caused by: java.lang.IllegalArgumentException
at java.nio.Buffer.position(Buffer.java:216)
at sun.font.TrueTypeFont.lookupName(TrueTypeFont.java:1153)
at sun.font.TrueTypeFont.getPostscriptName(TrueTypeFont.java:1205)
at java.awt.Font.getPSName(Font.java:1156)
at org.apache.pdfbox.pdmodel.font.FontManager.loadFonts(FontManager.java:101)
at org.apache.pdfbox.pdmodel.font.FontManager.<clinit>(FontManager.java:53)
... 13 more
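The trace shows where this fails: PDFBox's FontManager static initializer enumerates the system fonts and calls Font.getPSName() on each one, and the JDK throws IllegalArgumentException while reading a TrueType font's name table. That suggests a single installed font file is damaged or unusual. A JDK-only diagnostic sketch that repeats the same call and reports which fonts fail (FontCheck is a hypothetical helper, not part of PDFBox):

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.List;

final class FontCheck {
    /** Returns "name: exception" for every font whose getPSName() throws. */
    static List<String> brokenFonts(Font[] fonts) {
        List<String> broken = new ArrayList<>();
        for (Font font : fonts) {
            try {
                // Same call that fails in the trace (Font.getPSName -> TrueTypeFont.getPostscriptName)
                font.getPSName();
            } catch (RuntimeException e) {
                // In the trace this is an IllegalArgumentException from java.nio.Buffer.position
                broken.add(font.getName() + ": " + e);
            }
        }
        return broken;
    }

    /** Checks every font installed on this machine. */
    static List<String> brokenInstalledFonts() {
        return brokenFonts(GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts());
    }
}
```

Run FontCheck.brokenInstalledFonts() on the affected machine. Removing or replacing a font it reports may let FontManager initialise, but that is an assumption drawn from the trace, not a confirmed fix.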
Accepted answer by UdayKiran Pulipati
You can easily convert the pages of the 04-Request-Headers.pdf file into images.
Convert all PDF pages into images in Java using PDFBox.
Solution for Apache PDFBox 1.8.* version:
Required JAR: pdfbox-1.8.3.jar
or the Maven dependency:
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>1.8.3</version>
</dependency>
Here is the solution:
package com.pdf.pdfbox.examples;
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.List;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
@SuppressWarnings("unchecked")
public class ConvertPDFPagesToImages {
public static void main(String[] args) {
try {
String sourceDir = "C:/Documents/04-Request-Headers.pdf"; // the PDF file to convert
String destinationDir = "C:/Documents/Converted_PdfFiles_to_Image/"; // converted images from the PDF document are saved here
File sourceFile = new File(sourceDir);
File destinationFile = new File(destinationDir);
if (!destinationFile.exists()) {
destinationFile.mkdirs(); // mkdirs() also creates any missing parent folders
System.out.println("Folder Created -> "+ destinationFile.getAbsolutePath());
}
if (sourceFile.exists()) {
System.out.println("Images copied to Folder: "+ destinationFile.getName());
PDDocument document = PDDocument.load(sourceDir);
List<PDPage> list = document.getDocumentCatalog().getAllPages();
System.out.println("Total files to be converted -> "+ list.size());
String fileName = sourceFile.getName().replace(".pdf", "");
int pageNumber = 1;
for (PDPage page : list) {
BufferedImage image = page.convertToImage();
File outputfile = new File(destinationDir + fileName +"_"+ pageNumber +".png");
System.out.println("Image Created -> "+ outputfile.getName());
ImageIO.write(image, "png", outputfile);
pageNumber++;
}
document.close();
System.out.println("Converted Images are saved at -> "+ destinationFile.getAbsolutePath());
} else {
System.err.println(sourceFile.getName() +" File not exists");
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
The image can also be written in jpg, jpeg, png, bmp, or gif format.
Note: these are the most commonly used image formats.
ImageIO.write(image , "jpg", new File( destinationDir +fileName+"_"+pageNumber+".jpg" ));
ImageIO.write(image , "jpeg", new File( destinationDir +fileName+"_"+pageNumber+".jpeg" ));
ImageIO.write(image , "png", new File( destinationDir +fileName+"_"+pageNumber+".png" ));
ImageIO.write(image , "bmp", new File( destinationDir +fileName+"_"+pageNumber+".bmp" ));
ImageIO.write(image , "gif", new File( destinationDir +fileName+"_"+pageNumber+".gif" ));
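The ImageIO.write calls above ignore its boolean result. ImageIO.write returns false, rather than throwing, when no installed writer accepts the image in that format. In recent JDKs that can happen for jpg or bmp when the image has an alpha channel (for example a TYPE_INT_ARGB image). A JDK-only sketch that checks for a writer, flattens alpha onto a white background, and fails loudly; ImageFormats is a hypothetical name:

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

final class ImageFormats {
    /** True if some installed ImageIO writer is registered for this informal format name. */
    static boolean canWrite(String format) {
        return ImageIO.getImageWritersByFormatName(format).hasNext();
    }

    /** Draws the image onto an opaque white TYPE_INT_RGB canvas, dropping any alpha channel. */
    static BufferedImage toOpaqueRgb(BufferedImage src) {
        if (src.getType() == BufferedImage.TYPE_INT_RGB) {
            return src;
        }
        BufferedImage rgb = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, src.getWidth(), src.getHeight());
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }

    /** Encodes the image, failing loudly instead of ignoring ImageIO.write's false result. */
    static byte[] encode(BufferedImage image, String format) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(image, format, out)) {
                throw new IllegalStateException("No ImageIO writer accepted this image as " + format);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

For example, ImageIO.write(ImageFormats.toOpaqueRgb(image), "jpg", outputFile) sidesteps the alpha problem; checking the returned boolean is still worthwhile.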
Console Output:
Images copied to Folder: Converted_PdfFiles_to_Image
Total files to be converted -> 13
Aug 06, 2014 1:35:49 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_1.png
Aug 06, 2014 1:35:50 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_2.png
Aug 06, 2014 1:35:51 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_3.png
Aug 06, 2014 1:35:51 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_4.png
Aug 06, 2014 1:35:52 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_5.png
Aug 06, 2014 1:35:52 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_6.png
Aug 06, 2014 1:35:53 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_7.png
Aug 06, 2014 1:35:53 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_8.png
Aug 06, 2014 1:35:54 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_9.png
Aug 06, 2014 1:35:54 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_10.png
Aug 06, 2014 1:35:54 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_11.png
Aug 06, 2014 1:35:55 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_12.png
Aug 06, 2014 1:35:55 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_13.png
Converted Images are saved at -> C:\Documents\Converted_PdfFiles_to_Image
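The code derives the output name with sourceFile.getName().replace(".pdf", ""), which removes ".pdf" anywhere in the name and misses an upper-case ".PDF". It also builds paths by string concatenation, which relies on destinationDir ending in a slash. A small JDK-only sketch of the naming step; OutputNames and its methods are hypothetical:

```java
import java.nio.file.Path;

final class OutputNames {
    /** Removes a trailing ".pdf" (any letter case); ".pdf" elsewhere in the name is kept. */
    static String baseName(String fileName) {
        int n = fileName.length();
        if (n >= 4 && fileName.regionMatches(true, n - 4, ".pdf", 0, 4)) {
            return fileName.substring(0, n - 4);
        }
        return fileName;
    }

    /** e.g. ("04-Request-Headers.pdf", 1, "png") -> "04-Request-Headers_1.png"; pageNumber is 1-based. */
    static String pageImageName(String pdfFileName, int pageNumber, String extension) {
        return baseName(pdfFileName) + "_" + pageNumber + "." + extension;
    }

    /** Resolves the image name against the output folder, so no trailing slash is needed. */
    static Path pageImagePath(Path outputDir, String pdfFileName, int pageNumber, String extension) {
        return outputDir.resolve(pageImageName(pdfFileName, pageNumber, extension));
    }
}
```

For example, a file named a.pdf.notes.PDF becomes a.pdf.notes, whereas replace(".pdf", "") would give a.notes.PDF. Files.createDirectories(outputDir) can then replace the exists()/mkdir() check, since it also creates missing parents and does nothing if the folder already exists.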
Solution for Apache PDFBox 2.0.* version:
Required JARs: pdfbox-2.0.16.jar, fontbox-2.0.16.jar, commons-logging-1.2.jar
or the pom.xml dependencies:
<!-- https://mvnrepository.com/artifact/org.apache.pdfbox/pdfbox -->
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>2.0.16</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.pdfbox/fontbox -->
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>fontbox</artifactId>
<version>2.0.16</version>
</dependency>
<!-- https://mvnrepository.com/artifact/commons-logging/commons-logging -->
<dependency>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
<version>1.2</version>
</dependency>
Solution for version 2.0.16:
package com.pdf.pdfbox.examples;
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
/**
*
* @author venkataudaykiranp
*
* @version 2.0.16(Apache PDFBox version support)
*
*/
public class ConvertPDFPagesToImages {
public static void main(String[] args) {
try {
String sourceDir = "C:/Users/venkataudaykiranp/Downloads/04-Request-Headers.pdf"; // the PDF file to convert (forward slashes: "\U" and "\D" are illegal escapes in Java string literals)
String destinationDir = "C:/Users/venkataudaykiranp/Downloads/Converted_PdfFiles_to_Image/"; // converted images from the PDF document are saved here
File sourceFile = new File(sourceDir);
File destinationFile = new File(destinationDir);
if (!destinationFile.exists()) {
destinationFile.mkdir();
System.out.println("Folder Created -> "+ destinationFile.getAbsolutePath());
}
if (sourceFile.exists()) {
System.out.println("Images copied to Folder Location: "+ destinationFile.getAbsolutePath());
PDDocument document = PDDocument.load(sourceFile);
PDFRenderer pdfRenderer = new PDFRenderer(document);
int numberOfPages = document.getNumberOfPages();
System.out.println("Total pages to be converted -> "+ numberOfPages);
String fileName = sourceFile.getName().replace(".pdf", "");
String fileExtension= "png";
/*
* 600 dpi gives better image clarity, but each image is much larger than at 300 dpi.
* Ex: 1. At 300 dpi, 04-Request-Headers_2.png is expected to be about 797 KB
* 2. At 600 dpi, 04-Request-Headers_2.png is expected to be about 2.42 MB (roughly 3x)
*/
int dpi = 300; // a lower dpi saves disk space; for professional use you can go above 300 dpi
for (int i = 0; i < numberOfPages; ++i) {
File outPutFile = new File(destinationDir + fileName +"_"+ (i+1) +"."+ fileExtension);
BufferedImage bImage = pdfRenderer.renderImageWithDPI(i, dpi, ImageType.RGB);
ImageIO.write(bImage, fileExtension, outPutFile);
}
document.close();
System.out.println("Converted Images are saved at -> "+ destinationFile.getAbsolutePath());
} else {
System.err.println(sourceFile.getName() +" File not exists");
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
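The dpi comment above can be made concrete. PDF page sizes are measured in points, 72 per inch, and renderImageWithDPI(i, dpi) renders at a scale of dpi / 72, so doubling the dpi doubles both pixel dimensions and quadruples the pixel count. A sketch of that arithmetic (RenderSize is hypothetical, and PDFBox's own rounding may differ slightly):

```java
final class RenderSize {
    /** PDF user space has 72 units (points) per inch. */
    static final double POINTS_PER_INCH = 72.0;

    /** Approximate {width, height} in pixels of a widthPt x heightPt page rendered at dpi. */
    static int[] pixelSize(double widthPt, double heightPt, int dpi) {
        return new int[] {
            (int) Math.floor(widthPt * dpi / POINTS_PER_INCH),
            (int) Math.floor(heightPt * dpi / POINTS_PER_INCH)
        };
    }

    /** Number of pixels in an image of the given {width, height}. */
    static long pixelCount(int[] size) {
        return (long) size[0] * size[1];
    }
}
```

For a US Letter page (612 x 792 points), pixelSize(612, 792, 300) gives 2550 x 3300 pixels and 600 dpi gives 5100 x 6600, four times as many. Compressed PNG files do not grow exactly in proportion, which fits the roughly 3x difference measured in the comment.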
Answer by Xondio
Going through PDFBox is a good way to avoid native bindings. Try PDFImageWriter from PDFBox; I did the same with it in a few lines and it worked perfectly. You have to load the PDDocument and pass it to the writer.
For all pages (page numbers are 1-based):
new PDFImageWriter().writeImage(doc, "png", null, 1, Integer.MAX_VALUE, "picture");
For the first page only:
new PDFImageWriter().writeImage(doc, "png", null, 1, 1, "picture");
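In the PDFImageWriter calls above, startPage and endPage are, as far as I know, 1-based and inclusive, and an endPage past the last page is clamped. That is why Integer.MAX_VALUE means "all pages". A sketch of which pages such a range selects under those assumptions (PageRange is hypothetical, not PDFBox code, and it rejects 0 as a design choice):

```java
import java.util.stream.IntStream;

final class PageRange {
    /**
     * Zero-based page indices selected by a 1-based, inclusive [startPage, endPage] range,
     * clamped to the document's page count (so Integer.MAX_VALUE means "to the last page").
     */
    static int[] zeroBasedIndices(int startPage, int endPage, int pageCount) {
        if (startPage < 1) {
            throw new IllegalArgumentException("startPage is 1-based, got " + startPage);
        }
        int last = Math.min(endPage, pageCount);
        // rangeClosed yields an empty stream when startPage > last
        return IntStream.rangeClosed(startPage, last).map(p -> p - 1).toArray();
    }
}
```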
Answer by Malhotra
You are probably trying to convert a corrupted PDF file. I get the same errors when the PDF file contains JPX-encoded (JPEG 2000) streams.
Answer by stanlyF
You can try the NonSequentialParser to avoid errors with some PDF files (those with incremental updates):
PDDocument doc = PDDocument.loadNonSeq(new File("/document.pdf"));
Answer by Vahap Gencdal
try {
PDDocument document = PDDocument.load(PdfInfo.getPDFWAY());
if (document.isEncrypted()) {
document.decrypt(PdfInfo.getPASSWORD());
}
if ("bilevel".equalsIgnoreCase(PdfInfo.getCOLOR())) {
PdfInfo.setIMAGETYPE( BufferedImage.TYPE_BYTE_BINARY);
} else if ("indexed".equalsIgnoreCase(PdfInfo.getCOLOR())) {
PdfInfo.setIMAGETYPE(BufferedImage.TYPE_BYTE_INDEXED);
} else if ("gray".equalsIgnoreCase(PdfInfo.getCOLOR())) {
PdfInfo.setIMAGETYPE(BufferedImage.TYPE_BYTE_GRAY);
} else if ("rgb".equalsIgnoreCase(PdfInfo.getCOLOR())) {
PdfInfo.setIMAGETYPE(BufferedImage.TYPE_INT_RGB);
} else if ("rgba".equalsIgnoreCase(PdfInfo.getCOLOR())) {
PdfInfo.setIMAGETYPE(BufferedImage.TYPE_INT_ARGB);
} else {
System.exit(2);
}
PDFImageWriter imageWriter = new PDFImageWriter();
boolean success = imageWriter.writeImage(document, PdfInfo.getIMAGE_FORMAT(),PdfInfo.getPASSWORD(),
PdfInfo.getSTART_PAGE(),PdfInfo.getEND_PAGE(),PdfInfo.getOUTPUT_PREFIX(),PdfInfo.getIMAGETYPE(),PdfInfo.getRESOLUTION());
if (!success) {
System.exit(1);
}
document.close();
} catch (IOException | CryptographyException | InvalidPasswordException ex) {
Logger.getLogger(PdfToImae.class.getName()).log(Level.SEVERE, null, ex);
}
public class PdfInfo {
private static String PDFWAY;
private static String OUTPUT_PREFIX;
private static String PASSWORD;
private static int START_PAGE=1;
private static int END_PAGE=Integer.MAX_VALUE;
private static String IMAGE_FORMAT="jpg";
private static String COLOR="rgb";
private static int RESOLUTION=256;
private static int IMAGETYPE=24;
private static String filename;
private static String filePath="";
// getters and setters for the fields above are omitted
}
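The if/else chain above maps a colour name to a java.awt.image.BufferedImage type constant and exits the JVM on an unknown value. A sketch of the same mapping as a lookup table that returns an empty result instead, so the caller decides how to fail; ColorModes is a hypothetical name and Map.of needs Java 9 or later:

```java
import java.awt.image.BufferedImage;
import java.util.Locale;
import java.util.Map;
import java.util.OptionalInt;

final class ColorModes {
    // Same name-to-type pairs as the if/else chain above
    private static final Map<String, Integer> TYPES = Map.of(
            "bilevel", BufferedImage.TYPE_BYTE_BINARY,
            "indexed", BufferedImage.TYPE_BYTE_INDEXED,
            "gray", BufferedImage.TYPE_BYTE_GRAY,
            "rgb", BufferedImage.TYPE_INT_RGB,
            "rgba", BufferedImage.TYPE_INT_ARGB);

    /** Case-insensitive lookup; empty for null or unknown colour names. */
    static OptionalInt imageTypeFor(String color) {
        Integer type = (color == null) ? null : TYPES.get(color.toLowerCase(Locale.ROOT));
        return (type == null) ? OptionalInt.empty() : OptionalInt.of(type);
    }
}
```

A caller can then write ColorModes.imageTypeFor(color).orElseThrow(() -> new IllegalArgumentException("unknown color: " + color)) instead of calling System.exit.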
Answer by user2605874
For the error:
org.apache.pdfbox.util.PDFStreamEngine processOperator INFO: unsupported/disabled operation
In addition to the Apache PDFBox jar, you need to include the fontbox-1.7.1 jar on the classpath; PDFBox uses FontBox internally, so this will fix your issue.
Answer by Bittu Choudhary
You can easily convert a PDF into images using PDFBox: the renderImageWithDPI method of PDFBox's PDFRenderer class renders a PDF page to an image.
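PDDocument doc = PDDocument.load(new File("filepath/sample.pdf"));
PDFRenderer pdfRenderer = new PDFRenderer(doc);
int pageNo = 0; // zero-based index of the page to render
BufferedImage bim = pdfRenderer.renderImageWithDPI(pageNo, 300, ImageType.RGB);
String fileName = "image-" + (pageNo + 1) + ".png";
ImageIOUtil.writeImage(bim, fileName, 300); // ImageIOUtil is org.apache.pdfbox.tools.imageio.ImageIOUtil (pdfbox-tools)
doc.close();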
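ImageIOUtil.writeImage(image, filename, dpi) comes from the pdfbox-tools module and also records the resolution in the image metadata. If you only want the JDK's ImageIO, the sketch below stores the DPI in a PNG's pHYs chunk, which holds pixels per metre. It is not how ImageIOUtil is implemented, and PngWithDpi is a hypothetical name:

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.metadata.IIOMetadataNode;
import javax.imageio.stream.ImageOutputStream;

final class PngWithDpi {
    private static final String PNG_FORMAT = "javax_imageio_png_1.0";

    /** Encodes the image as PNG bytes with a pHYs chunk for the given dpi. */
    static byte[] encode(BufferedImage image, int dpi) {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("png").next();
        try {
            ImageWriteParam param = writer.getDefaultWriteParam();
            IIOMetadata metadata = writer.getDefaultImageMetadata(
                    ImageTypeSpecifier.createFromRenderedImage(image), param);
            // pHYs stores pixels per metre; 1 inch = 0.0254 m
            String ppm = Long.toString(Math.round(dpi / 0.0254));
            IIOMetadataNode phys = new IIOMetadataNode("pHYs");
            phys.setAttribute("pixelsPerUnitXAxis", ppm);
            phys.setAttribute("pixelsPerUnitYAxis", ppm);
            phys.setAttribute("unitSpecifier", "meter");
            IIOMetadataNode root = new IIOMetadataNode(PNG_FORMAT);
            root.appendChild(phys);
            metadata.mergeTree(PNG_FORMAT, root);

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (ImageOutputStream ios = ImageIO.createImageOutputStream(out)) {
                writer.setOutput(ios);
                writer.write(null, new IIOImage(image, null, metadata), param);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer.dispose();
        }
    }
}
```

At 300 dpi this stores 11811 pixels per metre (300 / 0.0254, rounded). The bytes can be written with Files.write(path, PngWithDpi.encode(bim, 300)).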