Converting a PDF file to an image in Java

Question

Asked by grep

I want to convert a PDF document into an image. I was using Ghost4J.

Problem: Ghost4J needs the gsdll32.dll file at runtime, and I do not want to use the DLL file.

Question 1: Is there any way in Ghost4J to convert a PDF to an image without the DLL?

Question 2: I found a solution in the PDFBox API. org.apache.pdfbox.pdmodel.PDPage has a method convertToImage() which converts a PDF page to an image:

PDDocument doc = PDDocument.load(new File("/document.pdf"));
List<PDPage> pages = doc.getDocumentCatalog().getAllPages();
PDPage page = pages.get(0);
BufferedImage image = page.convertToImage();
File outputfile = new File("/image.png");
ImageIO.write(image, "png", outputfile);
doc.close();

The PDF document contains only text, and I get the following exception when I run this code:

Aug 12, 2013 6:00:24 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getawtFont(PDTrueTypeFont.java:481)
    at org.apache.pdfbox.pdmodel.font.PDSimpleFont.drawString(PDSimpleFont.java:109)
    at org.apache.pdfbox.pdfviewer.PageDrawer.processTextPosition(PageDrawer.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:496)
    at org.apache.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:62)
    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:125)
    at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:781)
    at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:712)
    at ge.eid.esignature.adessa.pades.sign.PDFtoImage.main(PDFtoImage.java:25)
Caused by: java.lang.IllegalArgumentException
    at java.nio.Buffer.position(Buffer.java:216)
    at sun.font.TrueTypeFont.lookupName(TrueTypeFont.java:1153)
    at sun.font.TrueTypeFont.getPostscriptName(TrueTypeFont.java:1205)
    at java.awt.Font.getPSName(Font.java:1156)
    at org.apache.pdfbox.pdmodel.font.FontManager.loadFonts(FontManager.java:101)
    at org.apache.pdfbox.pdmodel.font.FontManager.<clinit>(FontManager.java:53)
    ... 13 more

Answer 1

Accepted answer by UdayKiran Pulipati

You can easily convert the pages of a PDF file (04-Request-Headers.pdf in this example) into images.

The following converts all pages of a PDF into images in Java using PDFBox.

Solution for Apache PDFBox 1.8.*:

Required JAR: pdfbox-1.8.3.jar

or the Maven dependency:

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>1.8.3</version>
</dependency>

Here is the solution:

package com.pdf.pdfbox.examples;

import java.awt.image.BufferedImage;
import java.io.File;
import java.util.List;

import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

@SuppressWarnings("unchecked")
public class ConvertPDFPagesToImages {
    public static void main(String[] args) {
        try {
            String sourceDir = "C:/Documents/04-Request-Headers.pdf"; // path of the PDF file to convert
            String destinationDir = "C:/Documents/Converted_PdfFiles_to_Image/"; // folder where the converted images are saved

            File sourceFile = new File(sourceDir);
            File destinationFile = new File(destinationDir);
            if (!destinationFile.exists()) {
                destinationFile.mkdir();
                System.out.println("Folder Created -> " + destinationFile.getAbsolutePath());
            }
            if (sourceFile.exists()) {
                System.out.println("Images copied to Folder: " + destinationFile.getName());
                PDDocument document = PDDocument.load(sourceDir);
                List<PDPage> list = document.getDocumentCatalog().getAllPages();
                System.out.println("Total files to be converted -> " + list.size());

                // Output files are named <pdf name>_<page number>.png, starting at page 1
                String fileName = sourceFile.getName().replace(".pdf", "");
                int pageNumber = 1;
                for (PDPage page : list) {
                    BufferedImage image = page.convertToImage();
                    File outputfile = new File(destinationDir + fileName + "_" + pageNumber + ".png");
                    System.out.println("Image Created -> " + outputfile.getName());
                    ImageIO.write(image, "png", outputfile);
                    pageNumber++;
                }
                document.close();
                System.out.println("Converted Images are saved at -> " + destinationFile.getAbsolutePath());
            } else {
                System.err.println(sourceFile.getName() + " File not exists");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The image can be written in jpg, jpeg, png, bmp, or gif format.

Note: these are only the most commonly used image formats.

ImageIO.write(image , "jpg", new File( destinationDir +fileName+"_"+pageNumber+".jpg" ));
ImageIO.write(image , "jpeg", new File( destinationDir +fileName+"_"+pageNumber+".jpeg" ));
ImageIO.write(image , "png", new File( destinationDir +fileName+"_"+pageNumber+".png" ));
ImageIO.write(image , "bmp", new File( destinationDir +fileName+"_"+pageNumber+".bmp" ));
ImageIO.write(image , "gif", new File( destinationDir +fileName+"_"+pageNumber+".gif" ));
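Which of these formats actually works depends on the ImageIO writers registered in the running JVM, and ImageIO.write returns false (rather than throwing) when no writer accepts the image. A minimal sketch for checking this up front, using only the standard library; the class and method names (ImageFormatCheck, hasWriter, canEncode) are made up for illustration, and the 10x10 test image stands in for a rendered page:

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

final class ImageFormatCheck {

    /** Returns true if this JVM has an ImageIO writer registered for the format name. */
    static boolean hasWriter(String formatName) {
        return ImageIO.getImageWritersByFormatName(formatName).hasNext();
    }

    /** Encodes the image in memory and reports whether a writer accepted it. */
    static boolean canEncode(BufferedImage image, String formatName) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // ImageIO.write returns false when no appropriate writer is found
            return ImageIO.write(image, formatName, out) && out.size() > 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Small placeholder image; in the answer's code this would be the rendered page
        BufferedImage rgb = new BufferedImage(10, 10, BufferedImage.TYPE_INT_RGB);
        for (String format : new String[] {"jpg", "jpeg", "png", "bmp", "gif"}) {
            System.out.println(format + ": writer=" + hasWriter(format)
                    + ", encoded=" + canEncode(rgb, format));
        }
    }
}
```

Checking the boolean returned by ImageIO.write in the conversion loop is a cheap way to notice a page that was silently not written.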

Console output:

Images copied to Folder: Converted_PdfFiles_to_Image
Total files to be converted -> 13
Aug 06, 2014 1:35:49 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_1.png
Aug 06, 2014 1:35:50 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_2.png
Aug 06, 2014 1:35:51 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_3.png
Aug 06, 2014 1:35:51 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_4.png
Aug 06, 2014 1:35:52 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_5.png
Aug 06, 2014 1:35:52 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_6.png
Aug 06, 2014 1:35:53 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_7.png
Aug 06, 2014 1:35:53 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_8.png
Aug 06, 2014 1:35:54 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_9.png
Aug 06, 2014 1:35:54 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_10.png
Aug 06, 2014 1:35:54 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_11.png
Aug 06, 2014 1:35:55 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_12.png
Aug 06, 2014 1:35:55 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_13.png
Converted Images are saved at -> C:\Documents\Converted_PdfFiles_to_Image

Solution for Apache PDFBox 2.0.*:

Required JARs: pdfbox-2.0.16.jar, fontbox-2.0.16.jar, commons-logging-1.2.jar

or the pom.xml dependencies:

<!-- https://mvnrepository.com/artifact/org.apache.pdfbox/pdfbox -->
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.16</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.pdfbox/fontbox -->
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>fontbox</artifactId>
    <version>2.0.16</version>
</dependency>
<!-- https://mvnrepository.com/artifact/commons-logging/commons-logging -->
<dependency>
    <groupId>commons-logging</groupId>
    <artifactId>commons-logging</artifactId>
    <version>1.2</version>
</dependency>

Solution for version 2.0.16:

package com.pdf.pdfbox.examples;

import java.awt.image.BufferedImage;
import java.io.File;

import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

/**
 * 
 * @author venkataudaykiranp
 * 
 * @version 2.0.16(Apache PDFBox version support)
 *
 */
public class ConvertPDFPagesToImages {
    public static void main(String[] args) {
        try {
            String sourceDir = "C:\\Users\\venkataudaykiranp\\Downloads\\04-Request-Headers.pdf"; // path of the PDF file to convert
            String destinationDir = "C:\\Users\\venkataudaykiranp\\Downloads\\Converted_PdfFiles_to_Image/"; // folder where the converted images are saved

            File sourceFile = new File(sourceDir);
            File destinationFile = new File(destinationDir);
            if (!destinationFile.exists()) {
                destinationFile.mkdir();
                System.out.println("Folder Created -> "+ destinationFile.getAbsolutePath());
            }
            if (sourceFile.exists()) {
                System.out.println("Images copied to Folder Location: "+ destinationFile.getAbsolutePath());             
                PDDocument document = PDDocument.load(sourceFile);
                PDFRenderer pdfRenderer = new PDFRenderer(document);

                int numberOfPages = document.getNumberOfPages();
                System.out.println("Total files to be converting -> "+ numberOfPages);

                String fileName = sourceFile.getName().replace(".pdf", "");             
                String fileExtension= "png";
                /*
                 * 600 dpi gives better clarity, but each image holds four times as many pixels as at 300 dpi
                 * and takes roughly three times the disk space in this example.
                 * Ex:  1. At 300 dpi, 04-Request-Headers_2.png is about 797 KB
                 *      2. At 600 dpi, 04-Request-Headers_2.png is about 2.42 MB
                 */
                int dpi = 300; // use a lower dpi to save disk space; for professional use you can go above 300 dpi

                for (int i = 0; i < numberOfPages; ++i) {
                    File outPutFile = new File(destinationDir + fileName +"_"+ (i+1) +"."+ fileExtension);
                    BufferedImage bImage = pdfRenderer.renderImageWithDPI(i, dpi, ImageType.RGB);
                    ImageIO.write(bImage, fileExtension, outPutFile);
                }

                document.close();
                System.out.println("Converted Images are saved at -> "+ destinationFile.getAbsolutePath());
            } else {
                System.err.println(sourceFile.getName() +" File not exists");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
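To get a feel for how the dpi value affects image size: PDF page dimensions are measured in points, with 72 points per inch by default, so the rendered width in pixels is roughly points / 72 * dpi. A minimal sketch of that arithmetic, using only the standard library; DpiMath and pixelsAt are hypothetical names, the US Letter page size (612 x 792 points) is an assumption for illustration, and the renderer's exact rounding may differ by a pixel:

```java
final class DpiMath {

    /** Default PDF user space unit: 72 points per inch. */
    static final float POINTS_PER_INCH = 72f;

    /** Approximate pixel length of a page dimension given in points, rendered at the given dpi. */
    static int pixelsAt(float points, int dpi) {
        return Math.round(points / POINTS_PER_INCH * dpi);
    }

    public static void main(String[] args) {
        float widthPt = 612f;   // US Letter width in points (assumed page size)
        float heightPt = 792f;  // US Letter height in points (assumed page size)
        for (int dpi : new int[] {72, 150, 300, 600}) {
            int w = pixelsAt(widthPt, dpi);
            int h = pixelsAt(heightPt, dpi);
            System.out.println(dpi + " dpi -> " + w + " x " + h + " px ("
                    + (long) w * h + " pixels)");
        }
    }
}
```

Doubling the dpi doubles both dimensions, so the pixel count (and the memory needed for the BufferedImage) grows by a factor of four, even though the compressed PNG on disk usually grows by less.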

Answer 2

Answered by Xondio

Going through PDFBox is a good way to avoid native bindings. Try the PDFImageWriter from PDFBox; I did the same thing with it in a few lines and it worked perfectly. You have to load the PDDocument and pass it to the writer.

PDFImageWriter.write(doc, "png", null, 0, Integer.MAX_VALUE, "picture");

For all pages.

PDFImageWriter.write(doc, "png", null, 0, 0, "picture");

See: PDFImageWriter Javadoc

Answer 3

Answered by Malhotra

You are probably trying to convert a corrupted PDF file. I get the same errors when the PDF file contains JPX-encoded streams.

Answer 4

Answered by stanlyF

You can try the NonSequentialParser to avoid errors with some PDF files (for example, files with incremental updates):

PDDocument doc = PDDocument.loadNonSeq(new File("/document.pdf"));

Answer 5

Answered by Vahap Gencdal

try {
    PDDocument document = PDDocument.load(PdfInfo.getPDFWAY());
    if (document.isEncrypted()) {
        document.decrypt(PdfInfo.getPASSWORD());
    }
    if ("bilevel".equalsIgnoreCase(PdfInfo.getCOLOR())) {
        PdfInfo.setIMAGETYPE(BufferedImage.TYPE_BYTE_BINARY);
    } else if ("indexed".equalsIgnoreCase(PdfInfo.getCOLOR())) {
        PdfInfo.setIMAGETYPE(BufferedImage.TYPE_BYTE_INDEXED);
    } else if ("gray".equalsIgnoreCase(PdfInfo.getCOLOR())) {
        PdfInfo.setIMAGETYPE(BufferedImage.TYPE_BYTE_GRAY);
    } else if ("rgb".equalsIgnoreCase(PdfInfo.getCOLOR())) {
        PdfInfo.setIMAGETYPE(BufferedImage.TYPE_INT_RGB);
    } else if ("rgba".equalsIgnoreCase(PdfInfo.getCOLOR())) {
        PdfInfo.setIMAGETYPE(BufferedImage.TYPE_INT_ARGB);
    } else {
        System.exit(2);
    }
    PDFImageWriter imageWriter = new PDFImageWriter();
    boolean success = imageWriter.writeImage(document, PdfInfo.getIMAGE_FORMAT(), PdfInfo.getPASSWORD(),
            PdfInfo.getSTART_PAGE(), PdfInfo.getEND_PAGE(), PdfInfo.getOUTPUT_PREFIX(),
            PdfInfo.getIMAGETYPE(), PdfInfo.getRESOLUTION());
    if (!success) {
        System.exit(1);
    }
    document.close();
} catch (IOException | CryptographyException | InvalidPasswordException ex) {
    Logger.getLogger(PdfToImae.class.getName()).log(Level.SEVERE, null, ex);
}

public class PdfInfo {
    private static String PDFWAY;    
    private static String OUTPUT_PREFIX;
    private static String PASSWORD;
    private static int START_PAGE=1;
    private static int END_PAGE=Integer.MAX_VALUE;
    private static String IMAGE_FORMAT="jpg";
    private static String COLOR="rgb";
    private static int RESOLUTION=256;
    private static int IMAGETYPE=24;
    private static String filename;
    private static String filePath="";
}
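The if/else chain in this answer maps a color name to a java.awt.image.BufferedImage type constant. A minimal sketch of the same mapping as a standalone helper, using only the standard library; ColorModes and imageTypeFor are hypothetical names (not part of PDFBox), and unlike the answer's code it throws an exception for an unknown name instead of calling System.exit:

```java
import java.awt.image.BufferedImage;
import java.util.Locale;

final class ColorModes {

    /** Maps a case-insensitive color name to the matching BufferedImage type constant. */
    static int imageTypeFor(String color) {
        switch (color.toLowerCase(Locale.ROOT)) {
            case "bilevel": return BufferedImage.TYPE_BYTE_BINARY;
            case "indexed": return BufferedImage.TYPE_BYTE_INDEXED;
            case "gray":    return BufferedImage.TYPE_BYTE_GRAY;
            case "rgb":     return BufferedImage.TYPE_INT_RGB;
            case "rgba":    return BufferedImage.TYPE_INT_ARGB;
            default:
                throw new IllegalArgumentException("Unknown color mode: " + color);
        }
    }

    public static void main(String[] args) {
        for (String color : new String[] {"bilevel", "indexed", "gray", "rgb", "rgba"}) {
            System.out.println(color + " -> BufferedImage type " + imageTypeFor(color));
        }
    }
}
```

Keeping the mapping in one method lets the caller decide how to react to a bad value, which is easier to test than exiting the JVM from inside the conversion code.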

Answer 6

Answered by user2605874

For the error:

org.apache.pdfbox.util.PDFStreamEngine processOperator INFO: unsupported/disabled operation

In addition to the Apache PDFBox JAR, you need to include the fontbox-1.7.1 JAR on the classpath. This will fix your issue, because PDFBox uses fontbox-1.7.1 internally.

Answer 7

Answered by Bittu Choudhary

You can easily convert a PDF into an image using PDFBox. The renderImageWithDPI method of PDFBox's PDFRenderer class renders a PDF page to an image.

PDDocument doc = PDDocument.load(new File("filepath/sample.pdf"));
PDFRenderer pdfRenderer = new PDFRenderer(doc);
int pageNo = 0; // zero-based index of the page to render (first page assumed here)
BufferedImage bim = pdfRenderer.renderImageWithDPI(pageNo, 300, ImageType.RGB);
String fileName = "image-" + pageNo + ".png";
ImageIOUtil.writeImage(bim, fileName, 300);
doc.close();
