HTML: Display the First Page of a PDF as an Image

Notice: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/11828528/

Time: 2020-08-29 02:06:44  Source: igfitidea

Display first page of PDF as Image

Tags: html, image, pdf, iframe, jsf-2

Asked by Fahim Parkar

I am creating a web application in which I display images and PDFs as thumbnails. Clicking an image or PDF opens it in a new window.

For PDFs, I have the following (this is the code of the new window):

<iframe src="images/testes.pdf" width="800" height="200" />

With this I can see the whole PDF in the web browser. For the thumbnail, however, I want to display only the first page of the PDF as an image.

I tried

 <h:graphicImage value="images/testes.pdf" width="800" height="200" />

However, it is not working. Any idea how to get this done?

Update 1

I am giving the path of a PDF file only as an example. In reality, the images are stored in a database, and my actual code is as below.

<iframe src="#{PersonalInformationDataBean.myAttachmentString}" width="800" height="200" />

Update 2

For the thumbnails, what I am using is

 <h:graphicImage height="200" width="200" value="...." />

However, I need to achieve the same for PDFs as well.

I hope it is clear what I am expecting...

Accepted answer by Fahim Parkar

This is what I used:

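// Assumption: the post does not name the library; these class names match ICEpdf.
// myProjectPath (the PDF to read) and myProjectActualPath (the output directory,
// presumably) are defined elsewhere in the author's code.
import org.icepdf.core.exceptions.PDFException;
import org.icepdf.core.exceptions.PDFSecurityException;
import org.icepdf.core.pobjects.Document;
import org.icepdf.core.pobjects.Page;
import org.icepdf.core.util.GraphicsRenderingHints;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.awt.image.RenderedImage;
import java.io.*;

Document document = new Document();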
try {
    document.setFile(myProjectPath);
    System.out.println("Parsed successfully...");
} catch (PDFException ex) {
    System.out.println("Error parsing PDF document " + ex);
} catch (PDFSecurityException ex) {
    System.out.println("Error encryption not supported " + ex);
} catch (FileNotFoundException ex) {
    System.out.println("Error file not found " + ex);
} catch (IOException ex) {
    System.out.println("Error handling PDF document " + ex);
}



// Save page captures to file.
float scale = 1.0f;
float rotation = 0f;

System.out.println("scale == " + scale);

// Paint the page content to an image and write the image to file.
// The loop bound is 1, so only the first page (i = 0) is rendered.
InputStream fis2 = null;
File file = null;
for (int i = 0; i < 1; i++) {
    BufferedImage image = (BufferedImage) document.getPageImage(i,
    GraphicsRenderingHints.SCREEN,
    Page.BOUNDARY_CROPBOX, rotation, scale);
    RenderedImage rendImage = image;
    // capture the page image to file
    try {
        System.out.println("\t capturing page " + i);
        file = new File(myProjectActualPath + "myImage.png");
        ImageIO.write(rendImage, "png", file);
        fis2 = new BufferedInputStream(new FileInputStream(myProjectActualPath + "myImage.png"));

    } catch (IOException ioe) {
        System.out.println("IOException :: " + ioe);
    } catch (Exception e) {
        System.out.println("Exception :: " + e);
    }
    image.flush();
}
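
Since the question asks for a 200 x 200 thumbnail, the full-size page image written above could be scaled down before it is saved. The following is only a sketch using the standard java.awt classes; the class name ThumbnailScaler and the method fitInside are made up for this illustration and are not part of the answer's library. Lowering the scale value passed to getPageImage may achieve a similar effect, but that is not shown here.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of the original answer.
class ThumbnailScaler {

    /** Scales src so that it fits inside maxWidth x maxHeight, keeping the aspect ratio. */
    static BufferedImage fitInside(BufferedImage src, int maxWidth, int maxHeight) {
        // Use the smaller of the two ratios so that neither dimension overflows.
        double factor = Math.min((double) maxWidth / src.getWidth(),
                                 (double) maxHeight / src.getHeight());
        int w = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(src.getHeight() * factor));

        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for a rendered page: 612 x 792 pixels (US Letter at 72 dpi).
        BufferedImage page = new BufferedImage(612, 792, BufferedImage.TYPE_INT_RGB);
        BufferedImage thumb = fitInside(page, 200, 200);
        System.out.println(thumb.getWidth() + "x" + thumb.getHeight());
    }
}
```

In the answer's loop, the result of fitInside(image, 200, 200) could be passed to ImageIO.write in place of the full-size image.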

Answer by Kurt Pfeifle

I'm not sure whether all browsers display your embedded PDF (done via <h:graphicImage value="some.pdf" ... />) equally well.

Extracting 1st Page as PDF

If you insist on using PDF, I'd recommend one of these two command-line tools to extract the first page of any PDF:

  1. pdftk
  2. Ghostscript

Both are available for Linux, Mac OS X and Windows.

pdftk command

pdftk input.pdf cat 1 output page-1-of-input.pdf

Ghostscript command

gs -o page-1-of-input.pdf -sDEVICE=pdfwrite -dLastPage=1 input.pdf

(On Windows use gswin32c.exe or gswin64c.exe instead of gs.)

I originally wrote that pdftk is slightly faster than Ghostscript when it comes to page extraction, and that for a single page the difference is probably negligible. As of the most recent released Ghostscript version, v9.05, that is no longer true. I found that Ghostscript (including all startup overhead) requires ~1 second to extract the 1st page from the 756-page PDF specification, while pdftk needed ~11 seconds.

Converting 1st Page to JPEG

If you want to be sure that even older browsers can display your 1st page well, then convert it to JPEG. Ghostscript is your friend here (ImageMagick cannot do it by itself; it needs the help of Ghostscript anyway):

gs -o page-1-of-input-PDF.jpeg -sDEVICE=jpeg -dLastPage=1 input.pdf

Should you need page 33, you can do it like this:

gs -o page-33-of-input-PDF.jpeg -sDEVICE=jpeg -dFirstPage=33 -dLastPage=33 input.pdf

If you need a range of pages, like pages 17-23, try this:

gs -o page-16+%03d-of-input-PDF.jpeg -sDEVICE=jpeg -dFirstPage=17 -dLastPage=23 input.pdf

Note that the %03d notation increments with each page processed, starting with 1. So your first JPEG's name would be page-16+001-of-input-PDF.jpeg.
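
To make the numbering concrete, here is a small Java sketch that expands the same pattern for pages 17-23. It only mimics the naming with String.format and does not call Ghostscript; the class GsOutputNames and its method are invented for this illustration.

```java
// Hypothetical illustration of how the %03d counter maps to PDF page numbers.
class GsOutputNames {

    /** Expands a %d-style output pattern for the n-th page written (counter starts at 1). */
    static String outputName(String pattern, int counter) {
        return String.format(pattern, counter);
    }

    public static void main(String[] args) {
        String pattern = "page-16+%03d-of-input-PDF.jpeg";
        int firstPage = 17;
        int lastPage = 23;
        for (int page = firstPage; page <= lastPage; page++) {
            // The counter restarts at 1 for the first page processed, not at the page number.
            int counter = page - firstPage + 1;
            System.out.println("PDF page " + page + " -> " + outputName(pattern, counter));
        }
    }
}
```

The "16+" prefix in the file name is what lets a reader recover the real page number: 16 + 001 is page 17, and 16 + 007 is page 23.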

Maybe PNG is better?

Be aware that JPEG isn't well suited to images with high black-and-white contrast and sharp edges, such as text pages. PNG is much better for this.

Creating a PNG from the 1st PDF page with Ghostscript is easy:

gs -o page-1-of-input-PDF.png -sDEVICE=pngalpha -dLastPage=1 input.pdf

The same options as for JPEGs apply when it comes to extracting ranges of pages.
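
Because the question comes from a Java (JSF) application, the Ghostscript invocations above could also be started from Java. The sketch below only assembles the argument list from the flags used in this answer (-o, -sDEVICE, -dFirstPage, -dLastPage) and prints it; the class GsCommand and its methods are hypothetical names, and the choice of gswin64c.exe on Windows assumes the 64-bit build is installed (use gswin32c.exe otherwise).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for building a Ghostscript command line.
class GsCommand {

    /** Picks the executable name; on Windows this assumes the 64-bit console build. */
    static String executableFor(String osName) {
        return osName.toLowerCase().startsWith("windows") ? "gswin64c.exe" : "gs";
    }

    /** Builds the arguments for rendering pages firstPage..lastPage with the given device. */
    static List<String> pageRangeToImages(String gs, String device, int firstPage,
                                          int lastPage, String input, String outputPattern) {
        List<String> cmd = new ArrayList<>();
        cmd.add(gs);
        cmd.add("-o");
        cmd.add(outputPattern);
        cmd.add("-sDEVICE=" + device);
        cmd.add("-dFirstPage=" + firstPage);
        cmd.add("-dLastPage=" + lastPage);
        cmd.add(input);
        return cmd;
    }

    /** Runs the command and returns its exit code (requires Ghostscript on the PATH). */
    static int run(List<String> cmd) throws IOException, InterruptedException {
        return new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }

    public static void main(String[] args) {
        String gs = executableFor(System.getProperty("os.name"));
        List<String> cmd = pageRangeToImages(gs, "pngalpha", 1, 1,
                                             "input.pdf", "page-1-of-input-PDF.png");
        // Only print the command here; call run(cmd) to actually execute it.
        System.out.println(String.join(" ", cmd));
    }
}
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.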

Answer by Kurt Pfeifle

Warning: Don't use Ma9ic's script (posted in another answer) unless you want to...

  • ...make the PDF->JPEG conversion consume much more time + resources than it should
  • ...give up your own control over the PDF->JPEG conversion process altogether.

While it may work well for you, there are many problems in these 8 little lines of Bash.

First,
it uses identify to extract the number of pages from the input PDF. However, identify (part of ImageMagick) is completely unable to process PDFs all by itself. It has to run Ghostscript as a 'delegate' to handle PDF input. It would be much more efficient to use Ghostscript directly instead of running it indirectly, via ImageMagick.

Second,
it uses convert for the PDF->JPEG conversion. Same remark as above: it uses Ghostscript anyway, so why not run it directly?

Third,
it loops over the pages and runs a separate convert process for every single page of the PDF, that is, 100 converts for a 100-page PDF file. That means it also runs 100 Ghostscript commands to produce 100 JPEGs.

Fourth,
Fahim Parkar's question was to get a thumbnail from the first page of the PDF, not from all of them.

The script runs at least 201 different commands for a 100-page PDF, when it could all be done in just 1 command. If you use Ghostscript directly...

  1. ...not only will it run faster and more efficiently,
  2. ...but it will also give you more fine-grained and better control over the JPEGs' quality settings.

Use the right tool for the job, and use it correctly!



Update:

Since I was asked, here is my alternative implementation of Ma9ic's script.

#!/bin/bash
# Input PDF. Assumption: the value after "infile=" is missing in this copy;
# the first command-line argument is used here.
infile="$1"

gs -q -o "$(basename "${infile}")_p%04d.jpeg" -sDEVICE=jpeg "${infile}"

# To get thumbnail JPEGs with a width of 200 pixels, use the following command:
# gs -q -o name_200px_p%04d.jpg -sDEVICE=jpeg -dPDFFitPage -g200x400 "${infile}"

# To get higher-quality JPEGs (but also bigger-in-size ones) with a
# resolution of 300 dpi, use the following command:
# gs -q -o name_300dpi_p%04d.jpg -sDEVICE=jpeg -dJPEGQ=100 -r300 "${infile}"

echo "Done"

I've even run a benchmark on it. I converted the 756-page PDF-1.7 specification to JPEGs with both scripts:

  • Ma9ic's version needs 1413 seconds to generate the 756 JPEGs.
  • My version saves 93% of that time and takes 91 seconds.
  • Moreover, Ma9ic's script produces mostly black JPEG images on my system; mine are OK.
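
For readers who want to check the arithmetic behind these figures, the following Java sketch recomputes the saving from the two timings quoted above (1413 and 91 seconds for 756 pages). The class and method names are invented for this illustration; the timings are the only inputs taken from the answer.

```java
// Hypothetical helper that recomputes the benchmark figures quoted above.
class BenchmarkSavings {

    /** Percentage of the baseline time that the faster run saves. */
    static double percentSaved(double baselineSeconds, double fasterSeconds) {
        return 100.0 * (baselineSeconds - fasterSeconds) / baselineSeconds;
    }

    /** How many times faster the second run is than the baseline. */
    static double speedup(double baselineSeconds, double fasterSeconds) {
        return baselineSeconds / fasterSeconds;
    }

    public static void main(String[] args) {
        double baseline = 1413.0; // seconds, Ma9ic's script
        double faster = 91.0;     // seconds, single Ghostscript call
        int pages = 756;

        System.out.printf("saved: %.2f%%%n", percentSaved(baseline, faster));
        System.out.printf("speedup: %.2fx%n", speedup(baseline, faster));
        System.out.printf("seconds per page: %.3f vs %.3f%n",
                          baseline / pages, faster / pages);
    }
}
```

The saving works out to a little over 93%, which matches the figure in the list, and corresponds to a speedup of roughly 15 times.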