Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/44073718/

Time: 2020-08-23 02:19:39  Source: igfitidea

Edit *existing* PDF in a browser

Tags: javascript, pdf, html5-canvas, pdf.js

Asked by neilsimp1

I have a web application that currently gets a base64 representation of a PDF from the server. I'm able to use Mozilla's pdf.js to display this on a <canvas> and toggle through the pages with a dropdown.

According to everything I've been able to find, including the question "Can Mozilla's pdf.js modify PDFs?", it's not possible to edit the PDF with pdf.js.

I've found jsPDF, and I'm able to take the canvas, call .toDataURL() on it for each page, and build a new PDF document from the results, but there are two issues:

  1. The newly generated PDF will just be a series of images on each page, so any text in the original PDF will just be an image after I'm done with it.
  2. I generate a new PDF with jsPDF and then send the base64 of it back to pdf.js to display it on the canvas. Something happens between these steps where the images of the pages get scaled incorrectly, so each page takes up about 3/4 of the canvas after each new PDF change. I've been unable to get it to retain the same size/scale.
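
A possible explanation for the 3/4 scaling in issue 2, offered as an assumption rather than a diagnosis: PDF page sizes are measured in points (72 per inch), while canvas sizes are CSS pixels (96 per inch), and 72/96 = 0.75. If a canvas dimension in pixels gets converted to points somewhere between jsPDF and pdf.js, a page rendered at scale 1 would fill about three quarters of the original canvas. The sketch below uses hypothetical helper names (`pxToPt`, `scaleToFill`) and only illustrates the arithmetic.

```javascript
// Assumption: 96 CSS pixels per inch and 72 PDF points per inch.
const CSS_PX_PER_INCH = 96;
const PDF_PT_PER_INCH = 72;

// Convert a length in CSS pixels to PDF points.
function pxToPt(px) {
  return (px * PDF_PT_PER_INCH) / CSS_PX_PER_INCH;
}

// Render scale at which a page pageWidthPt points wide fills canvasWidthPx pixels.
function scaleToFill(canvasWidthPx, pageWidthPt) {
  return canvasWidthPx / pageWidthPt;
}

const canvasWidthPx = 800;
const pageWidthPt = pxToPt(canvasWidthPx);
console.log(pageWidthPt); // 600
console.log(scaleToFill(canvasWidthPx, pageWidthPt)); // roughly 1.333
```

If this is the cause, either sizing the new PDF page in points to begin with or rendering it with the computed scale should keep the size stable; that would need checking against the actual code.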

jsPDF doesn't look like it has a way to load an existing PDF; it only creates new ones. pdfmake and PDFKit also look like they only create new PDF files.



So my question:

Is there anything that will allow for both viewing a pdf (from base64) and for making changes to it? Ideally I'd watch for changes to the canvas, then draw that change onto the pdf page. When done, export that to a base64 string to send back to the server.
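
Whatever library ends up doing the editing, the base64 round trip itself is simple. A minimal sketch, assuming a browser (or a recent Node.js) where `atob` and `btoa` are available; the helper names `base64ToBytes` and `bytesToBase64` are made up for illustration. pdf.js can load binary data through the `data` option of `getDocument`, so the decoded bytes can be handed to it directly.

```javascript
// Decode a base64 string into raw bytes.
function base64ToBytes(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Encode raw bytes back into base64, e.g. for sending to the server.
function bytesToBase64(bytes) {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// "JVBERg==" is the base64 form of the four bytes "%PDF".
const bytes = base64ToBytes("JVBERg==");
console.log(Array.from(bytes)); // [37, 80, 68, 70]
console.log(bytesToBase64(bytes)); // "JVBERg=="
```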

Answered by Vanquished Wombat

Quick answer - no and it is quite unlikely you will find a cross-browser solution. It is very unlikely that you will find a PDF-perfect solution. Better to think about having the users edit HTML and generate the PDF at the server.

Why - the PDF format is both brilliant and fiendish at the same time. Brilliant because of its portability, but fiendish because of the internal structure and storage mechanisms. There is no friendly 'DOM' like with HTML. If we were starting out afresh to develop a portable document format it would not be PDF that we would choose. But PDF currently has too much momentum to be thrown away, period.

Younger viewers might be wondering how the hell this manic format got into its market-leading position and where it came from. Well, when the founding fathers of PDF were laying down the design, before XML, JSON, HTML and even the Internet, they weren't working with today's document sharing in mind. They were working on a better way to encode printing instructions - the PostScript printer driver concept. These were never expected to be edited before the printer consumed them, and they were worthless for any other purpose. Then someone noticed that you could interpret the PostScript drawing instructions to a screen, and subsequently someone spotted the fantastic potential to employ this as a transportable, cross-device display concept. And here we are.

Back to the question - to edit a PDF in any meaningful GUI way, you would need to unpack the PDF and render the components (images, formatted text, pages) to the display device; then allow folks to mess with the layout; then re-pack the PDF. You would have to do this perfectly in line with the PDF standards otherwise you may find the downstream consumers of your edited PDF file crash or are unable to render it. You would have to cater for the various Acrobat standard levels, and the shortcuts and bloats that the editing package (Word, Illustrator, InDesign) vendors chuck into the PDF file; layers, thumbnails, etc.

Then we come to colors. Have a read of the PDF spec and you will see that there is an array of colorspace options that the original PDF producer can decide to use. You would have to interpret these to a reasonable device color on the screen and back, etc.

And then fonts. Fonts might be embedded as subsets, or not. To keep fidelity with the PDF you will need to realise the glyphs as vector graphics on your drawing surface at the scale defined in the PDF. This mostly means utilising some kind of platform-dependent type library, which is tricky cross-platform. Plus there is the fact that you will need to license the fonts for appropriate use, which can be pricey for the fonts most people want to use to look hip and professional.

Given the layering, scaling and rotating features in PDF, you would likely be looking at an html canvas as the drawing surface. Anyone who knows will tell you that in the world of canvas you are pretty much on your own for word-processing type functions.

Not impossible but hard.

Components that render PDF to a display are largely acting as print drivers, slavishly obeying the PDF drawing instructions, and usually generating a raster or sometimes an SVG graphic. This is a one-way street - they read and draw, but there is no sense of 'handles' to the objects drawn. No handles means no manipulation, and these guys certainly have little intention of letting you modify and write back.

You will find many 'save to pdf' products. When client-side they will be leaning toward grabbing a set of pixels and dumping a raster graphic into a file with the thinnest veneer of 'PDF' definition wrapped around it. Where they are server based then they can be quite powerful - there are plenty of tools like Aspose, and ABCPDF that truly offer some PDF wrangling server side - but this is not what you are looking for in your OP.

Summary - very complicated subject. If anything ever emerges as a potential it will likely have many constraints in terms of the PDF features covered and thus restrictions on what it can safely edit.

If you are looking for online editing of documents that are ultimately exported as PDF, then a way forward is to keep an HTML version of the document source and have the user edit this with TinyMCE, CKEditor, etc., then use one of the server-side tools to take the saved source HTML and render it out to PDF. Tools like ABCPDF render HTML faithfully and let you add images, headers and footers, page numbers, etc.

This is a pragmatic answer to your (assumed) need, though it still has some trade-offs in terms of the font (licencing) issues, clunkiness of browser-based editors, all-round weirdness of the HTML laid down by some HTML editing components, etc. But it IS viable.

Final thoughts - rethink the scope of what you need. If HTML editing with conversion to PDF at the server is usable for you, it is a well-trodden path and you will find both free and commercial components for client and server to support it.

Edit: If you need to annotate the PDF then things are much easier. On the server, you need to generate images of the pages of the document, send those to the client, display them to the user, let the user mark them up, capture the co-ordinates of the annotations back to the server and use a server-side PDF library to render the annotations into the PDF. It is achievable, though requires various skillsets for server-side PDF to image manipulation and client side presentation and annotation capture.
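
For the coordinate-capture step, one detail is worth sketching: image or canvas coordinates start at the top-left corner and are in pixels, while PDF page coordinates by default start at the bottom-left and are in points. The hypothetical helper below (`canvasToPdfPoint` is not from any library) assumes an unrotated page whose origin is at (0, 0) and which was drawn at a uniform `scale` in pixels per point.

```javascript
// Map a pixel position on the rendered page image to PDF page coordinates.
// Assumptions: no page rotation, page origin at (0, 0), uniform scale (pixels per point).
function canvasToPdfPoint(xPx, yPx, scale, pageHeightPt) {
  return {
    x: xPx / scale,
    // Flip the y axis: pixels grow downward, PDF points grow upward.
    y: pageHeightPt - yPx / scale,
  };
}

// A click at (150, 300) on a US Letter page (792 pt tall) rendered at 1.5 pixels per point.
const point = canvasToPdfPoint(150, 300, 1.5, 792);
console.log(point); // { x: 100, y: 592 }
```

The client would send points like this to the server, where the PDF library places the annotation; rotated pages or pages with a shifted origin would need extra handling.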

Edit: Readers may be interested in knowing if the picture I painted above has changed. As of Jan 2019 I stand by what I wrote. Suppliers are coming to the market with better tools and libraries that can do more than previously. However, you still need to assess your needs and confirm their restrictions - it is likely that there will be some. No vendor I am aware of yet has a client-side, cross-browser, cross-device, full-capability PDF editing lib for any PDF file - there is always some limitation. But I am happy to be corrected.

Answered by allinonemovie

For future reference:

I found two libraries that enable you to edit existing PDFs in the browser to a certain extent. The second one isn't documented yet, so I don't know exactly what it does. It might be the solution for such a problem in the future.

Answered by Ryan

Because other SO questions are being directed here, and considering how fast web technology advances (e.g. WASM), I am providing the following answer, though PDFNetJS was already able to do all this when the question was originally asked.

Since the requirement of "edit" was clarified to be "Basically what is needed is for users to open up a previously uploaded PDF, highlight or circle sections, and then save those annotations to the PDF back on the server." and "No text editing or manipulation of the document content needs to happen.", then yes, this is entirely possible in any modern browser on any modern device.

PDFTron PDFNet SDK can do all this. A full-fledged, out-of-the-box document viewer is provided, with full annotation support. It is also possible to actually edit the PDF (change/replace text, redact, extract/add/replace images, and more). Not only are PDF files supported directly client side, but so are DOCX, PPTX, XLSX, PNG and JPG. Files can be loaded locally or remotely, and there is no need for slow base64 encoding/decoding.

Demo: http://www.pdftron.com/webviewer

Samples: http://www.pdftron.com/documentation/web/samples/universal-samples

The original question was also for support for Siebel and "PDFNetJS tries to retrieve a .mem file, which is some binary data. This cannot be served by the application I'm using (Siebel) so it doesn't look like this is an option.".

The .mem file is for PNaCl which is Chrome only, and this can be disabled. PDFTron for Web supports WASM and even emscripten, one of which, if not both, should then be compatible with Siebel.
