Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/16775907/

Time: 2020-10-27 05:56:48  Source: igfitidea

Is there a minimalistic PDF.js sample that supports text selection?

Tags: javascript, pdf.js

Asked by André Pena

I'm trying PDF.js.

My problem is that the Hello World demo does not support text selection. It draws everything in a canvas without the text layer. The official PDF.js demo does support text selection, but the code is too complex. I was wondering if somebody has a minimalistic demo with the text layer.

Answered by Vivin Paliath

I have committed the example to Mozilla's pdf.js repository, and it is available under the examples directory.

The original example that I committed to pdf.js no longer exists, but I believe this example showcases text selection. They have cleaned up and reorganized pdf.js, so the text-selection logic is now encapsulated inside the text layer, which can be created using a factory.

Specifically, PDFJS.DefaultTextLayerFactory takes care of setting up the basic text-selection stuff.



The following example is outdated; I am only leaving it here for historical reasons.

I have been struggling with this problem for 2-3 days now, but I finally figured it out. Here is a fiddle that shows you how to load a PDF with text selection enabled.

The difficulty in figuring this out was that the text-selection logic was intertwined with the viewer code (viewer.js, viewer.html, viewer.css). I had to extricate the relevant code and CSS to get this to work (that JavaScript file is referenced in the file; you can also check it out here). The end result is a minimal demo that should prove helpful. To implement selection properly, the CSS in viewer.css is also extremely important, as it sets up the styles for the divs that are eventually created and then used to get text selection working.

The heavy lifting is done by the TextLayerBuilder object, which actually handles the creation of the selection divs. You can see calls to this object from within viewer.js.
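
To make the idea of "selection divs" concrete, here is a rough sketch of the general technique: each piece of text gets an absolutely positioned, transparent div placed over the spot where the canvas painted it. This is a simplified illustration and not the actual TextLayerBuilder code; the function name layoutTextDivs, the item fields (str, x, y, fontSize) and the page height are all hypothetical, and real PDF text positioning involves full transform matrices and font metrics.

```javascript
// Simplified illustration only; this is NOT pdf.js's TextLayerBuilder.
// Each hypothetical text item carries a string, a font size, and a position
// in PDF space (origin at the bottom-left of the page, units in points).
function layoutTextDivs(items, pageHeight, scale) {
    return items.map(function (item) {
        var fontHeight = item.fontSize * scale;
        return {
            text: item.str,
            style: {
                position: "absolute",
                color: "transparent", // invisible, but still selectable
                left: (item.x * scale) + "px",
                // Flip the y axis: CSS measures from the top of the page.
                // Subtracting the font height approximates the top of the glyphs.
                top: ((pageHeight - item.y) * scale - fontHeight) + "px",
                fontSize: fontHeight + "px"
            }
        };
    });
}

var divs = layoutTextDivs(
    [{ str: "Hello, world!", x: 72, y: 720, fontSize: 12 }],
    792, // hypothetical page height in points
    1.5  // same role as the "scale" zoom factor used for the viewport
);
console.log(divs[0].text, divs[0].style.left, divs[0].style.top);
```

The browser's native selection then works on these invisible divs, which is why the color: transparent and ::selection rules in the CSS matter so much.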

Anyway, here's the code, including the CSS. Keep in mind that you will still need the pdf.js file. My fiddle has a link to a version that I built from Mozilla's GitHub repo for pdf.js. I didn't want to link to the repo's version directly, since they are constantly developing it and it may be broken.

So without further ado:

HTML:

<html>
    <head>
        <title>Minimal pdf.js text-selection demo</title>
    </head>

    <body>
        <div id="pdfContainer" class = "pdf-content">
        </div>
    </body>
</html>

CSS:

.pdf-content {
    border: 1px solid #000000;
}

/* CSS classes used by TextLayerBuilder to style the text layer divs */

/* This stuff is important! Otherwise when you select the text, the text in the divs will show up! */
::selection { background:rgba(0,0,255,0.3); }
::-moz-selection { background:rgba(0,0,255,0.3); }

.textLayer {
    position: absolute;
    left: 0;
    top: 0;
    right: 0;
    bottom: 0;
    color: #000;
    font-family: sans-serif;
    overflow: hidden;
}

.textLayer > div {
    color: transparent;
    position: absolute;
    line-height: 1;
    white-space: pre;
    cursor: text;
}

.textLayer .highlight {
    margin: -1px;
    padding: 1px;

    background-color: rgba(180, 0, 170, 0.2);
    border-radius: 4px;
}

.textLayer .highlight.begin {
    border-radius: 4px 0px 0px 4px;
}

.textLayer .highlight.end {
    border-radius: 0px 4px 4px 0px;
}

.textLayer .highlight.middle {
    border-radius: 0px;
}

.textLayer .highlight.selected {
    background-color: rgba(0, 100, 0, 0.2);
}

JavaScript:

//Minimal PDF rendering and text-selection example using pdf.js by Vivin Suresh Paliath (http://vivin.net)
//This fiddle uses a built version of pdf.js that contains all modules that it requires.
//
//For demonstration purposes, the PDF data is not going to be obtained from an outside source. I will be
//storing it in a variable. Mozilla's viewer does support PDF uploads but I haven't really gone through
//that code. There are other ways to upload PDF data. For instance, I have a Spring app that accepts a
//PDF for upload and then communicates the binary data back to the page as base64. I then convert this
//into a Uint8Array manually. I will be demonstrating the same technique here. What matters most here is
//how we render the PDF with text-selection enabled. The source of the PDF is not important; just assume
//that we have the data as base64.
//
//The problem with understanding text selection was that the text selection code is heavily intertwined
//with viewer.html and viewer.js. I have extracted the parts I need out of viewer.js into a separate file
//which contains the bare minimum required to implement text selection. The key component is TextLayerBuilder,
//which is the object that handles the creation of text-selection divs. I have added this code as an external
//resource.
//
//This demo uses a PDF that only has one page. You can render other pages if you wish, but the focus here is
//just to show you how you can render a PDF with text selection. Hence the code only loads up one page.
//
//The CSS used here is also very important since it sets up the styles for the text layer div overlays that
//you actually end up selecting.
//
//For reference, the actual PDF document that is rendered is available at:
//http://vivin.net/pub/pdfjs/TestDocument.pdf

var pdfBase64 = "..."; //should contain base64 representing the PDF

var scale = 1; //Set this to whatever you want. This is basically the "zoom" factor for the PDF.

/**
 * Converts a base64 string into a Uint8Array
 */
function base64ToUint8Array(base64) {
    var raw = atob(base64); //This is a native function that decodes a base64-encoded string.
    var uint8Array = new Uint8Array(new ArrayBuffer(raw.length));
    for(var i = 0; i < raw.length; i++) {
        uint8Array[i] = raw.charCodeAt(i);
    }

    return uint8Array;
}

function loadPdf(pdfData) {
    PDFJS.disableWorker = true; //Not using web workers. Not disabling results in an error. This line is
                                //missing in the example code for rendering a pdf.

    var pdf = PDFJS.getDocument(pdfData);
    pdf.then(renderPdf);                               
}

function renderPdf(pdf) {
    pdf.getPage(1).then(renderPage);
}

function renderPage(page) {
    var viewport = page.getViewport(scale);
    var $canvas = jQuery("<canvas></canvas>");

    //Set the canvas height and width to the height and width of the viewport
    var canvas = $canvas.get(0);
    var context = canvas.getContext("2d");
    canvas.height = viewport.height;
    canvas.width = viewport.width;

    //Append the canvas to the pdf container div
    jQuery("#pdfContainer").append($canvas);

    //Create the text layer div and position it directly over the canvas. This has to happen before the
    //HiDPI block below, which refers to $textLayerDiv; otherwise $textLayerDiv would still be undefined
    //at that point and the code would throw on HiDPI displays.
    var canvasOffset = $canvas.offset();
    var $textLayerDiv = jQuery("<div />")
        .addClass("textLayer")
        .css("height", viewport.height + "px")
        .css("width", viewport.width + "px")
        .offset({
            top: canvasOffset.top,
            left: canvasOffset.left
        });

    jQuery("#pdfContainer").append($textLayerDiv);

    //The following few lines of code set up scaling on the context if we are on a HiDPI display
    var outputScale = getOutputScale();
    if (outputScale.scaled) {
        var cssScale = 'scale(' + (1 / outputScale.sx) + ', ' +
            (1 / outputScale.sy) + ')';
        CustomStyle.setProp('transform', canvas, cssScale);
        CustomStyle.setProp('transformOrigin', canvas, '0% 0%');

        if ($textLayerDiv.get(0)) {
            CustomStyle.setProp('transform', $textLayerDiv.get(0), cssScale);
            CustomStyle.setProp('transformOrigin', $textLayerDiv.get(0), '0% 0%');
        }
    }

    context._scaleX = outputScale.sx;
    context._scaleY = outputScale.sy;
    if (outputScale.scaled) {
        context.scale(outputScale.sx, outputScale.sy);
    }

    page.getTextContent().then(function(textContent) {
        var textLayer = new TextLayerBuilder($textLayerDiv.get(0), 0); //The second argument (zero) is an index
                                                                       //identifying the page: page.number - 1.
        textLayer.setTextContent(textContent);

        var renderContext = {
            canvasContext: context,
            viewport: viewport,
            textLayer: textLayer
        };

        page.render(renderContext);
    });
}

var pdfData = base64ToUint8Array(pdfBase64);
loadPdf(pdfData);    
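
The getOutputScale() helper called in renderPage is not shown above; like TextLayerBuilder and CustomStyle, it has to come from the external JavaScript file. As a rough sketch of what such a helper might compute, assuming it compares the device pixel ratio with the canvas backing-store ratio, something like the following would produce the sx, sy and scaled fields that renderPage reads. The names computeOutputScale and cssScaleFor are hypothetical and are not part of pdf.js.

```javascript
// Hypothetical stand-in for the getOutputScale() helper used in renderPage.
// devicePixelRatio: physical pixels per CSS pixel (window.devicePixelRatio in a browser).
// backingStoreRatio: canvas backing-store pixels per CSS pixel (usually 1).
function computeOutputScale(devicePixelRatio, backingStoreRatio) {
    var pixelRatio = (devicePixelRatio || 1) / (backingStoreRatio || 1);
    return {
        sx: pixelRatio,
        sy: pixelRatio,
        scaled: pixelRatio !== 1 // true only when the display is HiDPI
    };
}

// Builds the same CSS transform string that renderPage applies when scaled is true.
function cssScaleFor(outputScale) {
    return "scale(" + (1 / outputScale.sx) + ", " + (1 / outputScale.sy) + ")";
}

var standard = computeOutputScale(1, 1);
var hiDpi = computeOutputScale(2, 1);
console.log(standard.scaled, hiDpi.scaled, cssScaleFor(hiDpi));
```

On a standard display scaled is false, so renderPage skips the transform block entirely; only HiDPI displays take that path.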

Answered by Mosta

Because this is an old question with an old accepted answer, you may use this solution to get it working with recent PDF.js versions:

http://www.ryzhak.com/converting-pdf-file-to-html-canvas-with-text-selection-using-pdf-js

Here is the code they used. Include the following CSS and scripts from the PDF.js code:

<link rel="stylesheet" href="pdf.js/web/text_layer_builder.css" />
<script src="pdf.js/web/ui_utils.js"></script>
<script src="pdf.js/web/text_layer_builder.js"></script>

Use this code to load the PDF:

PDFJS.getDocument("oasis.pdf").then(function(pdf){
    var page_num = 1;
    pdf.getPage(page_num).then(function(page){
        var scale = 1.5;
        var viewport = page.getViewport(scale);
        var canvas = $('#the-canvas')[0];
        var context = canvas.getContext('2d');
        canvas.height = viewport.height;
        canvas.width = viewport.width;

        var canvasOffset = $(canvas).offset();
        var $textLayerDiv = $('#text-layer').css({
            height : viewport.height+'px',
            width : viewport.width+'px',
            top : canvasOffset.top,
            left : canvasOffset.left
        });

        page.render({
            canvasContext : context,
            viewport : viewport
        });

        page.getTextContent().then(function(textContent){
           console.log( textContent );
            var textLayer = new TextLayerBuilder({
                textLayerDiv : $textLayerDiv.get(0),
                pageIndex : page_num - 1,
                viewport : viewport
            });

            textLayer.setTextContent(textContent);
            textLayer.render();
        });
    });
});