java PDFBox 2.0 RC3 -- Find and replace text

Question

Asked by Shaun

How can one find and replace text inside a PDF document using PDFBox 2.0? The old example has been removed and its syntax no longer works, so I am wondering whether this is still possible and, if so, what the best way to go about it is. Thanks!

Answer 1

Answered by mourphy

You can try something like this:

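import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSString;
import org.apache.pdfbox.pdfparser.PDFStreamParser;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDStream;

// Replaces the first occurrence of searchString in each string operand of the Tj/TJ operators.
// Limitation: text that is split across several strings (common in TJ arrays) is not matched.
public static PDDocument replaceText(PDDocument document, String searchString, String replacement) throws IOException {
    if (searchString == null || searchString.isEmpty() || replacement == null || replacement.isEmpty()) {
        return document;
    }
    for (PDPage page : document.getDocumentCatalog().getPages()) {
        PDFStreamParser parser = new PDFStreamParser(page);
        parser.parse();
        List<Object> tokens = parser.getTokens();
        // Start at 1 because an operator's operand is the token just before it
        for (int j = 1; j < tokens.size(); j++) {
            Object next = tokens.get(j);
            if (!(next instanceof Operator)) {
                continue;
            }
            String name = ((Operator) next).getName();
            Object operand = tokens.get(j - 1);
            // Tj and TJ are the most common operators that display strings in a PDF
            if ("Tj".equals(name) && operand instanceof COSString) {
                // Tj takes a single operand: the string to display
                replaceIn((COSString) operand, searchString, replacement);
            } else if ("TJ".equals(name) && operand instanceof COSArray) {
                // TJ takes an array of strings interleaved with positioning numbers
                COSArray array = (COSArray) operand;
                for (int k = 0; k < array.size(); k++) {
                    COSBase element = array.getObject(k);
                    if (element instanceof COSString) {
                        replaceIn((COSString) element, searchString, replacement);
                    }
                }
            }
        }
        // Now that the tokens are updated, replace the page content stream.
        PDStream updatedStream = new PDStream(document);
        try (OutputStream out = updatedStream.createOutputStream()) {
            new ContentStreamWriter(out).writeTokens(tokens);
        }
        page.setContents(updatedStream);
    }
    return document;
}

// Literal (non-regex) replacement of the first match inside one string operand.
private static void replaceIn(COSString cosString, String searchString, String replacement) {
    String updated = cosString.getString()
            .replaceFirst(Pattern.quote(searchString), Matcher.quoteReplacement(replacement));
    // Assumes a font with a simple single-byte (Latin) encoding; other fonts need font-aware encoding.
    cosString.setValue(updated.getBytes(StandardCharsets.ISO_8859_1));
}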
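
When replacing text this way, keep in mind that String.replaceFirst interprets its first argument as a regular expression and gives $ and \ a special meaning in the replacement. A search string such as "(USD)" therefore may not match what you expect. Below is a minimal sketch of a literal replace-once helper using only the JDK; the class name LiteralReplace and the method name replaceOnceLiteral are made up for illustration and are not part of PDFBox.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralReplace {
    // Replaces the first occurrence of 'search' in 'text', treating both
    // 'search' and 'replacement' as plain text rather than regex syntax.
    static String replaceOnceLiteral(String text, String search, String replacement) {
        return text.replaceFirst(Pattern.quote(search), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "Total (USD): 10";
        // Unquoted: "(USD)" is a regex group that matches only "USD",
        // so the parentheses are left behind.
        System.out.println(text.replaceFirst("(USD)", "EUR"));        // Total (EUR): 10
        // Quoted: the whole literal "(USD)" is replaced.
        System.out.println(replaceOnceLiteral(text, "(USD)", "EUR")); // Total EUR: 10
    }
}
```

The same helper could be applied to the strings taken from the Tj and TJ operands, so that both branches replace text in the same literal way.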

Answer 2

Answered by Tim Coy

I spent a lot of time trying to come up with a solution for this and ended up buying an Acrobat DC subscription so that I could create form fields as placeholders for the text to be replaced. In my case the fields held customer information and order details, so the data was not very complex, but the document was filled with pages of business-related conditions and had a very complex layout.

Then I simply did this, which may be suitable for you:

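import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

private void update() throws IOException {
    Map<String, String> map = new HashMap<>();
    map.put("fieldname", "value to update");
    File template = new File("template.pdf");
    // try-with-resources closes the document even if filling or saving fails
    try (PDDocument document = PDDocument.load(template)) {
        PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
        if (acroForm == null) {
            throw new IOException("template.pdf contains no form fields");
        }
        for (Map.Entry<String, String> entry : map.entrySet()) {
            // Look each field up by its fully qualified name (also finds nested fields)
            PDField field = acroForm.getField(entry.getKey());
            if (field != null) {
                field.setValue(entry.getValue());
                field.setReadOnly(true);
            }
        }
        document.save(new File("out.pdf"));
    }
}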

YMMV
