Original question: http://stackoverflow.com/questions/35420609/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
PDFBox 2.0 RC3 -- Find and replace text
Asked by Shaun
How can one find and replace text inside a PDF document using PDFBox 2.0? The old example was pulled and its syntax no longer works, so I am wondering whether this is still possible and, if so, what the best way to go about it is. Thanks!
Answered by mourphy
You can try like this:
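    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.apache.pdfbox.contentstream.operator.Operator;
    import org.apache.pdfbox.cos.COSArray;
    import org.apache.pdfbox.cos.COSBase;
    import org.apache.pdfbox.cos.COSString;
    import org.apache.pdfbox.pdfparser.PDFStreamParser;
    import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.PDPageTree;
    import org.apache.pdfbox.pdmodel.common.PDStream;

    public static PDDocument replaceText(PDDocument document, String searchString, String replacement) throws IOException {
        if (searchString == null || searchString.isEmpty() || replacement == null || replacement.isEmpty()) {
            return document;
        }
        // Quote both strings so that replaceFirst treats them as literal text, not as a regex.
        String pattern = Pattern.quote(searchString);
        String literalReplacement = Matcher.quoteReplacement(replacement);
        PDPageTree pages = document.getDocumentCatalog().getPages();
        for (PDPage page : pages) {
            PDFStreamParser parser = new PDFStreamParser(page);
            parser.parse();
            List<Object> tokens = parser.getTokens();
            // Start at 1: an operator needs a preceding operand at index j - 1.
            for (int j = 1; j < tokens.size(); j++) {
                Object next = tokens.get(j);
                if (!(next instanceof Operator)) {
                    continue;
                }
                String name = ((Operator) next).getName();
                Object operand = tokens.get(j - 1);
                // Tj and TJ are the two operators that display strings in a PDF.
                // Note: getBytes() uses the platform default charset, so non-ASCII text may not round-trip.
                if (name.equals("Tj") && operand instanceof COSString) {
                    // Tj takes one operand: the string to display, so update that operand.
                    COSString previous = (COSString) operand;
                    previous.setValue(previous.getString().replaceFirst(pattern, literalReplacement).getBytes());
                } else if (name.equals("TJ") && operand instanceof COSArray) {
                    // TJ takes an array that mixes strings with numeric position adjustments.
                    COSArray previous = (COSArray) operand;
                    for (int k = 0; k < previous.size(); k++) {
                        COSBase arrElement = previous.getObject(k);
                        if (arrElement instanceof COSString) {
                            COSString cosString = (COSString) arrElement;
                            cosString.setValue(cosString.getString().replaceFirst(pattern, literalReplacement).getBytes());
                        }
                    }
                }
            }
            // Now that the tokens are updated, replace the page content stream.
            PDStream updatedStream = new PDStream(document);
            try (OutputStream out = updatedStream.createOutputStream()) {
                new ContentStreamWriter(out).writeTokens(tokens);
            }
            page.setContents(updatedStream);
        }
        return document;
    }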
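One detail worth checking in the code above: `String.replaceFirst` interprets its first argument as a regular expression and its second as a replacement pattern, so a search string such as `(net)` or `a.b` will not match literally unless it is quoted. Here is a minimal sketch using only the JDK. The class name `LiteralReplaceDemo` and the helper `replaceFirstLiteral` are made up for illustration and are not part of PDFBox:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralReplaceDemo {

    // Hypothetical helper: replaces the first literal occurrence of search in text.
    static String replaceFirstLiteral(String text, String search, String replacement) {
        // Pattern.quote makes regex metacharacters in search match themselves;
        // Matcher.quoteReplacement stops '$' and '\' in replacement from being
        // read as group references or escapes.
        return text.replaceFirst(Pattern.quote(search), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "Total: $10.00 (net)";
        // Unquoted: "(net)" is parsed as a capturing group that matches "net",
        // so the parentheses stay in the output.
        System.out.println(text.replaceFirst("(net)", "gross"));
        // Quoted: the whole literal "(net)" is replaced.
        System.out.println(replaceFirstLiteral(text, "(net)", "gross"));
    }
}
```

The same quoting could be applied to each Tj string and to each string inside a TJ array before writing the bytes back.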
Answered by Tim Coy
I spent a lot of time trying to come up with a solution for this and ended up buying an Acrobat DC subscription so that I could create form fields as placeholders for the text to be replaced. In my case the fields held customer information and order details, so the data was not very complex, but the document itself contained pages of business-related conditions and had a very complex layout.
Then I simply did this, which may be suitable for you.
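    import java.io.File;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
    import org.apache.pdfbox.pdmodel.interactive.form.PDField;

    // InvalidPasswordException extends IOException, so one throws clause covers both.
    private void update() throws IOException {
        Map<String, String> map = new HashMap<>();
        map.put("fieldname", "value to update");
        // try-with-resources closes the document even if an exception is thrown
        try (PDDocument document = PDDocument.load(new File("template.pdf"))) {
            PDAcroForm form = document.getDocumentCatalog().getAcroForm();
            if (form == null) {
                throw new IOException("template.pdf contains no AcroForm");
            }
            for (PDField field : form.getFields()) {
                // Look the field up by name instead of looping over every map entry.
                String value = map.get(field.getFullyQualifiedName());
                if (value != null) {
                    field.setValue(value);
                    field.setReadOnly(true);
                }
            }
            document.save(new File("out.pdf"));
        }
    }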
YMMV