Java: read text from a particular page using PDFBox
Original question: http://stackoverflow.com/questions/13563482/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
read text from a particular page using PDFBox
Asked by Shyam Sundar Ananthaswamy
I know how to read the text of an entire PDF file with PDFBox using PDFTextStripper.getText(PDDocument).
I also have a sample showing how to get an object reference to a particular page using PDDocumentCatalog.getAllPages().get(i).
How do I get the text of just one page using PDFBox? I don't see any such method on the PDPage class.
Answered by amaidment
You can set parameters on the PDFTextStripper to read particular pages:
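PDDocument doc; // the already-loaded document
int i; // page number to extract (1-based: the first page is 1)
PDFTextStripper reader = new PDFTextStripper();
reader.setStartPage(i); // first page to extract
reader.setEndPage(i); // last page to extract (inclusive), so only page i is read
String pageText = reader.getText(doc);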
As far as I'm aware, PDPage is used more for representing a page on screen than for extracting text. As such, I wouldn't recommend using it to extract text.
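One detail worth spelling out when combining the two snippets: the list returned by getAllPages() is indexed from 0, while setStartPage and setEndPage count pages from 1. Below is a minimal sketch of that conversion with a range check. The class name PageNumbers and the method toPageNumber are hypothetical helpers, not part of PDFBox, and the page count is hard-coded here where real code would presumably use doc.getNumberOfPages().

```java
public class PageNumbers {

    /**
     * Converts a 0-based page index (as used with getAllPages().get(i))
     * into the 1-based page number expected by setStartPage/setEndPage.
     * Hypothetical helper for illustration; not part of PDFBox.
     */
    public static int toPageNumber(int zeroBasedIndex, int pageCount) {
        if (zeroBasedIndex < 0 || zeroBasedIndex >= pageCount) {
            throw new IllegalArgumentException(
                "page index " + zeroBasedIndex + " out of range for " + pageCount + " pages");
        }
        return zeroBasedIndex + 1;
    }

    public static void main(String[] args) {
        int pageCount = 5; // assumption: in real code, doc.getNumberOfPages()
        int index = 2;     // the third page, expressed as a 0-based list index
        int page = toPageNumber(index, pageCount);

        // With PDFBox, this value would be passed to both setters:
        //   reader.setStartPage(page);
        //   reader.setEndPage(page);
        System.out.println("list index " + index + " -> page number " + page);
    }
}
```

Passing the same value to both setters restricts extraction to that single page, as in the answer above.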