Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/23497324/
Using pdfbox to get form field values
Asked by Skizzo
I'm using pdfbox for the first time, and I'm currently reading through the material on its website.
To summarize, I have a PDF like this:
except that my file has many different components (text fields, radio buttons, checkboxes). From this PDF I have to read these values: Mauro, Rossi, MyCompany. So far I have written the following code:
PDDocument pdDoc = PDDocument.loadNonSeq( myFile, null );
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm pdAcroForm = pdCatalog.getAcroForm();
for (PDField pdField : pdAcroForm.getFields()) {
    System.out.println(pdField.getValue());
}
Is this the correct way to read the values of the form components? Any suggestions? Where can I learn more about pdfbox?
Accepted answer by John Farrelly
The code you have should work. If you are actually looking to do something with the values, you'll likely need to use some other methods. For example, you can get specific fields using pdAcroForm.getField(<fieldName>):
PDField firstNameField = pdAcroForm.getField("firstName");
PDField lastNameField = pdAcroForm.getField("lastName");
Note that PDField is just a base class. You can cast fields to its subclasses to get more interesting information from them. For example:
PDCheckbox fullTimeSalary = (PDCheckbox) pdAcroForm.getField("fullTimeSalary");
if (fullTimeSalary.isChecked()) {
    log.debug("The person earns a full-time salary");
} else {
    log.debug("The person does not earn a full-time salary");
}
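The direct cast above assumes that the lookup found a field and that the field really is a checkbox. If the lookup returns null, the call on the result fails with a NullPointerException, and a field of another type fails the cast with a ClassCastException. Below is a minimal sketch of a more defensive pattern. The classes `Field`, `TextField` and `Checkbox` and the `Map`-based form are hypothetical stand-ins so that the example runs without any library; they are not the PDFBox API. With PDFBox, the same `instanceof` check would presumably be applied to the result of `getField`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for a form and its field classes; NOT the PDFBox API.
class SafeCastSketch {

    static class Field { }

    static class TextField extends Field {
        final String text;
        TextField(String text) { this.text = text; }
    }

    static class Checkbox extends Field {
        final boolean checked;
        Checkbox(boolean checked) { this.checked = checked; }
    }

    // Returns true only if the named field exists, is a checkbox, and is checked.
    // A missing field or a field of another type yields false instead of an exception.
    static boolean isChecked(Map<String, Field> form, String fieldName) {
        Field field = form.get(fieldName);   // null when no such field exists
        if (field instanceof Checkbox) {     // instanceof is false for null
            return ((Checkbox) field).checked;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Field> form = new HashMap<>();
        form.put("fullTimeSalary", new Checkbox(true));
        form.put("firstName", new TextField("Mauro"));

        System.out.println(isChecked(form, "fullTimeSalary"));
        System.out.println(isChecked(form, "firstName"));
        System.out.println(isChecked(form, "noSuchField"));
    }
}
```

Whether a missing field should be treated as "not checked" or reported as an error depends on the form; returning false here is only one possible choice.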
As you suggest, you'll find more information at the apache pdfbox website.
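Answer by alltej
A field returned by getFields() can be a top-level (non-terminal) field that contains child fields rather than a value of its own. So you need to descend through the children until you reach a field that has none, and only then can you get the value. The code snippet below walks through all the fields and outputs the field names and values.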
{
    // from your original code
    // note: loadNonSeq belongs to PDFBox 1.8; with PDFBox 2.x, which provides
    // PDNonTerminalField, this call would be PDDocument.load(myFile)
    PDDocument pdDoc = PDDocument.loadNonSeq( myFile, null );
    PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
    PDAcroForm pdAcroForm = pdCatalog.getAcroForm();
    // get all fields in form
    List<PDField> fields = pdAcroForm.getFields();
    System.out.println(fields.size() + " top-level fields were found on the form");
    // inspect field values
    for (PDField field : fields)
    {
        processField(field, "|--", field.getPartialName());
    }
    ...
}
private void processField(PDField field, String sLevel, String sParent) throws IOException
{
    String partialName = field.getPartialName();
    if (field instanceof PDNonTerminalField)
    {
        if (!sParent.equals(field.getPartialName()))
        {
            if (partialName != null)
            {
                sParent = sParent + "." + partialName;
            }
        }
        System.out.println(sLevel + sParent);
        for (PDField child : ((PDNonTerminalField)field).getChildren())
        {
            processField(child, "| " + sLevel, sParent);
        }
    }
    else
    {
        //field has no child. output the value
        String fieldValue = field.getValueAsString();
        StringBuilder outputString = new StringBuilder(sLevel);
        outputString.append(sParent);
        if (partialName != null)
        {
            outputString.append(".").append(partialName);
        }
        outputString.append(" = ").append(fieldValue);
        outputString.append(", type=").append(field.getClass().getName());
        System.out.println(outputString);
    }
}
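The recursion above may be easier to follow without a PDF at hand. The following is a minimal sketch of the same idea, not PDFBox code: `Node` is a hypothetical stand-in for a form field, where a node with children plays the role of a non-terminal field and a node without children holds a value. The field names `person`, `firstName`, `lastName` and `company` are made up for illustration; the values are the ones from the question. The sketch joins partial names with dots while descending, and collects a value only when it reaches a field with no children.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a form field tree; this is NOT the PDFBox API.
class FieldTreeSketch {

    // A node with children plays the role of a non-terminal field;
    // a node without children plays the role of a terminal field holding a value.
    static class Node {
        final String partialName;
        final String value;
        final List<Node> children;

        Node(String partialName, String value, Node... children) {
            this.partialName = partialName;
            this.value = value;
            this.children = Arrays.asList(children);
        }
    }

    // Walks the tree depth-first and maps each terminal field's
    // fully qualified name (partial names joined by ".") to its value.
    static Map<String, String> flatten(List<Node> topLevel) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Node node : topLevel) {
            collect(node, "", result);
        }
        return result;
    }

    private static void collect(Node node, String parentName, Map<String, String> out) {
        String name = parentName.isEmpty()
                ? node.partialName
                : parentName + "." + node.partialName;
        if (node.children.isEmpty()) {
            // terminal field: record its value
            out.put(name, node.value);
        } else {
            // non-terminal field: descend into the children
            for (Node child : node.children) {
                collect(child, name, out);
            }
        }
    }

    public static void main(String[] args) {
        Node person = new Node("person", null,
                new Node("firstName", "Mauro"),
                new Node("lastName", "Rossi"));
        Node company = new Node("company", "MyCompany");
        System.out.println(flatten(Arrays.asList(person, company)));
    }
}
```

Collecting the results into a map rather than printing them is a design choice of this sketch; it makes it straightforward to look up a single value by its full name afterwards.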