Java: Getting form field values with pdfbox

Question

Asked by Skizzo

I'm using pdfbox for the first time, and I'm currently reading the material on the PDFBox website.

In summary, I have a PDF like this:

enter image description here

except that my file has many different components (text fields, radio buttons, checkboxes). For this PDF I have to read these values: Mauro, Rossi, MyCompany. So far I have written the following code:

PDDocument pdDoc = PDDocument.loadNonSeq( myFile, null );
PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
PDAcroForm pdAcroForm = pdCatalog.getAcroForm();

for(PDField pdField : pdAcroForm.getFields()){
    System.out.println(pdField.getValue());
}

Is this the correct way to read the values of the form components? Any suggestions? Where can I learn more about pdfbox?

Answer 1

Accepted answer by John Farrelly

The code you have should work. If you are actually looking to do something with the values, you'll likely need to use some other methods. For example, you can get specific fields using pdAcroForm.getField(<fieldName>):

PDField firstNameField = pdAcroForm.getField("firstName");
PDField lastNameField = pdAcroForm.getField("lastName");

Note that PDField is just a base class. You can cast fields to its subclasses to get more specific information from them. For example:

PDCheckbox fullTimeSalary = (PDCheckbox) pdAcroForm.getField("fullTimeSalary");
if(fullTimeSalary.isChecked()) {
    log.debug("The person earns a full-time salary");
} else {
    log.debug("The person does not earn a full-time salary");
}
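
A blind cast like the one above throws a ClassCastException if the named field turns out to be a different type, and a NullPointerException on the next line if no field has that name (PDFBox's getField is documented to return null in that case). The following is a minimal sketch of the safer instanceof pattern. To keep it runnable without a PDF or the PDFBox jar, it uses hypothetical stand-in classes (SketchField, SketchTextField, SketchCheckBox, SketchForm); none of these are PDFBox types, they only mimic the base-class/subclass relationship described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for a field base class and two subclasses; NOT PDFBox classes.
abstract class SketchField {
    final String name;
    SketchField(String name) { this.name = name; }
}

class SketchTextField extends SketchField {
    final String text;
    SketchTextField(String name, String text) { super(name); this.text = text; }
}

class SketchCheckBox extends SketchField {
    final boolean checked;
    SketchCheckBox(String name, boolean checked) { super(name); this.checked = checked; }
}

// Hypothetical stand-in for a form that supports lookup by field name.
class SketchForm {
    private final Map<String, SketchField> fields = new HashMap<>();

    void add(SketchField f) { fields.put(f.name, f); }

    // Returns null for an unknown name.
    SketchField getField(String name) { return fields.get(name); }

    // Describes a field without a blind cast:
    // instanceof is false both for null and for any other field type.
    String describe(String name) {
        SketchField f = getField(name);
        if (f instanceof SketchCheckBox) {
            return name + ": " + (((SketchCheckBox) f).checked ? "checked" : "not checked");
        }
        if (f instanceof SketchTextField) {
            return name + ": \"" + ((SketchTextField) f).text + "\"";
        }
        return name + ": missing or unsupported field type";
    }
}
```

With real PDFBox objects the same shape would apply: test the result of getField with instanceof before casting, and treat the fall-through case as "field missing or of an unexpected type".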

As you suggest, you'll find more information at the apache pdfbox website.

Answer 2

Answered by alltej

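A top-level field is not necessarily a terminal field: it can be a non-terminal field that has child fields instead of a value of its own. So you need to descend through the hierarchy until you reach a terminal field, and only then can you get the value. The code snippet below walks all the fields and prints the field names and values.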

{
    //from your original code (PDFBox 1.8 API; PDFBox 2.x removed loadNonSeq in favor of PDDocument.load)
    PDDocument pdDoc = PDDocument.loadNonSeq( myFile, null );
    PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
    PDAcroForm pdAcroForm = pdCatalog.getAcroForm();


    //get all fields in form
    List<PDField> fields = pdAcroForm.getFields();
    System.out.println(fields.size() + " top-level fields were found on the form");

    //inspect field values
    for (PDField field : fields)
    {
            processField(field, "|--", field.getPartialName());
    }

    ...
}


private void processField(PDField field, String sLevel, String sParent) throws IOException
{
        String partialName = field.getPartialName();

        if (field instanceof PDNonTerminalField)
        {
                if (!sParent.equals(field.getPartialName()))
                {
                        if (partialName != null)
                        {
                                sParent = sParent + "." + partialName;
                        }
                }
                System.out.println(sLevel + sParent);

                for (PDField child : ((PDNonTerminalField)field).getChildren())
                {
                        processField(child, "|  " + sLevel, sParent);
                }
        }
        else
        {
            //field has no child. output the value
                String fieldValue = field.getValueAsString();
                StringBuilder outputString = new StringBuilder(sLevel);
                outputString.append(sParent);
                if (partialName != null)
                {
                        outputString.append(".").append(partialName);
                }
                outputString.append(" = ").append(fieldValue);
                outputString.append(",  type=").append(field.getClass().getName());
                System.out.println(outputString);
        }
}
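
If the goal is to use the values rather than print them, the same depth-first walk can collect them into a map keyed by fully qualified name. The sketch below illustrates only the traversal. It assumes a hypothetical stand-in class, FieldNode, in place of PDFBox's field types so that it runs without a PDF or the PDFBox jar; FieldFlattener and its methods are likewise made up for illustration. A node with children plays the role of a non-terminal field, and a node without children plays the role of a terminal field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a form field; NOT a PDFBox class.
class FieldNode {
    final String partialName;
    final String value; // only meaningful for terminal nodes
    final List<FieldNode> children = new ArrayList<>();

    FieldNode(String partialName, String value) {
        this.partialName = partialName;
        this.value = value;
    }

    // Adds a child and returns this node so calls can be chained.
    FieldNode add(FieldNode child) {
        children.add(child);
        return this;
    }
}

class FieldFlattener {
    // Walks the tree depth-first and records "qualified.name" -> value
    // for every terminal node, in encounter order.
    static Map<String, String> flatten(List<FieldNode> topLevel) {
        Map<String, String> out = new LinkedHashMap<>();
        for (FieldNode field : topLevel) {
            collect(field, "", out);
        }
        return out;
    }

    private static void collect(FieldNode node, String prefix, Map<String, String> out) {
        // Build the qualified name once, so a top-level terminal field is not prefixed with itself.
        String name = prefix.isEmpty() ? node.partialName : prefix + "." + node.partialName;
        if (node.children.isEmpty()) {
            out.put(name, node.value);       // terminal: record the value
        } else {
            for (FieldNode child : node.children) {
                collect(child, name, out);   // non-terminal: descend
            }
        }
    }

    public static void main(String[] args) {
        List<FieldNode> top = new ArrayList<>();
        top.add(new FieldNode("person", null)
                .add(new FieldNode("firstName", "Mauro"))
                .add(new FieldNode("lastName", "Rossi")));
        top.add(new FieldNode("company", "MyCompany"));
        System.out.println(flatten(top));
        // prints {person.firstName=Mauro, person.lastName=Rossi, company=MyCompany}
    }
}
```

With PDFBox 2.x, the analogous version would branch on `field instanceof PDNonTerminalField` and read terminal values with `getValueAsString()`, as processField does above.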
