php: Extract PDF form field names from a PDF form

Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): http://stackoverflow.com/questions/2127878/

Date: 2020-08-25 05:09:49  Source: igfitidea

.net, php, pdf

Asked by Christopher Done

I'm using pdftk to fill in a PDF form with an XFDF file. However, for this project I do not know in advance what fields will be present, so I need to analyse the PDF itself to see what fields need to be filled in, present an interface to the user accordingly, and then generate an XFDF file from that to fill in the PDF form.

How do I get the field names? Preferably command-line, .NET or PHP solutions.

Accepted answer by Christopher Done

I can get my client to export the XFDF file (which contains the field names) from Acrobat along with the PDF, which avoids this problem completely.

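If you do get an exported XFDF file, the field names can be read straight out of it. A minimal Ruby sketch using the standard `rexml` library, assuming the usual XFDF layout of `<field name="...">` elements inside `<fields>`, where nested `<field>` elements form hierarchical names joined with `.`; `xfdf_field_names` and its helpers are hypothetical names, not part of any library:

```ruby
require 'rexml/document'

# Return the child elements of +element+ whose local name is +name+.
# Comparing local names sidesteps the XFDF default namespace.
def child_elements(element, name)
  element.children.select { |c| c.is_a?(REXML::Element) && c.name == name }
end

# Walk nested <field> elements, joining hierarchical names with ".".
def collect_field_names(element, prefix, names)
  child_elements(element, 'field').each do |field|
    name = field.attributes['name']
    full = prefix.empty? ? name : "#{prefix}.#{name}"
    kids = child_elements(field, 'field')
    if kids.empty?
      names << full                            # leaf: a fillable field
    else
      collect_field_names(field, full, names)  # parent: descend into children
    end
  end
  names
end

# Extract fully qualified field names from an XFDF string.
def xfdf_field_names(xml_string)
  root = REXML::Document.new(xml_string).root
  fields = child_elements(root, 'fields').first
  fields ? collect_field_names(fields, '', []) : []
end
```

Only leaf fields are returned, since parent entries in a hierarchy normally carry no value of their own.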

Answer by TEHEK

Easy! You are already using pdftk:

# pdftk input.pdf dump_data_fields

It will output the field name, the field type, some of its properties (such as the options of a dropdown list, or the text alignment) and even the tooltip text (which I found to be extremely useful).

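Since the dump is plain text, it is easy to turn into structured data for building the user interface. A minimal Ruby sketch, assuming records are separated by `---` lines and every other line has the form `Key: Value` (for example `FieldType`, `FieldName`, or repeated `FieldStateOption` lines listing a field's options); `parse_dump_data_fields` is a hypothetical helper:

```ruby
# Parse the text printed by `pdftk <file> dump_data_fields` into one Hash per field.
def parse_dump_data_fields(text)
  text.split(/^---\s*$/).map do |record|
    field = {}
    record.each_line do |line|
      key, value = line.chomp.split(': ', 2)
      next if value.nil?                         # blank or unrecognised line
      if field.key?(key)
        field[key] = Array(field[key]) << value  # repeated key: collect into an array
      else
        field[key] = value
      end
    end
    field
  end.reject(&:empty?)                           # drop the empty chunk before the first ---
end
```

Keys that appear more than once in a record come back as arrays, so all of a field's options stay together.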

The only thing I'm missing is field coordinates...

Answer by Dev_Corps

This worked for me:

 pdftk 1.pdf dump_data_fields output test2.txt

If the file is encrypted with a password, this is how you can read it:

 pdftk 1.pdf input_pw YOUR_PASSWORD_GOES_HERE dump_data_fields output test2.txt

This took me 2 hours to get right, so hopefully I save you some time :)

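When calling pdftk from a script, passing the arguments as an array avoids shell-quoting problems with file names or passwords that contain spaces or quotes. A minimal Ruby sketch using the standard `Open3` module, following the argument order of the command above (`input_pw` after the input file, `output` last); the method names are hypothetical:

```ruby
require 'open3'

# Build the argument list for:
#   pdftk <pdf> [input_pw <password>] dump_data_fields [output <file>]
def dump_fields_args(pdf, password: nil, output: nil)
  args = ['pdftk', pdf]
  args += ['input_pw', password] if password
  args << 'dump_data_fields'
  args += ['output', output] if output
  args
end

# Run pdftk directly (no shell), so spaces or quotes in the file name or
# password are passed through verbatim. Returns pdftk's standard output
# (empty when an output file is given).
def dump_fields(pdf, password: nil, output: nil)
  out, err, status = Open3.capture3(*dump_fields_args(pdf, password: password, output: output))
  raise "pdftk failed: #{err}" unless status.success?
  out
end
```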

Answer by Eric Flamm

I used the following code, using ABCpdf from WebSupergoo, but I imagine most libraries have comparable classes:

protected void Button1_Click(object sender, EventArgs e)
{
    Doc thedoc = new Doc();
    string saveFile = "~/docs/f1_filled.pdf";
    System.Text.StringBuilder sb = new System.Text.StringBuilder();
    thedoc.Read(Server.MapPath("~/docs/F1_2010.pdf"));
    foreach (Field fld in thedoc.Form.Fields)
    {
        // Fields that are not placed on any page report "None" instead of a page number
        object page = (fld.Page != null) ? (object)fld.Page.PageNumber : "None";
        sb.AppendFormat("Field: {0}, Type: {1},page: {4},x: {2},y: {3}\n",
            fld.Name, fld.FieldType.ToString(), fld.Rect.Left, fld.Rect.Top, page);

        // Write each text field's own name into it, so a printout labels every field
        if (fld.FieldType == FieldType.Text)
        {
            fld.Value = fld.Name;
        }
    }

    this.TextBox1.Text = sb.ToString();
    this.TextBox1.Visible = true;
    thedoc.Save(Server.MapPath(saveFile));
    Response.Redirect(saveFile);
}

This does two things: 1) It populates a text box with an inventory of all form fields, showing each field's name, field type, page number and position on the page (0,0 is the lower-left corner, by the way). 2) It writes each text field's name into that field in an output file: print the output file, and all of your text fields will be labelled.

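Because PDF coordinates start at the lower-left corner with y growing upward, they usually need flipping before you position an HTML overlay or any other top-left-origin layout. A minimal Ruby sketch, assuming an unrotated page whose lower-left corner is at (0, 0) and values in PDF points (1/72 inch); `to_top_left` is a hypothetical helper, and the page height has to come from your PDF library:

```ruby
# Convert a field rectangle from PDF user space (origin at the lower-left
# corner, y growing upward) to a top-left origin (y growing downward).
# Assumes an unrotated page with its lower-left corner at (0, 0);
# all values stay in PDF points (1/72 inch).
def to_top_left(left, bottom, right, top, page_height)
  {
    x: left,
    y: page_height - top,   # distance from the page's top edge to the field's top edge
    width: right - left,
    height: top - bottom
  }
end
```

For example, on a US Letter page (792 points tall), a field spanning y = 700 to 720 starts 72 points (one inch) below the top edge.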

Answer by Trung Lê

A very late answer from me. My solution is not PHP, but I hope it comes in handy for anyone looking for a Ruby solution.

First, use pdftk to extract all the field names, then clean up the dumped text into a readable list of names:

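require 'open3'

def extract_fields(filename)
  # Run pdftk without a shell; stdout and stderr are merged, like `2>&1`
  field_output, _status = Open3.capture2e('pdftk', filename, 'dump_data_fields')
  # Records are separated by "---" lines; keep each record's FieldName value
  # (.+ rather than \w+, since names may contain dots, spaces and brackets)
  @fields = field_output.split(/^---\n/).map do |field_text|
    field_text[/^FieldName: (.+)$/, 1]
  end.compact.uniq
end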

Second, we can use any XML builder to construct our XFDF (this one uses Nokogiri):

require 'nokogiri'

# code borrowed from `nguyen` gem [https://github.com/joneslee85/nguyen]
# generate XFDF content
def to_xfdf(fields = {}, options = {})
  builder = Nokogiri::XML::Builder.new(:encoding => 'UTF-8') do |xml|
    xml.xfdf('xmlns' => 'http://ns.adobe.com/xfdf/', 'xml:space' => 'preserve') {
      xml.f(:href => options[:file]) if options[:file]
      xml.ids(:original => options[:id], :modified => options[:id]) if options[:id]
      xml.fields {
        fields.each do |field, value|
          xml.field(:name => field) {
            if value.is_a? Array
              value.each { |item| xml.value(item.to_s) }
            else
              xml.value(value.to_s)
            end
          }
        end
      }
    }
  end
  builder.to_xml
end

# write the XFDF content for the given fields to path
def save_to(path, fields = {}, options = {})
  File.write(path, to_xfdf(fields, options))
end

Voilà, that's the main logic. I highly recommend giving the nguyen gem (https://github.com/joneslee85/nguyen) a try if you are looking for a lightweight Ruby library.

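To close the loop described in the question, the generated XFDF file is handed back to pdftk's `fill_form` operation, which accepts FDF or XFDF data. A minimal Ruby sketch using `Open3`, where the optional `flatten` keyword makes pdftk merge the values into the page so they can no longer be edited; the method names are hypothetical:

```ruby
require 'open3'

# Build the argument list for: pdftk <pdf> fill_form <xfdf> output <out> [flatten]
def fill_form_args(pdf, xfdf_path, output, flatten: false)
  args = ['pdftk', pdf, 'fill_form', xfdf_path, 'output', output]
  args << 'flatten' if flatten   # optional: make the filled fields non-editable
  args
end

# Run pdftk directly (no shell) and raise with pdftk's stderr on failure.
def fill_form(pdf, xfdf_path, output, flatten: false)
  _out, err, status = Open3.capture3(*fill_form_args(pdf, xfdf_path, output, flatten: flatten))
  raise "pdftk failed: #{err}" unless status.success?
  output
end
```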

Answer by gallit

C# / iTextSharp

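    // Requires: using System; using System.IO; using iTextSharp.text.pdf;
    // ConfigManager (used below for the log directory) is the author's own class.
    public static void TracePdfFields(string pdfFilePath)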
    {
        PdfReader pdfReader = new PdfReader(pdfFilePath);
        MemoryStream pdfStream = new MemoryStream();
        // '\0' keeps the source PDF's version; true opens the stamper in append mode
        PdfStamper pdfStamper = new PdfStamper(pdfReader, pdfStream, '\0', true);

        int i = 1;
        foreach (var f in pdfStamper.AcroFields.Fields)
        {
            pdfStamper.AcroFields.SetField(f.Key, string.Format("{0} : {1}", i, f.Key));
            i++;
            //DoTrace("Field = [{0}] | Value = [{1}]", f.Key, f.Value.ToString());
        }
        pdfStamper.FormFlattening = false;
        pdfStamper.Writer.CloseStream = false;
        pdfStamper.Close();

        FileStream fs = File.OpenWrite(string.Format(@"{0}/{1}-TracePdfFields_{2}.pdf", 
            ConfigManager.GetInstance().LogConfig.Dir, 
            new FileInfo(pdfFilePath).Name, 
            DateTime.Now.Ticks));

        fs.Write(pdfStream.ToArray(), 0, (int)pdfStream.Length);
        fs.Flush();
        fs.Close();
    }