如何使用 iText java 读取 PDF 中的表格?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/10878695/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-10-31 02:56:49  来源:igfitidea点击:

How to read a Table in a PDF using iText java?

javapdfitext

提问by soumitra chatterjee

I dont have much idea on pdf processing using java.I want to read a table in a PDF file using the iText java library. How to proceed?

我对使用 java 处理 pdf 没有太多想法。我想使用 iText java 库读取 PDF 文件中的表格。如何进行?

回答by Franz Ebner

You can extract text from a content stream, but for ordinary PDFs, the result will be plain text (without any structure). If there's a table on the page, that table won't be recognized as such. You'll get the content and some white space, but that's not a tabular structure! Only if you have a tagged PDF, you can obtain an XML-file. If the PDF contains tags that are recognized as table tags, this will be reflected in the PDF.

您可以从内容流中提取文本,但对于普通 PDF,结果将是纯文本(没有任何结构)。如果页面上有表格,则该表格将不会被识别。您将获得内容和一些空白,但这不是表格结构!只有当你有一个带标签的 PDF 时,你才能获得一个 XML 文件。如果 PDF 包含被识别为表格标签的标签,这将反映在 PDF 中。

That's what I found out here

这就是我在这里发现的

回答by Abhishek Yadav

For reading content of the table from a PDF file, you only have to convert the PDF into a text file by using any API (I have used PdfTextExtracter.getTextFromPage()of iText) and then read that txt file by your Java program. After reading it the major task is done. You have to filter the data that you need, which you can do by continuously using split method of Stringclass until you find the record you want.

要从 PDF 文件中读取表格内容,您只需使用任何 API(我使用PdfTextExtracter.getTextFromPage()过 iText)将 PDF 转换为文本文件,然后通过您的 Java 程序读取该 txt 文件。读完之后,主要任务就完成了。你必须过滤你需要的数据,你可以通过不断使用String类的split 方法来完成,直到找到你想要的记录。

Below is my code in which I have extracted part of the record from a PDF file and written it into a .CSV file. You can view the PDF file here: http://www.cea.nic.in/reports/monthly/generation_rep/actual/jan13/opm_02.pdf

下面是我的代码,其中我从 PDF 文件中提取了部分记录并将其写入 .CSV 文件。您可以在此处查看 PDF 文件:http: //www.cea.nic.in/reports/monthly/generation_rep/actual/jan13/opm_02.pdf

public static void genrateCsvMonth_Region(String pdfpath, String csvpath) {
        try {
            String line = null;
            // Appending Header in CSV file...
            BufferedWriter writer1 = new BufferedWriter(new FileWriter(csvpath,
                    true));
            writer1.close();
            // Checking whether file is empty or not..
            BufferedReader br = new BufferedReader(new FileReader(csvpath));
                         if ((line = br.readLine()) == null) {
                BufferedWriter writer = new BufferedWriter(new FileWriter(
                        csvpath, true));
                writer.append("REGION,");
                writer.append("YEAR,");
                writer.append("MONTH,");
                writer.append("THERMAL,");
                writer.append("NUCLEAR,");
                writer.append("HYDRO,");
                writer.append("TOTAL\n");
                writer.close();
            }
            // Reading the pdf file..
            PdfReader reader = new PdfReader(pdfpath);
            BufferedWriter writer = new BufferedWriter(new FileWriter(csvpath,
                    true));

            // Extracting records from page into String..
            String page = PdfTextExtractor.getTextFromPage(reader, 1);
            // Extracting month and Year from String..
            String period1[] = page.split("PEROID");
            String period2[] = period1[0].split(":");
            String month[] = period2[1].split("-");
            String period3[] = month[1].split("ENERGY");
            String year[] = period3[0].split("VIS");

            // Extracting Northen region
            String northen[] = page.split("NORTHEN REGION");
            String nthermal1[] = northen[0].split("THERMAL");
            String nthermal2[] = nthermal1[1].split(" ");

            String nnuclear1[] = northen[0].split("NUCLEAR");
            String nnuclear2[] = nnuclear1[1].split(" ");

            String nhydro1[] = northen[0].split("HYDRO");
            String nhydro2[] = nhydro1[1].split(" ");

            String ntotal1[] = northen[0].split("TOTAL");
            String ntotal2[] = ntotal1[1].split(" ");

            // Appending filtered data into CSV file..
            writer.append("NORTHEN" + ",");
            writer.append(year[0] + ",");
            writer.append(month[0] + ",");
            writer.append(nthermal2[4] + ",");
            writer.append(nnuclear2[4] + ",");
            writer.append(nhydro2[4] + ",");
            writer.append(ntotal2[4] + "\n");

            // Extracting Western region
            String western[] = page.split("WESTERN");

            String wthermal1[] = western[1].split("THERMAL");
            String wthermal2[] = wthermal1[1].split(" ");

            String wnuclear1[] = western[1].split("NUCLEAR");
            String wnuclear2[] = wnuclear1[1].split(" ");

            String whydro1[] = western[1].split("HYDRO");
            String whydro2[] = whydro1[1].split(" ");

            String wtotal1[] = western[1].split("TOTAL");
            String wtotal2[] = wtotal1[1].split(" ");

            // Appending filtered data into CSV file..
            writer.append("WESTERN" + ",");
            writer.append(year[0] + ",");
            writer.append(month[0] + ",");
            writer.append(wthermal2[4] + ",");
            writer.append(wnuclear2[4] + ",");
            writer.append(whydro2[4] + ",");
            writer.append(wtotal2[4] + "\n");

            // Extracting Southern Region
            String southern[] = page.split("SOUTHERN");

            String sthermal1[] = southern[1].split("THERMAL");
            String sthermal2[] = sthermal1[1].split(" ");

            String snuclear1[] = southern[1].split("NUCLEAR");
            String snuclear2[] = snuclear1[1].split(" ");

            String shydro1[] = southern[1].split("HYDRO");
            String shydro2[] = shydro1[1].split(" ");

            String stotal1[] = southern[1].split("TOTAL");
            String stotal2[] = stotal1[1].split(" ");

            // Appending filtered data into CSV file..
            writer.append("SOUTHERN" + ",");
            writer.append(year[0] + ",");
            writer.append(month[0] + ",");
            writer.append(sthermal2[4] + ",");
            writer.append(snuclear2[4] + ",");
            writer.append(shydro2[4] + ",");
            writer.append(stotal2[4] + "\n");

            // Extracting eastern region
            String eastern[] = page.split("EASTERN");

            String ethermal1[] = eastern[1].split("THERMAL");
            String ethermal2[] = ethermal1[1].split(" ");

            String ehydro1[] = eastern[1].split("HYDRO");
            String ehydro2[] = ehydro1[1].split(" ");

            String etotal1[] = eastern[1].split("TOTAL");
            String etotal2[] = etotal1[1].split(" ");
            // Appending filtered data into CSV file..
            writer.append("EASTERN" + ",");
            writer.append(year[0] + ",");
            writer.append(month[0] + ",");
            writer.append(ethermal2[4] + ",");
            writer.append(" " + ",");
            writer.append(ehydro2[4] + ",");
            writer.append(etotal2[4] + "\n");

            // Extracting northernEastern region
            String neestern[] = page.split("NORTH");

            String nethermal1[] = neestern[2].split("THERMAL");
            String nethermal2[] = nethermal1[1].split(" ");

            String nehydro1[] = neestern[2].split("HYDRO");
            String nehydro2[] = nehydro1[1].split(" ");

            String netotal1[] = neestern[2].split("TOTAL");
            String netotal2[] = netotal1[1].split(" ");

            writer.append("NORTH EASTERN" + ",");
            writer.append(year[0] + ",");
            writer.append(month[0] + ",");
            writer.append(nethermal2[4] + ",");
            writer.append(" " + ",");
            writer.append(nehydro2[4] + ",");
            writer.append(netotal2[4] + "\n");
            writer.close();

        } catch (IOException ioe) {
            ioe.printStackTrace();
        }

    }

回答by vijaykamma

My Solution

我的解决方案

package com.geek.tutorial.itext.table;
import java.io.FileOutputStream;
import com.lowagie.text.pdf.PdfPTable;
import com.lowagie.text.pdf.PdfPCell;
import com.lowagie.text.pdf.PdfWriter;
import com.lowagie.text.Document;
import com.lowagie.text.Paragraph;

public class SimplePDFTable
{
    public SimplePDFTable() throws Exception
    {
        Document document = new Document();
        PdfWriter.getInstance(document, 
            new FileOutputStream("SimplePDFTable.pdf"));
        document.open();
        PdfPTable table = new PdfPTable(2); // Code 1
        // Code 2
        table.addCell("1");
        table.addCell("2");
        // Code 3
        table.addCell("3");
        table.addCell("4");
        // Code 4
        table.addCell("5");
        table.addCell("6");
        // Code 5
        document.add(table);        
        document.close();
    }

    public static void main(String[] args)
    {    
        try
        {
            SimplePDFTable pdfTable = new SimplePDFTable();
        }
        catch(Exception e)
        {
            System.out.println(e);
        }
    }
}