C#: Parsing PDF files

Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. If you reuse it, you must also follow CC BY-SA: cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/10437163/

Date: 2020-08-09 13:51:51  Source: igfitidea

Parsing PDF files

Tags: c#, parsing, pdf, pdf-scraping

Asked by desi

I have a requirement to split a large PDF document into smaller files based on the content of the file. We use BCL easyPDF to manipulate PDF files. easyPDF can split PDF documents based on page numbers, but it cannot split a document based on its content. It also does not seem to have a search function that could determine where the content is located (as far as I can tell; if I am wrong, please let me know).

Can someone tell me how to find the location of text in a PDF file using .NET?

Thanks

Answer by Brian

Take a look at this question; it links to some libraries that may satisfy your requirements:

How to programmatically search a PDF document in C#

Answer by Pablo Santa Cruz

You need a PDF library in .NET such as iText.Net.
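
Whichever library you choose, the splitting step itself can stay simple. easyPDF already splits by page number, so it is enough to turn "which pages contain this text" into page ranges. The sketch below assumes you have already extracted each page's plain text with your PDF library; the extraction call is not shown because it differs between iText, Docotic.Pdf and others. `SplitPlanner`, `PlanRanges` and the rule "a new section starts on any page containing the marker" are hypothetical choices for illustration and are not part of any of these libraries.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of iText, Docotic.Pdf or easyPDF): turns the
// plain text of each page into 1-based page ranges for a page-number splitter.
public static class SplitPlanner
{
    // A new range starts on every page (after page 1) whose text contains
    // `marker`, compared case-insensitively. Pages before the first marker
    // page form their own leading range.
    public static List<(int First, int Last)> PlanRanges(
        IReadOnlyList<string> pageTexts, string marker)
    {
        if (string.IsNullOrEmpty(marker))
            throw new ArgumentException("marker must be non-empty", nameof(marker));

        var ranges = new List<(int First, int Last)>();
        int start = 1; // 1-based page where the current range begins

        for (int i = 1; i < pageTexts.Count; i++)
        {
            int page = i + 1; // 1-based page number of pageTexts[i]
            if (pageTexts[i].IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                ranges.Add((start, page - 1)); // close the range before this page
                start = page;                  // open a new range at the marker page
            }
        }

        if (pageTexts.Count > 0)
            ranges.Add((start, pageTexts.Count)); // close the final range
        return ranges;
    }
}
```

Each `(First, Last)` pair could then be passed to whatever page-range split call your easyPDF version exposes. Matching here is a plain case-insensitive substring test. If the marker text can also appear in the middle of a section, you would need a stricter rule, for example only accepting matches near the top of the page by using word positions.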

Answer by Bobrovsky

You might try the Docotic.Pdf library for your task.

The library can retrieve a collection of words, together with their bounding rectangles, from PDFs. This should help you find the location of text in a file.
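
Word bounding rectangles help here because a phrase that spans several words can be located by matching consecutive words and taking the union of their rectangles. The sketch below is library-agnostic. `WordBox` and `PhraseLocator` are hypothetical stand-ins: Docotic.Pdf's actual word type and property names differ. The sketch also assumes a coordinate system where y grows downward, so `Top <= Bottom`. Check which convention your library uses, because PDF user space itself has its origin at the bottom-left.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the word + bounding-rectangle data a PDF library
// returns. Assumes y grows downward (Top <= Bottom).
public sealed class WordBox
{
    public WordBox(string text, double left, double top, double right, double bottom)
    {
        Text = text; Left = left; Top = top; Right = right; Bottom = bottom;
    }
    public string Text { get; }
    public double Left { get; }
    public double Top { get; }
    public double Right { get; }
    public double Bottom { get; }
}

public static class PhraseLocator
{
    // Finds every occurrence of `phrase` as consecutive words (case-insensitive)
    // and returns one rectangle per match covering all of its words.
    public static List<WordBox> Find(IReadOnlyList<WordBox> words, string phrase)
    {
        // Splitting on null separators splits on any whitespace.
        string[] target = phrase.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        var hits = new List<WordBox>();
        if (target.Length == 0) return hits;

        for (int i = 0; i + target.Length <= words.Count; i++)
        {
            bool match = true;
            for (int j = 0; j < target.Length; j++)
            {
                if (!string.Equals(words[i + j].Text, target[j], StringComparison.OrdinalIgnoreCase))
                {
                    match = false;
                    break;
                }
            }
            if (!match) continue;

            // Union of the matched words' rectangles.
            double left = double.MaxValue, top = double.MaxValue;
            double right = double.MinValue, bottom = double.MinValue;
            for (int j = 0; j < target.Length; j++)
            {
                WordBox w = words[i + j];
                left = Math.Min(left, w.Left);
                top = Math.Min(top, w.Top);
                right = Math.Max(right, w.Right);
                bottom = Math.Max(bottom, w.Bottom);
            }
            hits.Add(new WordBox(phrase, left, top, right, bottom));
        }
        return hits;
    }
}
```

Words are compared exactly apart from case. Punctuation attached to a word (for example `Number:`) would therefore prevent a match, and stripping punctuation before comparing is a simple extension. The page a hit comes from, combined with its rectangle, is what you would use to decide where to split.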

The library could also be used to extract text (with or without formatting).

Disclaimer: I work for the vendor of the library.
