C#: Parsing PDF files
Original URL: http://stackoverflow.com/questions/10437163/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Parsing PDF files
Question by desi
I need to split a large PDF document into smaller files based on the content of the file. We use BCL easyPDF to manipulate PDF files. easyPDF can split PDF documents by page number, but it cannot split a document based on its content. It also does not seem to have a search function for determining where the content is located (as far as I can tell; if I am wrong, please let me know).
Can someone tell me how to find the location of text in a PDF file using .NET?
Thanks
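Because easyPDF can already split by page number, content-based splitting largely reduces to working out which pages start a new section and turning those into page ranges. The sketch below assumes you already have each page's text as a string from some PDF library's text extraction. `SplitPlanner`, `FindSplitRanges` and the idea of a "marker" string are hypothetical names for illustration, and the actual easyPDF split call is not shown because its API is not covered here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: turns per-page text into page ranges for a page-based splitter.
public static class SplitPlanner
{
    // pageTexts[i] holds the extracted text of page i + 1 (PDF pages are 1-based).
    // A new output file starts on every page whose text contains `marker`.
    // Returns inclusive, 1-based (FirstPage, LastPage) ranges.
    public static List<(int FirstPage, int LastPage)> FindSplitRanges(
        IReadOnlyList<string> pageTexts, string marker)
    {
        var starts = new List<int>();
        for (int i = 0; i < pageTexts.Count; i++)
        {
            // Ordinal search; IndexOf works on both .NET Framework and .NET Core.
            if (pageTexts[i].IndexOf(marker, StringComparison.Ordinal) >= 0)
                starts.Add(i + 1);
        }

        // Pages before the first marker (or the whole document if no marker is found)
        // form their own leading chunk.
        if (pageTexts.Count > 0 && (starts.Count == 0 || starts[0] != 1))
            starts.Insert(0, 1);

        var ranges = new List<(int FirstPage, int LastPage)>();
        for (int k = 0; k < starts.Count; k++)
        {
            // Each chunk runs up to the page before the next marker, or to the last page.
            int last = k + 1 < starts.Count ? starts[k + 1] - 1 : pageTexts.Count;
            ranges.Add((starts[k], last));
        }
        return ranges;
    }
}
```

For example, four pages whose texts are "cover", "INVOICE 1", "more", "INVOICE 2" with marker "INVOICE" yield the ranges (1, 1), (2, 3) and (4, 4). Each range could then be passed to whatever page-range split your PDF tool offers. Matching a plain substring is the simplest possible rule; a real document may need a stricter test, such as checking only the first line of each page.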
Answer by Brian
Take a look at this question. It has links to some libraries that may satisfy your requirements.
Answer by Pablo Santa Cruz
You need a PDF library in .NET such as iText.Net.
Answer by Bobrovsky
You might try the Docotic.Pdf library for your task.
The library can retrieve a collection of words, together with their bounding rectangles, from PDFs. This should help you find the location of the text in a file.
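Once a library hands you words with bounding rectangles, locating a multi-word phrase comes down to scanning for consecutive matching words and merging their rectangles. The sketch below does not use Docotic.Pdf's actual types. `WordBox` and `PhraseLocator` are hypothetical stand-ins, so you would need to map whatever word objects your library returns onto them. It also assumes each rectangle is given as min/max coordinates, which keeps the merge independent of whether the y axis points up or down.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a word returned by a PDF library:
// its text plus its bounding rectangle in page coordinates (MinX <= MaxX, MinY <= MaxY).
public sealed class WordBox
{
    public WordBox(string text, double minX, double minY, double maxX, double maxY)
    {
        Text = text; MinX = minX; MinY = minY; MaxX = maxX; MaxY = maxY;
    }
    public string Text { get; }
    public double MinX { get; }
    public double MinY { get; }
    public double MaxX { get; }
    public double MaxY { get; }
}

public static class PhraseLocator
{
    // Finds every run of consecutive words equal to the space-separated words of `phrase`
    // (case-insensitive) and returns one box per hit: the union of the run's rectangles.
    public static List<WordBox> Find(IReadOnlyList<WordBox> words, string phrase)
    {
        string[] target = phrase.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var hits = new List<WordBox>();
        if (target.Length == 0) return hits;

        for (int i = 0; i + target.Length <= words.Count; i++)
        {
            bool match = true;
            for (int j = 0; j < target.Length && match; j++)
                match = string.Equals(words[i + j].Text, target[j], StringComparison.OrdinalIgnoreCase);
            if (!match) continue;

            // Merge the matched words' rectangles into a single bounding box.
            var run = words.Skip(i).Take(target.Length).ToList();
            hits.Add(new WordBox(phrase,
                run.Min(w => w.MinX), run.Min(w => w.MinY),
                run.Max(w => w.MaxX), run.Max(w => w.MaxY)));
        }
        return hits;
    }
}
```

Because matching is on whole words, punctuation attached to a word (for example "due:") prevents a match unless you strip it first. A phrase that wraps onto the next line produces one box covering both lines. The page number is not part of `WordBox`, so in practice you would run this once per page's word collection.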
The library could also be used to extract text (with or without formatting).
Disclaimer: I work for the vendor of the library.

