Linux: extract vector images from a PDF file
Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/9903880/
extract vector image from a pdf file
Asked by v923z
Is there a command-line tool on Linux that would extract figures from a PDF file and save them in vector format? I know about pdfimages, but that would create a bitmap, and that is not what I need.
Answered by Dingo
Not for images only, as you seem to need, but
- pdftocairo
http://poppler.freedesktop.org/
http://www.manpagez.com/man/1/pdftocairo/ (manpage)
is able to render a PDF page to other vector formats like PS/EPS/SVG.
Assuming you have a PDF page with vectorized images, you can render this page to SVG and then copy only the image you are interested in.
Note: pdftocairo cannot render a multipage PDF to a multipage SVG.
If you need to convert several PDF pages to SVG, you first need to pick the page range and then burst it into single-page PDF files.
Example (converting pages 1-10 of a PDF file to SVG):
- 1°
pdftk file.pdf cat 1-10 output 1-10.pdf
- 2°
pdftk 1-10.pdf burst
- 3°
(pdftk burst names the single-page files pg_0001.pdf, pg_0002.pdf, ..., so the loop matches only those and skips file.pdf and 1-10.pdf)
for f in pg_*.pdf; do pdftocairo -svg "$f"; done
- 4°
Finally, with sodipodi or inkscape, you can extract the images you are interested in from the SVG-rendered PDF pages.
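As an alternative to bursting with pdftk, the page range can also be walked with pdftocairo's own -f (first page) and -l (last page) options, rendering one page per call. This is a sketch, not the answer's original method; the function name pdf_pages_to_svg and the output naming scheme are assumptions:

```shell
# Hypothetical helper: render pages FIRST..LAST of a PDF to one SVG per page.
# Arguments: PDF file, first page number, last page number.
# Output files are named <basename>-<page>.svg (naming scheme is an assumption).
pdf_pages_to_svg() {
    pdf=$1
    first=$2
    last=$3
    base=${pdf%.pdf}          # strip a trailing ".pdf" for the output name
    page=$first
    while [ "$page" -le "$last" ]; do
        # -f and -l set to the same value select exactly one page per call
        pdftocairo -svg -f "$page" -l "$page" "$pdf" "$base-$page.svg" || return 1
        page=$((page + 1))
    done
}
```

Calling `pdf_pages_to_svg file.pdf 1 10` would then write file-1.svg through file-10.svg next to the input, without creating intermediate PDF files.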
Answered by Falko Menge
This article describes the tools gpdfx, inkscape and pdf2svg, which are not completely command-line-based but still sound helpful.
Answered by David van Driessche
What do you consider a "figure"? This is a concept that doesn't exist in PDF. The reason there are so many tools that can extract images from a PDF file is that images are a very clearly identified entity.
Your "figures" however, are much less clearly defined. PDF files may contain lots of vector content that you wouldn't call a figure. Text can be stroked for example, which would make it vector art and as such it might be confused with your figures. Other decorative elements may be used in the background of the pages. Text may be underlined, which would be a vector element...
In the other direction, your "figure" may contain a caption that is text, further complicating things.
As PDF doesn't have the notion of a figure, you'll have to figure out how to isolate one on a PDF page (perhaps because the creator application always adds metadata to them, or because they use a special color or...). If you can isolate them, it should be possible to trim everything irrelevant on the page and export what you need as EPS or SVG using some of the techniques described in the other answer.
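If the figure can at least be located as a rectangle on the page, one way to trim everything else is pdftocairo's crop options (-x, -y, -W, -H). A minimal sketch, assuming the region's position and size are already known; the helper name export_region_svg is made up for illustration, and how the cropped page size ends up may differ between poppler versions:

```shell
# Hypothetical helper: export one rectangular region of one page as SVG.
# Arguments: PDF file, page number, x, y, width, height, output SVG file.
# x/y give the top-left corner of the crop area; for vector output
# pdftocairo reads these values in points (1 pt = 1/72 inch).
export_region_svg() {
    pdf=$1
    page=$2
    x=$3
    y=$4
    w=$5
    h=$6
    out=$7
    pdftocairo -svg -f "$page" -l "$page" \
        -x "$x" -y "$y" -W "$w" -H "$h" \
        "$pdf" "$out"
}
```

For example, `export_region_svg file.pdf 3 72 100 300 200 figure.svg` would export a 300x200 pt region of page 3 whose top-left corner sits at (72, 100); those numbers are placeholders, and any text or decoration inside the rectangle is still included.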