Original question: http://stackoverflow.com/questions/3454894/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
How to programmatically convert HTML to epub?
Asked by Juanjo Conti
Can I do this conversion with any programming language or library?
Answered by eb1
The short answer is yes, it can be done in any programming language.
Basic steps:
- Convert your HTML to XHTML (+ CSS). This can be done in your program or through an XSLT file.
- Copy your files (XHTML, CSS, any images and fonts) into a directory structure that follows the EPUB format.
- Zip the directory structure up and name the archive with a ".epub" extension.
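The steps above can be sketched in Python with only the standard library. This is a minimal illustration, not a complete converter: it assumes the chapters are already valid XHTML, targets an EPUB 3 layout, and the function name `build_epub`, the `OEBPS/` folder, the placeholder identifier, and the file names are all choices made for this example. The one detail that is easy to miss is that the `mimetype` entry has to come first in the archive and be stored uncompressed.

```python
import zipfile
from datetime import datetime, timezone
from xml.sax.saxutils import escape

# META-INF/container.xml tells the reading system where the package file lives.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(epub_path, title, chapters,
               book_id="urn:uuid:00000000-0000-0000-0000-000000000000",
               language="en"):
    """Write a minimal EPUB 3 file.

    chapters is a list of (file_name, xhtml_text) pairs in reading order.
    File names are assumed to be simple (no characters needing XML escaping).
    book_id is a placeholder; a real book needs its own unique identifier.
    """
    modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    manifest = "\n".join(
        f'    <item id="ch{i}" href="{name}" media-type="application/xhtml+xml"/>'
        for i, (name, _) in enumerate(chapters))
    spine = "\n".join(
        f'    <itemref idref="ch{i}"/>' for i in range(len(chapters)))
    opf = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="book-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="book-id">{escape(book_id)}</dc:identifier>
    <dc:title>{escape(title)}</dc:title>
    <dc:language>{escape(language)}</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
{manifest}
  </manifest>
  <spine>
{spine}
  </spine>
</package>
"""
    nav_items = "\n".join(
        f'        <li><a href="{name}">{escape(name)}</a></li>'
        for name, _ in chapters)
    nav = f"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head><title>{escape(title)}</title></head>
  <body>
    <nav epub:type="toc">
      <ol>
{nav_items}
      </ol>
    </nav>
  </body>
</html>
"""
    with zipfile.ZipFile(epub_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # The mimetype entry must be first and stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML)
        archive.writestr("OEBPS/content.opf", opf)
        archive.writestr("OEBPS/nav.xhtml", nav)
        for name, xhtml_text in chapters:
            archive.writestr(f"OEBPS/{name}", xhtml_text)
```

Calling `build_epub("book.epub", "My Book", [("chapter1.xhtml", chapter_text)])` would write the archive; CSS, images, and fonts would need their own manifest entries, which this sketch leaves out.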
Some web sites to help you get started:
- A good tutorial for what's in an epub file (and how to create one yourself) can be found here: http://www.jedisaber.com/eBooks/Introduction.shtml. I used this to get started myself.
- Specs for the .epub standard are here: http://www.idpf.org/
- A validator for .epubs can be downloaded from here: https://github.com/IDPF/epubcheck
June 2015 note: The epubcheck validator has moved from Google Code to GitHub; note the new URL.
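Before running a full validator such as epubcheck, a quick structural check can catch the most common packaging mistakes. The sketch below is not a substitute for epubcheck and covers only three container rules: the `mimetype` entry comes first, it is stored uncompressed with the exact expected content, and `META-INF/container.xml` points at a package file that exists. The function name `quick_epub_check` is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def quick_epub_check(epub_path):
    """Return a list of problems found; an empty list means these basic checks passed."""
    problems = []
    with zipfile.ZipFile(epub_path) as archive:
        entries = archive.infolist()
        if not entries or entries[0].filename != "mimetype":
            problems.append("'mimetype' must be the first entry in the archive")
        else:
            if entries[0].compress_type != zipfile.ZIP_STORED:
                problems.append("'mimetype' must be stored uncompressed")
            if archive.read("mimetype") != b"application/epub+zip":
                problems.append("'mimetype' must contain exactly 'application/epub+zip'")

        names = set(archive.namelist())
        if "META-INF/container.xml" not in names:
            problems.append("META-INF/container.xml is missing")
            return problems

        # container.xml names the package document (the .opf file).
        root = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        if rootfile is None:
            problems.append("container.xml declares no rootfile")
        elif rootfile.get("full-path") not in names:
            problems.append(
                f"package document {rootfile.get('full-path')!r} is missing")
    return problems
```

An empty result only means the archive is packaged plausibly; content-level errors in the XHTML or the package metadata still need epubcheck.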
Answered by Alex Martelli
Calibre supports a wide variety of input formats, including HTML, and a wide variety of output formats, including EPUB, but it's not "a programming language or library". Are there specific reasons you want a programming-based approach rather than a free-standing tool? If so, maybe Python and ebookmaker.py, for example, could help you.
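If a free-standing tool is acceptable but the conversion still has to be triggered from a program, one option is to shell out to Calibre's command-line converter, `ebook-convert`, which takes an input file and an output file. The sketch below assumes Calibre is installed and `ebook-convert` is on the PATH; the helper names are made up for this example.

```python
import shutil
import subprocess


def build_convert_command(source, target, executable="ebook-convert"):
    """Build the argument list for Calibre's converter: input file, then output file."""
    return [executable, str(source), str(target)]


def convert_with_calibre(source, target):
    """Convert source (e.g. an .html file) to target (e.g. an .epub file)."""
    if shutil.which("ebook-convert") is None:
        raise FileNotFoundError(
            "ebook-convert was not found on PATH; is Calibre installed?")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(build_convert_command(source, target), check=True)
```

The output format is inferred by Calibre from the target file's extension, so `convert_with_calibre("page.html", "page.epub")` would request an EPUB.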
Answered by Shlomi Fish
A late reply, but I found the Python 3-based ebookmaker to be of value, at least after I contributed a pull request to remove a UTF-8 BOM. One problem with it appears to be that it uses brittle regular expressions to parse HTML, but I guess I'll have to report it there.
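On the point about regular expressions: a tokenizing parser is usually a sturdier base for the HTML-to-XHTML step. The sketch below is not ebookmaker's code; it is a small illustration using Python's standard `html.parser` that re-serializes a fragment with self-closed void elements, quoted attributes, escaped text, and closed-off open tags. It drops comments and doctypes and does not apply HTML's implied-end-tag rules, so it should be read as a starting point rather than a full converter. The names `XhtmlSerializer` and `html_fragment_to_xhtml` are made up for this example.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have content and must be self-closed in XHTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class XhtmlSerializer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []       # serialized output pieces
        self.open_tags = []   # stack of currently open non-void elements

    def _format_attrs(self, attrs):
        out = []
        for name, value in attrs:
            if value is None:
                value = name  # XHTML does not allow valueless attributes
            out.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(out)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            self.parts.append(f"<{tag}{self._format_attrs(attrs)} />")
        else:
            self.parts.append(f"<{tag}{self._format_attrs(attrs)}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or tag not in self.open_tags:
            return  # ignore void or stray end tags
        # Close any inner elements that were left open, then the tag itself.
        while self.open_tags:
            current = self.open_tags.pop()
            self.parts.append(f"</{current}>")
            if current == tag:
                break

    def handle_data(self, data):
        self.parts.append(escape(data, quote=False))

    def close(self):
        super().close()  # flushes any buffered text
        while self.open_tags:
            self.parts.append(f"</{self.open_tags.pop()}>")


def html_fragment_to_xhtml(html_text):
    serializer = XhtmlSerializer()
    serializer.feed(html_text)
    serializer.close()
    return "".join(serializer.parts)
```

For real-world pages, a dedicated HTML5 parser or tidy tool would handle many more edge cases than this sketch does.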
Answered by skreutzer
I just started to implement such a tool in Java (OpenJDK compatible): html2epub. To avoid editing the config file by hand, I'll probably start a separate tool that generates the config file from any given directory. It would still be necessary to determine the order of the XHTML files in the EPUB, though. For non-programmatic use, a GUI helper tool could be considered; for a fully flexible programmatic solution, I haven't come up with an idea yet. Before that, I implemented shell-script-based converters for custom XML input (hag2epub tools). In case you're interested, I would probably port them to XHTML input, with either a config file for the EPUB metadata or metadata obtained from the topmost index.html of a directory, if one exists.
Answered by cofiem
Here's pdf to epub. I know that's not what you're after, but it's a start.
The calibre package may have what you want.
Answered by Brian Singh
I am using the following library from Aspose - http://www.aspose.com/categories/.net-components/aspose.words-for-.net/default.aspx
In just two lines of code I am able to do HTML to EPUB conversions. I am currently using this in a production system.
Document doc = new Document(_sourceFilePath);
doc.Save(_destinationFilePath, SaveFormat.Epub);
Answered by user81718
I had the same issue previously, because I wanted to read some webpage content offline on my iPad. I had no idea how to do it, and I am not computer savvy. There are Calibre, Stanza, and so on.
But for me they are just format converters, and I needed an ePub book creator that would let me combine many documents into one book to read. Then I found a bookish html to ePub converter: I save the HTML page from the web and then convert it with that. It's quite a good tool for me now.