Disclaimer: This page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/3454894/

Time: 2020-08-29 04:04:44  Source: igfitidea

How to programmatically convert HTML to epub?

Tags: html, automation, epub

Asked by Juanjo Conti

Can I do this conversion with any programming language or library?

Answered by eb1

The short answer is yes, it can be done in any programming language.

Basic steps:

  1. Convert your HTML to XHTML (+ CSS). This can be done in your program or through an XSLT file.
  2. Copy your files (XHTML, CSS, any images and fonts) into a directory structure that follows the EPUB format.
  3. Zip the directory structure up and name the archive with a ".epub" extension.
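
The zipping step above can be sketched in Python with only the standard library. This is a minimal sketch, not a full converter: the function name `write_epub` and its parameters are made up for illustration, and it assumes the directory already holds the converted XHTML, the `META-INF/container.xml` file and the package document. One detail worth knowing is that the EPUB container format expects an entry named `mimetype`, containing `application/epub+zip`, to come first in the archive and to be stored uncompressed, which a plain "zip the folder" command does not guarantee.

```python
import zipfile
from pathlib import Path


def write_epub(book_dir, epub_path):
    """Zip the prepared directory book_dir into an EPUB archive at epub_path.

    Hypothetical helper: assumes book_dir already follows the EPUB layout
    and that epub_path lies outside book_dir.
    """
    root = Path(book_dir)
    with zipfile.ZipFile(epub_path, "w") as archive:
        # The container format expects "mimetype" to be the first entry,
        # stored without compression.
        archive.writestr(
            "mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED
        )
        for path in sorted(root.rglob("*")):
            relative = path.relative_to(root).as_posix()
            # Skip directories and any top-level mimetype file (already written).
            if path.is_file() and relative != "mimetype":
                archive.write(path, relative, compress_type=zipfile.ZIP_DEFLATED)
    return epub_path
```

A call such as `write_epub("book", "book.epub")` would produce the archive; running the result through the epubcheck validator is still advisable, since this sketch does not check the contents.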

Some web sites to help you get started:

June 2015 note: The epubcheck validator has moved from Google Code to GitHub; note the new URL.

Answered by Alex Martelli

Calibre supports a wide variety of input formats, including HTML, and a wide variety of output formats, including EPUB, but it's not "a programming language or library". Are there specific reasons you desire a programming-based approach rather than a free-standing tool? If so, maybe Python and ebookmaker.py, for example, could help you.

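Although Calibre is a tool rather than a library, it can still be driven from a program, because it ships a command-line converter called `ebook-convert` that takes an input file and an output file. The following is a rough sketch of calling it from Python; the function names `build_command` and `convert_with_calibre` are invented here, and it assumes Calibre is installed with `ebook-convert` on the PATH.

```python
import shutil
import subprocess


def build_command(html_path, epub_path):
    """Return the argument list for calibre's ebook-convert tool.

    The output format is inferred by ebook-convert from the file extension.
    """
    return ["ebook-convert", html_path, epub_path]


def convert_with_calibre(html_path, epub_path):
    """Run ebook-convert (hypothetical wrapper, assumes calibre is installed)."""
    if shutil.which("ebook-convert") is None:
        raise FileNotFoundError("ebook-convert not found; is calibre installed?")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(build_command(html_path, epub_path), check=True)
```

Something like `convert_with_calibre("page.html", "page.epub")` would then run the conversion as a child process.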

Answered by Shlomi Fish

A late reply, but I found the Python 3-based ebookmaker to be of value, at least after I contributed a pull request to remove a UTF-8 BOM. One problem with it appears to be that it uses brittle regular expressions to parse HTML, but I guess I'll have to report it there.

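To illustrate the point about regular expressions, here is a small sketch of pulling a page title out of HTML with Python's standard `html.parser` module instead of a regex. This is not ebookmaker's code; the class `TitleExtractor` and the function `extract_title` are made up for the example. A real parser copes with things like upper-case tags, attributes and line breaks inside a tag, which tend to trip up hand-written patterns.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text inside the <title> element (illustrative only)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names before calling the handlers.
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html_text):
    """Return the stripped title text, or an empty string if there is none."""
    parser = TitleExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip()
```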

Answered by skreutzer

I just started to implement such a tool in Java (OpenJDK compatible): html2epub. To avoid having to edit the config file manually, I'll probably start a separate tool that generates the config file from any given directory. It would still be necessary to determine the order of the XHTML files in the EPUB, however. For non-programmatic use, a GUI helper tool could be considered; for a fully flexible programmatic solution, I haven't come up with an idea yet. Before that, I implemented shell-script-based converters for custom XML input (hag2epub tools). In case you're interested, I would probably port them to XHTML input (with a config file for the EPUB metadata, or obtaining the metadata from the topmost index.html of a directory, if one exists).

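One possible default for the ordering problem mentioned above is to sort the files by name in "natural" order, so that `ch2` comes before `ch10`. The sketch below is only a guess at a sensible fallback, written in Python rather than Java for brevity; it is not how html2epub works, and the names `natural_key` and `guess_reading_order` are hypothetical. Any real tool would still need a way for the user to override the result.

```python
import re
from pathlib import Path


def natural_key(name):
    """Split a file name into text and number parts so digits sort numerically."""
    return [
        int(part) if part.isdigit() else part.lower()
        for part in re.split(r"(\d+)", name)
    ]


def guess_reading_order(directory):
    """Return the .xhtml/.html file names in directory in natural sort order."""
    names = [
        path.name
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() in {".xhtml", ".html"}
    ]
    return sorted(names, key=natural_key)
```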

Answered by cofiem

Here's pdf to epub; I know that's not what you're after, but it's a start.

The calibre package may have what you want.

Answered by Brian Singh

I am using the following library from Aspose - http://www.aspose.com/categories/.net-components/aspose.words-for-.net/default.aspx

In just two lines of code I am able to do HTML to EPUB conversions. I am currently using this in a production system.

using Aspose.Words;

// Load the HTML source file, then save it in EPUB format.
Document doc = new Document(_sourceFilePath);
doc.Save(_destinationFilePath, SaveFormat.Epub);

Answered by user81718

I had the same issue previously, because I wanted to read some webpage content offline on my iPad. I had no idea how to do it, and I am not computer savvy. There are Calibre, Stanza, and so on.

But for me they are just format converters, and I needed an ePub book creator that would allow me to combine many documents into one book to read. Then I found a bookish html to ePub converter: I save the HTML page from the web, then convert it with that tool. It's quite a good tool for me now.
