HTML to Markdown with Java
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/59557/
Asked by Sergio del Amo
Is there an easy way to transform HTML into Markdown with Java?
I am currently using the Java MarkdownJ library to transform Markdown to HTML.
import com.petebevin.markdown.MarkdownProcessor;
...
public static String getHTML(String markdown) {
    MarkdownProcessor markdown_processor = new MarkdownProcessor();
    return markdown_processor.markdown(markdown);
}
public static String getMarkdown(String html) {
    /* TODO Ask stackoverflow */
}
Answered by Marcio Aguiar
Use this XSLT.
If you need help using XSLT and Java, here's a code snippet:
import java.io.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
public static void main(String[] args) throws Exception {
    File xsltFile = new File("mardownXSLT.xslt");
    // theHTML is not defined in this snippet; it must hold well-formed XHTML
    Source xmlSource = new StreamSource(new StringReader(theHTML));
    Source xsltSource = new StreamSource(xsltFile);
    TransformerFactory transFact =
        TransformerFactory.newInstance();
    Transformer trans = transFact.newTransformer(xsltSource);
    StringWriter result = new StringWriter();
    trans.transform(xmlSource, new StreamResult(result));
    // result.toString() now holds the transformed output
}
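The snippet above leaves theHTML undefined and relies on an external stylesheet file. As a minimal, self-contained sketch of the same javax.xml.transform approach, the example below embeds a toy stylesheet as a string. The class name XsltMarkdownSketch, the method toMarkdown, and the stylesheet itself are assumptions made up for illustration. The stylesheet handles only h1 and p elements and is not the XSLT the answer links to. The input must be well-formed XML with no namespace, otherwise the match patterns will not apply.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltMarkdownSketch {

    // Hypothetical toy stylesheet: converts only <h1> and <p>.
    // It stands in for the full stylesheet referenced in the answer.
    static final String TOY_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:strip-space elements='*'/>"
      + "<xsl:template match='h1'># <xsl:value-of select='normalize-space(.)'/>"
      + "<xsl:text>&#10;&#10;</xsl:text></xsl:template>"
      + "<xsl:template match='p'><xsl:value-of select='normalize-space(.)'/>"
      + "<xsl:text>&#10;&#10;</xsl:text></xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the toy stylesheet to a well-formed XHTML string.
    static String toMarkdown(String xhtml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(TOY_XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Malformed input or stylesheet: surface it as an unchecked exception.
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><h1>Some title</h1><p>Some text</p></body></html>";
        System.out.print(toMarkdown(xhtml));
    }
}
```

Real-world HTML is rarely well-formed XML, so input would usually need cleaning into XHTML before a transform like this can parse it.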
Answered by Ruud
I came across Remark for converting HTML to Markdown; see http://remark.overzealous.com/manual/index.html. It depends on JSoup, a powerful Java library for working with real-world HTML.
Answered by myabc
I am working on the same issue, and experimenting with a couple of different techniques.
The answer above could work. You could use the jTidy library to do the initial cleanup work and convert from HTML to XHTML. You then use the XSLT stylesheet linked above.
Unfortunately, there is no library that has a one-stop function to do this in Java. You could try using the Python script html2text with Jython, but I haven't yet tried this!
Answered by myabc
If you are using the WMD editor and want to get the Markdown code on the server side, just set these options before loading the wmd.js script:
wmd_options = {
    // format sent to the server. can also be "HTML"
    output: "Markdown",
    // line wrapping length for lists, blockquotes, etc.
    lineLength: 40,
    // toolbar buttons. Undo and redo get appended automatically.
    buttons: "bold italic | link blockquote code image | ol ul heading hr",
    // option to automatically add WMD to the first textarea found.
    autostart: true
};
Answered by Gabriel Furstenheim
There is a great JS library called Turndown; you can try it online here. It works on HTML for which the accepted answer throws errors.
I needed it for Java (as in the question), so I ported it. The Java library is called CopyDown. It has the same test suite as Turndown, and I've tried it with real examples on which the accepted answer was throwing errors.
To install with Gradle:
dependencies {
    compile 'io.github.furstenheim:copy_down:1.0'
}
Then to use it:
CopyDown converter = new CopyDown();
String myHtml = "<h1>Some title</h1><div>Some html<p>Another paragraph</p></div>";
String markdown = converter.convert(myHtml);
System.out.println(markdown);
> Some title\n==========\n\nSome html\n\nAnother paragraph\n
P.S. It is released under the MIT license.

