Python: How to convert Markdown-formatted text to plain text

Question

Asked by Krish

I need to convert Markdown text to plain text so I can display a summary on my website. I'd like the code in Python.

Answer 1

Answered by Jason Coon

This module will help you do what you describe:

http://www.freewisdom.org/projects/python-markdown/Using_as_a_Module

Once you have converted the Markdown to HTML, you can use an HTML parser to extract the plain text.

Your code might look something like this:

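from bs4 import BeautifulSoup  # BeautifulSoup 4; the original answer imported BeautifulSoup 3
from markdown import markdown

# Render the Markdown source to HTML, then keep only the text nodes.
html = markdown(some_markdown_string)
text = ''.join(BeautifulSoup(html, 'html.parser').find_all(string=True))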

Answer 2

Answered by Pavel Vorobyov

Although this is a very old question, I'd like to suggest a solution I came up with recently. It neither uses BeautifulSoup nor incurs the overhead of converting to HTML and back.

The markdown module's core class, Markdown, has an output_formats property that is not configurable but can be patched, like almost anything in Python. This property is a dict mapping an output format name to a rendering function. By default it has two output formats, 'html' and 'xhtml'. With a little help it can also have a plain-text rendering function, which is easy to write:

from markdown import Markdown
from io import StringIO


def unmark_element(element, stream=None):
    if stream is None:
        stream = StringIO()
    if element.text:
        stream.write(element.text)
    for sub in element:
        unmark_element(sub, stream)
    if element.tail:
        stream.write(element.tail)
    return stream.getvalue()


# patching Markdown
Markdown.output_formats["plain"] = unmark_element
__md = Markdown(output_format="plain")
__md.stripTopLevelTags = False


def unmark(text):
    return __md.convert(text)

The unmark function takes Markdown text as input and returns it with all the Markdown markup stripped out.
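
To see what the recursive walk in unmark_element does without installing anything, here is a minimal sketch that runs the same text/tail traversal over a tree built with the standard library's xml.etree.ElementTree. The name collect_text and the sample string are illustrative assumptions, not part of the answer above; the sample is only a rough stand-in for the element tree a Markdown paragraph would produce.

```python
import xml.etree.ElementTree as ET
from io import StringIO


def collect_text(element, stream=None):
    """Recursively gather an element's text, its children's text, and tails."""
    if stream is None:
        stream = StringIO()
    if element.text:
        stream.write(element.text)      # text before the first child
    for sub in element:
        collect_text(sub, stream)       # descend into each child in order
    if element.tail:
        stream.write(element.tail)      # text following this element's end tag
    return stream.getvalue()


# Hypothetical sample: roughly the tree that "Hello *world*!" might render to.
sample = ET.fromstring("<p>Hello <em>world</em>!</p>")
plain = collect_text(sample)
print(plain)
```

Each element contributes its own text first, then its children, then its tail, so the words come out in document order with the tags dropped.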

Answer 3

Answered by Rob

I commented and then removed the comment, because I think I finally see the rub here: it may be easier to convert your Markdown text to HTML and then strip the HTML from the text. I'm not aware of anything that removes Markdown from text effectively, but there are many HTML-to-plain-text solutions.
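
As one possible HTML-to-plain-text step, here is a minimal sketch that uses only the standard library's html.parser. The class name TextExtractor, the helper html_to_text, and the sample HTML are assumptions made for illustration; it presumes the Markdown has already been rendered to HTML by whatever converter you use.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML document, ignoring all tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Return the concatenated text content of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


# Hypothetical input: HTML such as a Markdown renderer might produce.
sample_html = "<h1>Title</h1>\n<p>Some <strong>bold</strong> &amp; plain text.</p>"
result = html_to_text(sample_html)
print(result)
```

This keeps whitespace exactly as it appears between tags, so for a website summary you may still want to collapse newlines or truncate the result afterwards.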
