在 Python 2.6 中用相应的 utf-8 字符替换 html 实体

Question

提问by Alexandru

I have a html text like this:

我有一个这样的 html 文本：

&lt;xml ... &gt;

and I want to convert it to something readable:

我想将其转换为可读的内容：

<xml ...>

Any easy (and fast) way to do it in Python?

有什么简单（快速）的方法可以在 Python 中做到这一点？

Answer 1

回答by vartec

Python 2.7

蟒蛇 2.7

Official documentation for HTMLParser: Python 2.7

官方文档HTMLParser：Python 2.7

>>> import HTMLParser
>>> pars = HTMLParser.HTMLParser()
>>> pars.unescape('&copy; &euro;')
u'\xa9 \u20ac'
>>> print _
?

Python 3

蟒蛇 3

Official documentation for HTMLParser: Python 3

官方文档HTMLParser：Python 3

>>> from html.parser import HTMLParser
>>> pars = HTMLParser()
>>> pars.unescape('&copy; &euro;')
?

Answer 2

回答by Benson

There is a function herethat does it, as linked from the post Fred pointed out. Copied here to make things easier.

正如 Fred 指出的那样，这里有一个函数可以做到这一点。复制到这里是为了让事情变得更容易。

Credit to Fred Larson for linking to the other question on SO. Credit to dF for posting the link.

感谢 Fred Larson 链接到关于 SO 的另一个问题。感谢 dF 发布链接。

Answer 3

回答by line break

Modern Python 3 approach:

现代 Python 3 方法：

>>> import html
>>> html.unescape('&copy; &euro;')
?

https://docs.python.org/3/library/html.html

在 Python 2.6 中用相应的 utf-8 字符替换 html 实体

提问by Alexandru

回答by vartec

Python 2.7

蟒蛇 2.7

Python 3

蟒蛇 3

回答by Benson

回答by line break

相关推荐

最近更新

标签

在 Python 2.6 中用相应的 utf-8 字符替换 html 实体

提问by Alexandru

回答by vartec

Python 2.7

蟒蛇 2.7

Python 3

蟒蛇 3

回答by Benson

回答by line break

相关推荐

python 为什么 subprocess.Popen(...) 不总是返回？

什么是 LLVM 以及如何用 LLVM 替换 Python VM 将速度提高 5 倍？

python 在python中填充一个列表

Python：Python 列表是为 len() 计数还是为每次调用计数？

相关推荐

最近更新

标签