Replace html entities with the corresponding utf-8 characters in Python 2.6

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me), citing the original address: http://stackoverflow.com/questions/730299/

Time: 2020-11-03 20:45:40  Source: igfitidea

python html-entities python-2.6

Asked by Alexandru

I have a html text like this:

&lt;xml ... &gt;

and I want to convert it to something readable:

<xml ...>

Any easy (and fast) way to do it in Python?

Answered by vartec

Python 2.7

Official documentation for HTMLParser: Python 2.7

>>> import HTMLParser
>>> pars = HTMLParser.HTMLParser()
>>> pars.unescape('&copy; &euro;')
u'\xa9 \u20ac'
>>> print _
© €

Python 3

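Official documentation for HTMLParser: Python 3 (HTMLParser.unescape() was deprecated in Python 3.4 and removed in Python 3.9; html.unescape() replaces it)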

>>> from html.parser import HTMLParser
>>> pars = HTMLParser()
>>> pars.unescape('&copy; &euro;')
'© €'
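
Because the import path differs between the versions shown above, code that has to run on several interpreters could pick whichever unescape is available. The following is only a sketch of that idea; the module-level name unescape is chosen here for illustration and is not part of the original answer.

```python
# Pick an unescape implementation that exists on the running interpreter.
try:
    from html import unescape  # Python 3.4 and later
except ImportError:
    try:
        from html.parser import HTMLParser  # Python 3.0 to 3.3
    except ImportError:
        from HTMLParser import HTMLParser  # Python 2.6 / 2.7
    # Bind the parser's method to one shared name (illustrative choice).
    unescape = HTMLParser().unescape

# The question's example: escaped markup becomes readable text.
print(unescape('&lt;xml ... &gt;'))
print(unescape('&copy; &euro;'))
```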

Answered by Benson

There is a function here that does it, as linked from the post Fred pointed out. Copied here to make things easier.

Credit to Fred Larson for linking to the other question on SO. Credit to dF for posting the link.
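
The copied function is not present in this copy of the answer. As a rough illustration of what a hand-written version of this kind of helper can look like, here is a minimal regex-based sketch. It is not the linked function: the name unescape_entities and the pattern are assumptions made for this example, and it is written for Python 3 (on Python 2 the entity table lives in htmlentitydefs and unichr would replace chr).

```python
import re
from html.entities import name2codepoint  # maps 'copy' -> 169, 'euro' -> 8364, ...

# Matches &name;, &#123; and &#x1F; style references (illustrative pattern).
_ENTITY_RE = re.compile(r"&(#[xX]?[0-9a-fA-F]+|\w+);")


def unescape_entities(text):
    """Replace named and numeric character references in text.

    References that cannot be resolved are left unchanged.
    """
    def fixup(match):
        body = match.group(1)
        try:
            if body.startswith(("#x", "#X")):
                return chr(int(body[2:], 16))   # hexadecimal reference
            if body.startswith("#"):
                return chr(int(body[1:]))       # decimal reference
            return chr(name2codepoint[body])    # named entity
        except (KeyError, ValueError, OverflowError):
            return match.group(0)               # unknown or invalid: keep as-is
    return _ENTITY_RE.sub(fixup, text)


print(unescape_entities('&lt;xml ... &gt;'))
print(unescape_entities('&copy; &#8364; &bogus;'))
```

Where the standard library helpers are available they are usually the safer choice, since they also handle the edge cases defined by HTML that this sketch ignores.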

Answered by line break

Modern Python 3 approach:

>>> import html
>>> html.unescape('&copy; &euro;')
'© €'

https://docs.python.org/3/library/html.html
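
Applied to the text from the question, the same call can be used in a script rather than at the prompt. This is a small sketch assuming Python 3.4 or later; the variable names escaped and readable are illustrative only.

```python
import html

# Named, decimal and hexadecimal references are all handled by one call.
escaped = '&lt;xml ... &gt; &copy; &#8364; &#x20AC;'
readable = html.unescape(escaped)
print(readable)

# html.escape goes the other way for the markup-significant characters.
print(html.escape('<xml ...>'))
```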
