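Python's equivalent to PHP's strip_tags?
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/2295942/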
Asked by Viet
Python's equivalent to PHP's strip_tags?
Answered by e-satis
There is no such thing in the Python standard library. That is because Python is a general-purpose language, while PHP started as a Web-oriented language.
Nevertheless, you have 3 solutions:
- You are in a hurry: just write your own.
re.sub(r'<[^>]*?>', '', value)
can be a quick and dirty solution.
- Use a third-party library (recommended because it is more bulletproof): Beautiful Soup is a really good one and there is nothing to install, just copy the lib dir and import.
- Use a framework. Most Python Web devs never code from scratch; they use a framework such as Django, which does this stuff for you automatically.
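For what it's worth, the quick and dirty regex option might look like the sketch below in Python 3. The helper name strip_tags_quick is made up for illustration, and the second call shows why this is only a shortcut: the pattern stops at the first ">" it sees, even inside a quoted attribute.

```python
import re

# Same pattern as the one-liner above: "<", a run of non-">" characters, then ">".
TAG_RE = re.compile(r'<[^>]*?>')


def strip_tags_quick(value):
    """Hypothetical helper: remove anything that looks like a tag."""
    return TAG_RE.sub('', value)


print(strip_tags_quick('<p>Hello <b>world</b></p>'))  # Hello world

# A ">" inside an attribute value ends the match early,
# so the tail of the opening tag leaks into the output.
print(repr(strip_tags_quick('<a title="a > b">link</a>')))  # ' b">link'
```

It also leaves entities such as &amp; untouched, so treat it as a formatting aid for trusted input rather than a sanitizer.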
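Answered by John La Rooy
Using BeautifulSoup
from bs4 import BeautifulSoup  # Beautiful Soup 4; the older Python 2 package was named BeautifulSoup
soup = BeautifulSoup(htmltext, 'html.parser')
# Keep only the string nodes of the tree and join them; the tags themselves are dropped
''.join(e for e in soup.descendants if isinstance(e, str))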
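Answered by GoTLiuM
from bleach import clean
# tags=[] allows no tags at all; strip=True removes disallowed tags instead of escaping them
print(clean("<strong>My Text</strong>", tags=[], strip=True, strip_comments=True))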
Answered by Otto Allmendinger
You won't find many built-in Python equivalents for built-in PHP HTML functions, since Python is more of a general-purpose scripting language than a web development language. For HTML processing, BeautifulSoup is generally recommended.
Answered by cwallenpoole
Answered by Gagandeep Singh
There is an ActiveState recipe for this:
http://code.activestate.com/recipes/52281/
It's old code, so you have to change the sgmllib parser to HTMLParser, as mentioned in the comments.
Here is the modified code:
from html.entities import entitydefs  # named-entity table (htmlentitydefs in Python 2)
from html.parser import HTMLParser    # the HTMLParser module in Python 2


class StrippingParser(HTMLParser):
    # These are the HTML tags that we will leave intact
    valid_tags = ('b', 'a', 'i', 'br', 'p', 'img')

    def __init__(self):
        # convert_charrefs=False keeps handle_charref/handle_entityref in use on Python 3
        super().__init__(convert_charrefs=False)
        self.result = ""
        self.endTagList = []

    def handle_data(self, data):
        if data:
            self.result = self.result + data

    def handle_charref(self, name):
        self.result = "%s&#%s;" % (self.result, name)

    def handle_entityref(self, name):
        if name in entitydefs:
            x = ';'
        else:
            # this breaks unstandard entities that end with ';'
            x = ''
        self.result = "%s&%s%s" % (self.result, name, x)

    def handle_starttag(self, tag, attrs):
        """ Delete all tags except for legal ones """
        if tag in self.valid_tags:
            self.result = self.result + '<' + tag
            for k, v in attrs:
                v = v or ''  # attributes without a value arrive as None
                # Drop event handlers (on...) and javascript: values
                if k[0:2].lower() != 'on' and v[0:10].lower() != 'javascript':
                    self.result = '%s %s="%s"' % (self.result, k, v)
            endTag = '</%s>' % tag
            self.endTagList.insert(0, endTag)
            self.result = self.result + '>'

    def handle_endtag(self, tag):
        if tag in self.valid_tags:
            self.result = "%s</%s>" % (self.result, tag)
            remTag = '</%s>' % tag
            if remTag in self.endTagList:  # tolerate a stray closing tag
                self.endTagList.remove(remTag)

    def cleanup(self):
        """ Append missing closing tags """
        for endTag in self.endTagList:
            self.result = self.result + endTag


def strip(s):
    """ Strip illegal HTML tags from string s """
    parser = StrippingParser()
    parser.feed(s)
    parser.close()
    parser.cleanup()
    return parser.result
Answered by Ignacio Vazquez-Abrams
Python doesn't have one built-in, but there are an ungodly number of implementations.
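One such implementation needs nothing beyond the standard library. The sketch below assumes Python 3 and uses html.parser; the names TextExtractor and strip_tags are made up for illustration and are not taken from any of the answers here.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes and ignores every tag."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before handle_data runs
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(value):
    """Return the text content of an HTML fragment with all tags removed."""
    parser = TextExtractor()
    parser.feed(value)
    parser.close()  # flushes any text still buffered by the parser
    return ''.join(parser.parts)


print(strip_tags('<p>Hello <b>world</b> &amp; friends</p>'))  # Hello world & friends
```

Because a real parser is doing the work, a ">" inside a quoted attribute does not confuse it. The text inside script and style elements is still passed to handle_data, though, so it would need extra filtering if that matters.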