Python's equivalent to PHP's strip_tags?

Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/2295942/

Date: 2020-11-04 00:18:00  Source: igfitidea

Tags: php, python, strip

Asked by Viet

Python's equivalent to PHP's strip_tags?

http://php.net/manual/en/function.strip-tags.php

Answer by e-satis

There is no such function in the Python standard library. That's because Python is a general-purpose language, while PHP started as a web-oriented language.

Nevertheless, you have 3 solutions:

  • You are in a hurry: just make your own. re.sub(r'<[^>]*?>', '', value) can be a quick and dirty solution.
  • Use a third-party library (recommended because it is more bulletproof): Beautiful Soup is a really good one, and there is nothing to install, just copy the lib dir and import it. Full tutorial with Beautiful Soup.
  • Use a framework. Most Python web developers never code from scratch; they use a framework such as Django that does this stuff for you automatically. Full tutorial with Django.
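The quick-and-dirty regex from the first bullet can be wrapped in a helper like the one below. This is a minimal sketch; the name strip_tags_regex is hypothetical. It also illustrates why the approach is fragile: the pattern knows nothing about HTML syntax, so a ">" inside a quoted attribute value ends the match early, and entities such as &amp; are left undecoded.

```python
import re

# '<', then the shortest run of non-'>' characters, then '>'.
TAG_RE = re.compile(r'<[^>]*?>')


def strip_tags_regex(value):
    """Delete every tag-like substring (hypothetical helper; does not parse HTML)."""
    return TAG_RE.sub('', value)


print(strip_tags_regex('<p>Hello <b>world</b></p>'))        # Hello world
print(strip_tags_regex('Tom &amp; Jerry'))                  # Tom &amp; Jerry (entity untouched)
print(repr(strip_tags_regex('<a title="1 > 0">link</a>')))  # ' 0">link' (attribute leaks)
```

That is fine for trusted, simple markup; for anything else, a real parser is the safer choice.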

Answer by John La Rooy

Using BeautifulSoup

# requires the bs4 package (pip install beautifulsoup4)
from bs4 import BeautifulSoup
soup = BeautifulSoup(htmltext, "html.parser")
text = soup.get_text()  # concatenation of the document's text nodes, tags removed

Answer by GoTLiuM

from bleach import clean
# tags=[] allows no tags at all; strip=True removes disallowed tags instead of escaping them
print(clean("<strong>My Text</strong>", tags=[], strip=True, strip_comments=True))

Answer by Otto Allmendinger

You won't find many built-in Python equivalents for PHP's built-in HTML functions, since Python is more of a general-purpose scripting language than a web development language. For HTML processing, BeautifulSoup is generally recommended.

Answer by cwallenpoole

I built one for Python 3 using the HTMLParser class. It is more verbose than PHP's. I called it the HTMLCleaner class; you can find the source here and examples here.

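The HTMLCleaner source is not reproduced on this page, but the underlying idea (subclass the standard library's html.parser.HTMLParser and keep only the text it reports) can be sketched in a few lines of Python 3. This is a minimal sketch, not cwallenpoole's class; TextExtractor and strip_tags are hypothetical names. Because convert_charrefs=True (the default), entities such as &amp; arrive decoded, and because the parser tokenizes tags properly, a ">" inside a quoted attribute value does not leak into the output. One caveat: text inside <script> and <style> elements is also reported as data, so it ends up in the result.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text content of a document and discards every tag."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before handle_data sees them
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered after the last tag
    return ''.join(parser.chunks)


print(strip_tags('<p>Tom &amp; <b>Jerry</b></p>'))  # Tom & Jerry
print(strip_tags('<a title="1 > 0">link</a>'))      # link
```

Unlike PHP's strip_tags, this version has no allow-list of tags to keep; that would need handle_starttag/handle_endtag overrides that re-emit the permitted tags.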

Answer by Gagandeep Singh

There is an ActiveState recipe for this:

http://code.activestate.com/recipes/52281/

It's old code, so you have to switch from sgmllib's SGMLParser to HTMLParser, as mentioned in the comments.

Here is the modified code:

from html import escape
from html.entities import entitydefs
from html.parser import HTMLParser


class StrippingParser(HTMLParser):

    # These are the HTML tags that we will leave intact
    valid_tags = ('b', 'a', 'i', 'br', 'p', 'img')
    # Void elements never get a closing tag
    void_tags = ('br', 'img')

    def __init__(self):
        # convert_charrefs=False so character/entity references reach
        # handle_charref/handle_entityref and are re-emitted still encoded
        super().__init__(convert_charrefs=False)
        self.result = ""
        self.endTagList = []

    def handle_data(self, data):
        if data:
            self.result += data

    def handle_charref(self, name):
        self.result += "&#%s;" % name

    def handle_entityref(self, name):
        if name in entitydefs:
            x = ';'
        else:
            # this breaks unstandard entities that end with ';'
            x = ''
        self.result += "&%s%s" % (name, x)

    def handle_starttag(self, tag, attrs):
        """ Delete all tags except for legal ones """
        if tag in self.valid_tags:
            self.result += '<' + tag
            for k, v in attrs:
                v = v or ''  # valueless attributes (e.g. <p hidden>) arrive as None
                # drop event handlers (onclick, ...) and javascript: URLs
                if not k.startswith('on') and not v.strip().lower().startswith('javascript'):
                    # the parser unescapes attribute values, so escape them again
                    self.result += ' %s="%s"' % (k, escape(v))
            self.result += '>'
            if tag not in self.void_tags:
                # innermost open tag first, so cleanup() closes in the right order
                self.endTagList.insert(0, '</%s>' % tag)

    def handle_endtag(self, tag):
        endTag = '</%s>' % tag
        # skip end tags that were never opened (list.remove would raise ValueError)
        if tag in self.valid_tags and endTag in self.endTagList:
            self.result += endTag
            self.endTagList.remove(endTag)

    def cleanup(self):
        """ Append missing closing tags """
        self.result += ''.join(self.endTagList)


def strip(s):
    """ Strip illegal HTML tags from string s """
    parser = StrippingParser()
    parser.feed(s)
    parser.close()
    parser.cleanup()
    return parser.result

Answer by Ignacio Vazquez-Abrams

Python doesn't have one built-in, but there are an ungodly number of implementations.
