Python: Is there a way in Beautiful Soup to count the number of tags in an HTML page?

Note: this page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/13853025/

Time: 2020-08-18 09:44:55  Source: igfitidea

Is there a way in Beautiful Soup to count the number of tags in an HTML page?

python beautifulsoup

Asked by gizgok

I'm looking at creating a dictionary in Python where the key is the HTML tag name and the value is the number of times the tag appears. Is there a way to do this with Beautiful Soup or something else?

Accepted answer by Anonymous Coward

With BeautifulSoup you can search for all tags by omitting the search criteria:

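from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")  # html: the page source as a string
# print all tags
for tag in soup.find_all():
    print(tag.name)  # TODO: add/update dict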

If you're only interested in the number of occurrences, BeautifulSoup may be a bit overkill, in which case you could use the standard library's HTMLParser instead:

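from html.parser import HTMLParser  # Python 3; the module was named HTMLParser in Python 2

class print_tags(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # called once for every opening tag
        print(tag)  # TODO: add/update dict

parser = print_tags()
parser.feed(html)  # html: the page source as a string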

This will produce the same output.

To create the dictionary of { 'tag' : count } you could use collections.defaultdict:

from collections import defaultdict

occurrences = defaultdict(int)
# ...
occurrences[tag_name] += 1
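
Putting the pieces above together, a minimal sketch of a complete tag counter built on the standard-library parser might look like the following. The names TagCounter and count_tags and the sample HTML are illustrative assumptions, not part of the original answer, and the code assumes Python 3.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start tags seen while parsing (hypothetical helper name)."""

    def __init__(self):
        super().__init__()
        self.occurrences = defaultdict(int)

    def handle_starttag(self, tag, attrs):
        # Called once per opening tag; html.parser lowercases tag names.
        self.occurrences[tag] += 1


def count_tags(html):
    """Return a plain dict mapping tag name -> number of occurrences."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.occurrences)


# Illustrative input only.
sample = "<html><body><p>one</p><p>two</p><a href='#'>x</a></body></html>"
print(count_tags(sample))
```

Only start tags are counted here, so an element is counted once whether or not its closing tag is present.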

Answer by jdotjdot

BeautifulSoup is really good for HTML parsing, and you could certainly use it for this purpose. It would be extremely simple:

from bs4 import BeautifulSoup as BS

def num_appearances_of_tag(tag_name, html):
    soup = BS(html, "html.parser")  # name the parser explicitly to avoid bs4's warning
    return len(soup.find_all(tag_name))
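
The function above counts one tag name per call. To get the tag-to-count dictionary the question asks for in a single pass, one option is to feed every tag name into collections.Counter. This is a sketch rather than part of the original answer: count_all_tags and the sample HTML are made-up names, and it assumes bs4 is installed and uses the built-in "html.parser" backend.

```python
from collections import Counter

from bs4 import BeautifulSoup


def count_all_tags(html):
    """Return a Counter mapping tag name -> number of occurrences."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() with no arguments matches every tag in the document.
    return Counter(tag.name for tag in soup.find_all())


# Illustrative input only.
sample = "<div><p>one</p><p>two</p><span>x</span></div>"
counts = count_all_tags(sample)
print(counts["p"])            # count for a single tag
print(counts.most_common(1))  # most frequent tag and its count
```

Counter is a dict subclass, so the result can be used wherever a plain dictionary is expected, and looking up a tag that never appeared returns 0 instead of raising KeyError.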