Python Beautiful Soup: using regular expressions to find tags?

Question

Asked by user3314418

I'd really like Beautiful Soup to match any of a list of tags, like so. I know that attribute filters accept a regex, but is there anything in Beautiful Soup that lets you do the same for tag names?

soup.findAll("(a|div)")

Desired output:

<a> ASDFS
<div> asdfasdf
<a> asdfsdf

My goal is to create a scraper that can grab tables from sites. Sometimes tags are named inconsistently, and I'd like to be able to pass in a list of tag names that identify the 'data' part of a table.

Answer 1

Accepted answer by hwnd

find_all() is the most popular method in the Beautiful Soup search API.

You can pass it several kinds of filters. To find multiple tags, pass a list:

>>> soup.find_all(['a', 'div'])

Example:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<html><body><div>asdfasdf</div><p><a>foo</a></p></body></html>', 'html.parser')
>>> soup.find_all(['a', 'div'])
[<div>asdfasdf</div>, <a>foo</a>]
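Applied to the table-scraping goal in the question, the list form might look like the sketch below. The sample markup and the name CELL_TAGS are assumptions made up for illustration; they are not from the original post.

```python
from bs4 import BeautifulSoup

# Hypothetical sample table, used only for illustration.
html = """
<table>
  <tr><th>name</th><th>qty</th></tr>
  <tr><td>apple</td><td>3</td></tr>
</table>
"""

# Hypothetical, user-supplied list of tag names that hold cell data.
CELL_TAGS = ["td", "th"]

soup = BeautifulSoup(html, "html.parser")

# For each row, collect the text of every tag whose name is in CELL_TAGS.
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(CELL_TAGS)]
    for tr in soup.find_all("tr")
]
print(rows)
```

If a site marks up its cells with different tag names, only the CELL_TAGS list should need to change.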

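Or you can use a regular expression to find tags whose names contain a or div. Because Beautiful Soup applies the pattern with search(), this also matches names such as table or span; anchor it as "^(a|div)$" to match only those two tags: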

>>> import re
>>> soup.find_all(re.compile("(a|div)"))
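A short sketch of how the unanchored pattern differs from an anchored one. The sample markup and the variable names loose and strict are assumptions for illustration only.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup containing tag names that merely include the letter "a".
html = (
    "<table><tr><td><a href='#'>link</a></td></tr></table>"
    "<div>text</div><span>s</span>"
)
soup = BeautifulSoup(html, "html.parser")

# Unanchored: matches any tag name containing "a" or "div".
loose = [tag.name for tag in soup.find_all(re.compile("(a|div)"))]

# Anchored: matches only tags named exactly "a" or "div".
strict = [tag.name for tag in soup.find_all(re.compile(r"^(a|div)$"))]

print(loose)   # ['table', 'a', 'div', 'span']
print(strict)  # ['a', 'div']
```

When the set of names is fixed, the list form find_all(['a', 'div']) gives the same result as the anchored pattern and is easier to read.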

Answer 2

Answered by ZJS

Yes, see the docs:

http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html

import re

soup.findAll(re.compile("^a$|(div)"))

Answer 3

Answered by Manu CJ

Note that you can also use regular expressions to search in attributes of tags. For example:

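import re
from bs4 import BeautifulSoup

# Illustrative markup (an assumption; the original snippet left soup undefined)
soup = BeautifulSoup('<a href="http://www.crummy.com/software/">BS</a>', 'html.parser')
soup.find_all('a', {'href': re.compile(r'crummy\.com/')})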

This example finds all <a> tags whose href attribute contains the substring 'crummy.com/'.
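The same attribute filter can also be written with keyword arguments, and it works for other attributes such as class. A minimal sketch, assuming made-up sample markup; the names crummy_links and data_divs are hypothetical.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = """
<a href="http://www.crummy.com/software/BeautifulSoup/">docs</a>
<a href="http://example.com/">other</a>
<div class="data-row">1</div>
<div class="header">h</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Keyword form of the attribute filter: href must contain "crummy.com/".
crummy_links = soup.find_all("a", href=re.compile(r"crummy\.com/"))

# class is a Python keyword, so Beautiful Soup uses class_ instead.
data_divs = soup.find_all("div", class_=re.compile(r"^data"))

print([a["href"] for a in crummy_links])
print([d.get_text() for d in data_divs])
```

As with tag names, the pattern is applied with search(), so anchor it when a substring match is too permissive.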

(I know this is a very old post, but hopefully someone will find this additional information useful.)
