Python BeautifulSoup: parsing HTML – getting part of an href

Note: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not the translator). Original question: http://stackoverflow.com/questions/41720896/

Beautifulsoup: parsing html – get part of href

python web-scraping beautifulsoup request

Asked by

I'm trying to parse

<td height="16" class="listtable_1"><a href="http://steamcommunity.com/profiles/76561198134729239" target="_blank">76561198134729239</a></td>

for the 76561198134729239, and I can't figure out how to do it. What I tried:

import requests
from lxml import html
from bs4 import BeautifulSoup
r = requests.get("http://ppm.rep.tf/index.php?p=banlist&page=154")
content = r.content
soup = BeautifulSoup(content, "html.parser")
element = soup.find("td", 
{
    "class":"listtable_1",
    "target":"_blank"
})
print(element.text)

Accepted answer by Martin Evans

There are many such entries in that HTML. To get all of them you could use the following:

import requests
from lxml import html
from bs4 import BeautifulSoup

r = requests.get("http://ppm.rep.tf/index.php?p=banlist&page=154")
soup = BeautifulSoup(r.content, "html.parser")

for td in soup.findAll("td", class_="listtable_1"):
    for a in td.findAll("a", href=True, target="_blank"):
        print(a.text)

This would then return:

76561198143466239
76561198094114508
76561198053422590
76561198066478249
76561198107353289
76561198043513442
76561198128253254
76561198134729239
76561198003749039
76561198091968935
76561198071376804
76561198068375438
76561198039625269
76561198135115106
76561198096243060
76561198067255227
76561198036439360
76561198026089333
76561198126749681
76561198008927797
76561198091421170
76561198122328638
76561198104586244
76561198056032796
76561198059683068
76561197995961306
76561198102013044
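
Since the question title asks for part of the href rather than the link text, the same loop can also read the href attribute and keep only its last path segment. A minimal sketch of that variant (assuming the profile links keep the .../profiles/<id> shape):

import requests
from bs4 import BeautifulSoup

r = requests.get("http://ppm.rep.tf/index.php?p=banlist&page=154")
soup = BeautifulSoup(r.content, "html.parser")

for td in soup.findAll("td", class_="listtable_1"):
    for a in td.findAll("a", href=True, target="_blank"):
        # The ID is the last path segment of the profile URL,
        # e.g. http://steamcommunity.com/profiles/76561198134729239
        print(a["href"].rstrip("/").split("/")[-1])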

Answer by MYGz

"target":"_blank"is a class of anchor tag awithin the tdtag. It's not a class of tdtag.

"target":"_blank"是标签a内的一类锚td标签。它不是一类td标签。

You can get it like so:

from bs4 import BeautifulSoup

html="""
<td height="16" class="listtable_1">
    <a href="http://steamcommunity.com/profiles/76561198134729239" target="_blank">
        76561198134729239
    </a>
</td>"""

soup = BeautifulSoup(html, 'html.parser')

print(soup.find('td', {'class': "listtable_1"}).find('a', {"target":"_blank"}).text)

Output:

76561198134729239
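
When the same chained lookup runs against a full page, either find() call can return None if nothing matches, which makes the trailing .text raise an AttributeError. A hedged sketch that guards against that (variable names are illustrative):

td = soup.find("td", {"class": "listtable_1"})
if td is not None:
    a = td.find("a", {"target": "_blank"})
    if a is not None:
        # strip() removes the surrounding whitespace from the link text
        print(a.text.strip())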

Answer by u6856342

"class":"listtable_1"belong to tdtag and target="_blank"belong to atag, you should not use them together.

"class":"listtable_1"属于td标签和target="_blank"属于a标签,你不应该一起使用它们。

You can use the td cells containing Steam Community as anchors and read the number from the cell that follows each of them.

Or use the URL: the href contains the information you need and is easy to find, so you can take the URL and split it on /:

import re  # required for the href pattern match below

for a in soup.find_all('a', href=re.compile(r'steamcommunity')):
    num = a['href'].split('/')[-1]
    print(num)

Code:

import requests
from lxml import html
from bs4 import BeautifulSoup
r = requests.get("http://ppm.rep.tf/index.php?p=banlist&page=154")
content = r.content
soup = BeautifulSoup(content, "html.parser")
for td in soup.find_all('td', string="Steam Community"):
    num = td.find_next_sibling('td').text
    print(num)

Output:

76561198143466239
76561198094114508
76561198053422590
76561198066478249
76561198107353289
76561198043513442
76561198128253254
76561198134729239
76561198003749039
76561198091968935
76561198071376804
76561198068375438
76561198039625269
76561198135115106
76561198096243060
76561198067255227
76561198036439360
76561198026089333
76561198126749681
76561198008927797
76561198091421170
76561198122328638
76561198104586244
76561198056032796
76561198059683068
76561197995961306
76561198102013044
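
For completeness, a runnable end-to-end version of the href-splitting alternative shown above (a sketch, assuming the page still serves the same markup):

import re

import requests
from bs4 import BeautifulSoup

r = requests.get("http://ppm.rep.tf/index.php?p=banlist&page=154")
soup = BeautifulSoup(r.content, "html.parser")

# Keep only anchors whose href points at steamcommunity.com,
# then take the last path segment of each href as the ID.
for a in soup.find_all("a", href=re.compile(r"steamcommunity")):
    print(a["href"].split("/")[-1])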

Answer by alecxe

As others mentioned, you are trying to check attributes of different elements in a single find(). Instead, you can chain find() calls as MYGz suggested, or use a single CSS selector:

soup.select_one("td.listtable_1 a[target=_blank]").get_text()

If you need to locate multiple elements this way, use select():

for elm in soup.select("td.listtable_1 a[target=_blank]"):
    print(elm.get_text())
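
Put together with the original request, the selector approach might look like this (a sketch; select() simply yields an empty list if the markup changes):

import requests
from bs4 import BeautifulSoup

r = requests.get("http://ppm.rep.tf/index.php?p=banlist&page=154")
soup = BeautifulSoup(r.content, "html.parser")

# One CSS selector covers both constraints: the td class and the anchor's target attribute.
for elm in soup.select("td.listtable_1 a[target=_blank]"):
    print(elm.get_text(strip=True))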