Original question: http://stackoverflow.com/questions/34144389/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How to get value from table's td in BeautifulSoup?
Asked by yak
I have a page with some tables in its source:
<table width='100%' cellspacing='0' cellpadding='2' class='an'>
<tr>
<td width='35%' align='right'>XXX :</td>
<td><b>20</b></td>
</tr>
<tr>
<td align='right'>XXX :</td>
<td><b>XXX</b></td>
</tr>
<tr>
<td align='right'>XX :</td>
<td><b>XXX</b></td>
</tr>
<tr>
<td align='right'>XXX :</td>
<td><b>XXX</b></td>
</tr>
<tr>
<td align='right'>XXX :</td>
<td><b>XXX</b></td>
</tr>
<tr>
<td align='right'>XXX :</td>
<td><b>XXX</b></td>
</tr>
<tr>
<td align='right'>XXX :</td>
<td><b>XXX</b></td>
</tr>
</table>
<table width='361' cellspacing='0' cellpadding='2' class='an'>
<tr>
<td width='35%' align='right'>XXX :</td>
<td><b>XXX</b></td>
</tr>
<tr>
<td align='right'>XXX :</td>
<td><b>XXX</b></td>
</tr>
<tr>
<td align='right'>XXX :</td>
<td><b>XXX</b></td>
</tr>
<tr>
<td align='right'>XXX :</td>
<td><b>XXX</b></td>
</tr>
<tr>
<td align='right'>XX :</td>
<td><a href='XXX'><b>XXX</b></a></td>
</tr>
<tr>
<td align='right'>PHONE :</td>
<td><b>518878943</b></td>
</tr>
</table>
I would like to get the phone number from the second table on this page:
<td align='right'>PHONE :</td>
<td><b>518878943</b></td>
However, my code:
from bs4 import BeautifulSoup

page_src = """<table width='100%' cellspacing='0' cellpadding='2' class='an'>
<tr>
<td width='35%' align='right'>XXX :</td>
<td><b>20</b></td>
</tr>
<tr>
<td align='right'>XXX :</td>
<td><b>XXX</b></td>
</tr>
<tr>
<td align='right'>XX :</td>
<td><b>XXX</b></td>
</tr>
<tr>
<td align='right'>XXX :</td>
<td><b>XXX</b></td>
</tr>
<tr>
<td align='right'>XXX :</td>
<td><b>XXX</b></td>
</tr>
<tr>
<td align='right'>XXX :</td>
<td><b>XXX</b></td>
</tr>
<tr>
<td align='right'>XXX :</td>
<td><b>XXX</b></td>
</tr>
</table>
<table width='361' cellspacing='0' cellpadding='2' class='an'>
<tr>
<td width='35%' align='right'>XXX :</td>
<td><b>XXX</b></td>
</tr>
<tr>
<td align='right'>XXX :</td>
<td><b>XXX</b></td>
</tr>
<tr>
<td align='right'>XXX :</td>
<td><b>XXX</b></td>
</tr>
<tr>
<td align='right'>XXX :</td>
<td><b>XXX</b></td>
</tr>
<tr>
<td align='right'>XX :</td>
<td><a href='XXX'><b>XXX</b></a></td>
</tr>
<tr>
<td align='right'>PHONE :</td>
<td><b>518878943</b></td>
</tr>
</table>
"""
soup = BeautifulSoup(page_src, 'html.parser')
divs = soup.findAll("table", {"class": "an"})
for div in divs:
    row = ''
    rows = [row in div.findAll('tbody').findAll('tr')]
gives me this error message:
Traceback (most recent call last):
  File "test.py", line 198, in <module>
    rows = [row in div.findAll('tbody').findAll('tr')]
AttributeError: 'ResultSet' object has no attribute 'findAll'
How can I solve this and get the phone number from the page? Thanks.
EDIT:
Partly solved. Partly, because I think my solution is ugly, though it works. Maybe someone will come up with a prettier solution?
tds = []
soup = BeautifulSoup(page_src, 'html.parser')
divs = soup.findAll("table", {"class": "an"})
for div in divs:
    rows = div.findAll('tr')
    for row in rows:
        # collect the list of cells for every row of every table
        tds.append(row.findAll('td'))
# tds[12] is the last row (the PHONE row); its second cell holds the number
phone = str(tds[12][1])
phone = phone.replace("<td><b>", "").replace("</b></td>", "").strip()
print(phone)
Accepted answer by alecxe
Find the td element containing PHONE : and then get the following sibling element. In one line:
soup.find("td", text="PHONE :").find_next_sibling("td").text
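For reference, here is a self-contained sketch of the one-liner above, run against a trimmed copy of the second table. It assumes a bs4 release recent enough to accept the string= keyword (the newer name for text=). The shortened page_src and the None checks are illustrative additions, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical copy of the second table from the question.
page_src = """
<table width='361' cellspacing='0' cellpadding='2' class='an'>
<tr>
<td align='right'>XX :</td>
<td><a href='XXX'><b>XXX</b></a></td>
</tr>
<tr>
<td align='right'>PHONE :</td>
<td><b>518878943</b></td>
</tr>
</table>
"""

soup = BeautifulSoup(page_src, "html.parser")

# Locate the label cell by its exact text.
label = soup.find("td", string="PHONE :")

# find() returns None when nothing matches, so guard before chaining.
phone = None
if label is not None:
    value_cell = label.find_next_sibling("td")
    if value_cell is not None:
        phone = value_cell.get_text(strip=True)

print(phone)
```

The guards matter mostly on real pages, where the label may be missing or spelled slightly differently; the chained one-liner would raise an AttributeError on None in that case.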
Answer by dstudeba
You have a couple of problems with your code.
divs = soup.findAll("table", {"class": "an"})
for div in divs:
    row = ''
    rows = [row in div.findAll('tbody').findAll('tr')]
The first problem is that there are no tbody tags, so div.findAll('tbody') will return an empty result.
The second problem is that div.findAll('tbody') returns a ResultSet (a list of tags), not a single tag, so you can't call findAll('tr') on it.
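Both problems can be reproduced in a few lines. This is a minimal sketch using a hypothetical one-row table and the html.parser backend from the question; other parsers may treat tbody differently.

```python
from bs4 import BeautifulSoup

# Hypothetical one-row table, just enough to reproduce both problems.
html = "<table class='an'><tr><td>PHONE :</td><td><b>518878943</b></td></tr></table>"
table = BeautifulSoup(html, "html.parser").find("table", {"class": "an"})

# Problem 1: html.parser does not add <tbody>, so nothing matches.
tbodies = table.find_all("tbody")
print(len(tbodies))

# Problem 2: a ResultSet is list-like and has no search methods of its own.
try:
    tbodies.findAll("tr")
    raised = False
except AttributeError:
    raised = True
print(raised)

# Searching from the table tag itself works.
rows = table.find_all("tr")
print(len(rows))
```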
Here is what you want in order to get all the tr tags in each table:
divs = soup.findAll("table", {"class": "an"})
for div in divs:
    rows = div.findAll('tr')
You can then go through all the tr tags and call .text to get the text inside each row; the rows that contain "PHONE" are the ones you want.
soup = BeautifulSoup(page_src, 'html.parser')
divs = soup.findAll("table", {"class": "an"})
for div in divs:
    rows = div.findAll('tr')
    for row in rows:
        # keep only the rows whose text mentions PHONE
        if "PHONE" in row.text:
            print(row.text)
This prints:
PHONE :
518878943
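If more than one field is needed, the same row loop can collect every label/value pair into a dictionary. The following is only a sketch: it assumes each useful row has exactly two td cells and that labels end in " :", as in the question's markup. The fields name and the shortened page_src are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row excerpt of the second table from the question.
page_src = """
<table class='an'>
<tr><td align='right'>XX :</td><td><a href='XXX'><b>XXX</b></a></td></tr>
<tr><td align='right'>PHONE :</td><td><b>518878943</b></td></tr>
</table>
"""

soup = BeautifulSoup(page_src, "html.parser")

fields = {}
for table in soup.find_all("table", {"class": "an"}):
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) != 2:
            continue  # skip rows that are not a label/value pair
        # Drop the trailing " :" from the label, e.g. "PHONE :" -> "PHONE".
        label = cells[0].get_text(strip=True).rstrip(" :")
        fields[label] = cells[1].get_text(strip=True)

print(fields.get("PHONE"))
```

Note that repeated labels overwrite each other in this sketch, so a page with the same label in several tables would need a list of values per key or one dictionary per table.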