Get href Attribute Link from td tag BeautifulSoup Python
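Disclaimer: this page is a Chinese-English translation of a popular StackOverFlow question and is licensed under CC BY-SA 4.0. If you use it, you must also comply with the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow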
Original URL: http://stackoverflow.com/questions/16733109/
Question by Zaid Iqbal
I am new to Python, and someone suggested I use Beautiful Soup for scraping. I am stuck on a problem: fetching the href attribute from the td tag in column 2, based on the year in column 4.
<table class="tableFile2" summary="Results">
<tr>
<th width="7%" scope="col">Filings</th>
<th width="10%" scope="col">Format</th>
<th scope="col">Description</th>
<th width="10%" scope="col">Filing Date</th>
<th width="15%" scope="col">File/Film Number</th>
</tr>
<tr>
<td nowrap="nowrap">8-K</td>
<td nowrap="nowrap"><a href="/Archives/edgar/data/320193/000119312513199324/0001193125-13-199324-index.htm" id="documentsbutton"> Documents</a></td>
<td class="small" >Current report, items 8.01 and 9.01
<br />Acc-no: 0001193125</td>
<td>2013-05-03</td>
<td nowrap="nowrap"><a href="/cgi-bin/browse-edgar?action=getcompany&filenum=000-10030&owner=include&count=40">000-10030</a><br>13813281 </td>
</tr>
<tr class="blueRow">
<td nowrap="nowrap">424B2</td>
<td nowrap="nowrap"><a href="/Archives/edgar/data/320193/000119312513191849/0001193125-13-191849-index.htm" id="documentsbutton"> Documents</a></td>
<td class="small" >Prospectus [Rule 424(b)(2)]<br />Acc-no: 0001193125</td>
<td>2013-05-01</td>
<td nowrap="nowrap"><a href="/cgi-bin/browse-edgar?action=getcompany&filenum=333-188191&owner=include&count=40">333-188191</a><br>13802405 </td>
</tr>
<tr>
<td nowrap="nowrap">FWP</td>
<td nowrap="nowrap"><a href="/Archives/edgar/data/320193/000119312513189053/0001193125-13-189053-index.htm" id="documentsbutton"> Documents</a></td>
<td class="small" >Filing under Securities Act Rules 163/433 of free writing prospectuses<br />Acc-no: 0001193125-13-189053 (34 Act) Size: 52 KB </td>
<td>2013-05-01</td>
<td nowrap="nowrap"><a href="/cgi-bin/browse-edgar?action=getcompany&filenum=333-188191&owner=include&count=40">333-188191</a><br>13800170 </td>
</tr>
</table>
table = soup.find('table', class="tableFile2")
rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    if "2013" in cols[3]
        link = cols[1].find('a').get('href')
        print
Accepted answer by Charles Marsh
This works for me in Python 2.7:
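table = soup.find('table', {'class': 'tableFile2'})
rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    # Skip the header row (it has only <th> cells) and any short rows;
    # .text turns the cell into a plain string for the substring test
    if len(cols) >= 4 and "2013" in cols[3].text:
        link = cols[1].find('a').get('href')
        print link  # Python 2 print statement; use print(link) on Python 3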
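The answer targets Python 2.7, where `print link` is a statement. A minimal Python 3 sketch of the same logic follows, assuming bs4 is installed; `filing_links` and `SAMPLE_HTML` are hypothetical names, and the sample is a trimmed copy of the question's table. It uses bs4's `class_` keyword (which avoids the reserved-word clash) and the snake_case `find_all`, and it additionally skips rows whose second cell has no link.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed copy of the question's table (Description and
# File/Film Number columns shortened).
SAMPLE_HTML = """
<table class="tableFile2" summary="Results">
<tr><th>Filings</th><th>Format</th><th>Description</th><th>Filing Date</th><th>File/Film Number</th></tr>
<tr><td>8-K</td><td><a href="/Archives/edgar/data/320193/000119312513199324/0001193125-13-199324-index.htm" id="documentsbutton">Documents</a></td><td>Current report</td><td>2013-05-03</td><td>000-10030</td></tr>
<tr><td>424B2</td><td><a href="/Archives/edgar/data/320193/000119312513191849/0001193125-13-191849-index.htm" id="documentsbutton">Documents</a></td><td>Prospectus</td><td>2013-05-01</td><td>333-188191</td></tr>
<tr><td>FWP</td><td><a href="/Archives/edgar/data/320193/000119312513189053/0001193125-13-189053-index.htm" id="documentsbutton">Documents</a></td><td>Free writing prospectus</td><td>2013-05-01</td><td>333-188191</td></tr>
</table>
"""


def filing_links(html, year):
    """Return the column-2 hrefs of rows whose column-4 date contains `year`."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ (trailing underscore) is bs4's spelling for the class attribute
    table = soup.find("table", class_="tableFile2")
    links = []
    for tr in table.find_all("tr"):
        cols = tr.find_all("td")
        # The header row has only <th> cells, so cols is empty there
        if len(cols) >= 4 and year in cols[3].get_text():
            a = cols[1].find("a")
            # Guard against a row whose second cell has no usable link
            if a is not None and a.get("href"):
                links.append(a["href"])
    return links


for link in filing_links(SAMPLE_HTML, "2013"):
    print(link)
```

With this sample all three filings are dated 2013, so the loop prints the three Documents links in table order.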
A few issues with your previous code:
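- soup.find() cannot take class="tableFile2" as a keyword argument: class is a reserved word in Python, so that line is a SyntaxError. Pass the attributes as a dictionary instead (e.g., {'class': 'tableFile2'}).
- Not every cols list will have at least 4 columns (the header row contains only <th> cells, so cols is empty there), so you need to check the length before indexing cols[3].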
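Both points can be checked directly. The sketch below assumes bs4 with the built-in "html.parser" backend and a hypothetical two-row excerpt of the question's table. It also shows a change the answer makes without comment: `"2013" in cols[3]` tests membership in the cell's list of children (so it only matches a child exactly equal to "2013"), whereas `cols[3].text` is the cell's text as one string, where a substring test works.

```python
from bs4 import BeautifulSoup

# Issue 1: `class` is a reserved word, so class="..." is rejected by the
# Python parser before BeautifulSoup is ever called.
try:
    compile("soup.find('table', class='tableFile2')", "<example>", "eval")
    keyword_arg_ok = True
except SyntaxError:
    keyword_arg_ok = False
print("class= keyword accepted:", keyword_arg_ok)  # prints: class= keyword accepted: False

# Hypothetical excerpt of the question's table: a header row plus one filing.
html = """<table class="tableFile2">
<tr><th>Filings</th><th>Format</th><th>Description</th><th>Filing Date</th></tr>
<tr><td>8-K</td><td><a href="/doc-index.htm">Documents</a></td><td>Current report</td><td>2013-05-03</td></tr>
</table>"""
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "tableFile2"})
rows = table.find_all("tr")

# Issue 2: the header row has no <td> cells, so cols[3] would raise IndexError.
td_counts = [len(tr.find_all("td")) for tr in rows]
print("td cells per row:", td_counts)  # prints: td cells per row: [0, 4]

# `in` on a Tag checks its child nodes; `in` on .text checks the string.
date_cell = rows[1].find_all("td")[3]
print("2013" in date_cell)       # prints: False
print("2013" in date_cell.text)  # prints: True
```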

