Extracting a class name from a tag with BeautifulSoup in Python

Question

Asked by kegewe

I have the following HTML code:

    <td class="image">
      <a href="/target/tt0111161/" title="Target Text 1">
       <img alt="target img" height="74" src="img src url" title="image title" width="54"/>
      </a>
     </td>
     <td class="title">
      <span class="wlb_wrapper" data-caller-name="search" data-size="small" data-tconst="tt0111161">
      </span>
      <a href="/target/tt0111161/">
       Other Text
      </a>
      <span class="year_type">
       (2013)
      </span>

I am trying to use Beautiful Soup to parse certain elements into a tab-delimited file. I got some great help and now have:

for td in soup.select('td.title'):
    span = td.select('span.wlb_wrapper')
    if span:
        print(span[0].get('data-tconst'))  # To get the ID, e.g. `tt0111161`

Now I want to get "Target Text 1".

I've tried a few things modeled on the code above, such as:

for td in soup.select('td.image'):  # trying to select the <td class="image"> tag
    img = td.select('a.title')  # from inside td I now try to look inside the a tag that also has the word title
    if img:
        print(img[2].get('title'))  # if it finds anything, then I want to return the text in class 'title'

Answer 1

Accepted answer by Jared Messenger

If you're trying to get a different td based on its class (i.e. td class="image" and td class="title"), you can treat a Beautiful Soup tag like a dictionary: its attributes, including class, are available by key, so you can check which class each td has.
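
As a minimal sketch of the dictionary-style access described above (the one-cell `html` string and the variable names are made up for illustration and are not part of the original answer):

```python
from bs4 import BeautifulSoup

# A cut-down version of the question's markup, used only for this sketch.
html = (
    '<table><tr><td class="image">'
    '<a href="/target/tt0111161/" title="Target Text 1"></a>'
    '</td></tr></table>'
)
soup = BeautifulSoup(html, 'html.parser')

td = soup.find('td')
a = td.find('a')

td_classes = td['class']        # class is multi-valued, so this is a list of names
link_title = a['title']         # plain attributes come back as strings
missing = a.get('data-tconst')  # .get() returns None when the attribute is absent

print(td_classes, link_title, missing)
```

Indexing with `tag['name']` raises `KeyError` when the attribute is absent, which is why the answer's code guards with `has_attr` (or, equivalently, uses `.get()`).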

This will find all the td class="image" in the table.

from bs4 import BeautifulSoup

page = """
<table>
    <tr>
        <td class="image">
           <a href="/target/tt0111161/" title="Target Text 1">
            <img alt="target img" height="74" src="img src url" title="image title" width="54"/>
           </a>
          </td>
          <td class="title">
           <span class="wlb_wrapper" data-caller-name="search" data-size="small" data-tconst="tt0111161">
           </span>
           <a href="/target/tt0111161/">
            Other Text
           </a>
           <span class="year_type">
            (2013)
           </span>
        </td>
    </tr>
</table>
"""
soup = BeautifulSoup(page, 'html.parser')  # name the parser explicitly
tbl = soup.find('table')
rows = tbl.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    for col in cols:
        # 'class' is multi-valued, so it is returned as a list of class names
        classes = col.get('class', [])
        if 'image' in classes:
            links = col.find_all('a')
            for link in links:
                print(link.get('title'))  # title attribute of each link

        elif 'title' in classes:
            spans = col.find_all('span')
            for span in spans:
                if 'wlb_wrapper' in span.get('class', []):
                    print(span.get('data-tconst'))
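
For comparison, here is a shorter sketch of the same lookup using CSS selectors, which is the style the question started with. It is not part of the original answer. It assumes each table row holds one td class="image" cell and one td class="title" cell; the `records` list and the tab-delimited output to stdout are illustrative choices. Note that `a.title` matches an `<a>` whose class is "title", whereas the link in the question has a title attribute, which the attribute selector `a[title]` matches.

```python
import csv
import sys

from bs4 import BeautifulSoup

# Trimmed copy of the sample markup, used only for this sketch.
page = """
<table>
  <tr>
    <td class="image">
      <a href="/target/tt0111161/" title="Target Text 1">
        <img alt="target img" src="img src url" title="image title"/>
      </a>
    </td>
    <td class="title">
      <span class="wlb_wrapper" data-tconst="tt0111161"></span>
      <a href="/target/tt0111161/">Other Text</a>
      <span class="year_type">(2013)</span>
    </td>
  </tr>
</table>
"""
soup = BeautifulSoup(page, 'html.parser')

records = []
for row in soup.select('tr'):
    # a[title] selects links that carry a title attribute
    link = row.select_one('td.image a[title]')
    span = row.select_one('td.title span.wlb_wrapper')
    if link and span:
        records.append((span['data-tconst'], link['title']))

# Write one tab-delimited line per table row.
writer = csv.writer(sys.stdout, delimiter='\t')
writer.writerows(records)
```

Swapping `sys.stdout` for an open file handle would produce the tab-delimited file the question mentions.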

Answer 2

Answer by hemanth

span.wlb_wrapper is a selector that matches <span class="wlb_wrapper" data-caller-name="search" data-size="small" data-tconst="tt0111161">. Refer to this and this for more information on selectors.

In your Python code, change span = td.select('span.wlb_wrapper') to span = td.select('span'), then to span = td.select('span.year_type'), and see what each one returns.

If you try the above and inspect what span holds each time, you will see how to get what you want.
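
The experiment suggested above can be sketched as follows. The `html` string is a trimmed copy of the question's td class="title" cell, and the `results` dictionary is only there to hold each selector's matches for inspection; neither comes from the original answer.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's td.title cell, used only for this sketch.
html = """
<table><tr><td class="title">
  <span class="wlb_wrapper" data-caller-name="search" data-size="small" data-tconst="tt0111161"></span>
  <a href="/target/tt0111161/">Other Text</a>
  <span class="year_type">(2013)</span>
</td></tr></table>
"""
soup = BeautifulSoup(html, 'html.parser')
td = soup.select('td.title')[0]

# Run each selector against the same cell and keep the matches for comparison.
results = {}
for selector in ('span.wlb_wrapper', 'span', 'span.year_type'):
    results[selector] = td.select(selector)
    print(selector, '->', results[selector])
```

The bare `span` selector matches both spans in the cell, while each class selector narrows the match to one of them. The same reasoning applies to the link in td class="image": the selector has to describe the tag as it actually appears in the markup.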
