How to parse an HTML table with Python and BeautifulSoup and write it to CSV

Question

Asked by user2140323

I am trying to parse an HTML page, fetch the currency values, and write them to CSV. I have the following code:

#!/usr/bin/env python

import urllib2
from BeautifulSoup import BeautifulSoup

contenturl = "http://www.bank.gov.ua/control/en/curmetal/detail/currency?period=daily"
soup = BeautifulSoup(urllib2.urlopen(contenturl).read())

table = soup.find('div', attrs={'class': 'content'})

rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = td.find(text=True) + ';'
        print text,
    print

The problem is that I do not know how to retrieve only the currency values. I tried a regular expression such as '^[0-9]{3}' (start with 3 digits), but it doesn't work.

Answer 1

Accepted answer by Martijn Pieters

You'd be much better off picking out specific cells in the table. The td cells with the cell_c class contain the data you are interested in, and the last one is always the currency exchange rate:

rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    if 'cell_c' in cols[0]['class']:
        # currency row
        digital_code, letter_code, units, name, rate = [c.text for c in cols]
        print digital_code, letter_code, units, name, rate

With the data in separate variables, you can now convert the text to decimal numbers, store the values in a database, or write them to a file.
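Since the question asks for CSV output, here is a minimal sketch of that last step. It uses only the standard library and assumes each row has already been reduced to the five text values unpacked above. The function name write_rates, the column headers, and the sample values are hypothetical, and the semicolon delimiter simply mirrors the separator used in the question.

```python
import csv
import io
from decimal import Decimal

# Made-up sample rows, shaped like the five cells unpacked in the answer:
# digital code, letter code, units, name, rate.
SAMPLE_ROWS = [
    [" 036 ", "AUD", "100", "Australian Dollar", " 826.5000 "],
    ["826", "GBP", "100", "British Pound", "1234.0500"],
]


def write_rates(rows, out):
    """Write currency rows to the file-like object `out` as ';'-delimited CSV."""
    writer = csv.writer(out, delimiter=";")
    writer.writerow(["digital_code", "letter_code", "units", "name", "rate"])
    for digital_code, letter_code, units, name, rate in rows:
        writer.writerow([
            digital_code.strip(),   # kept as text so leading zeros survive
            letter_code.strip(),
            int(units.strip()),     # raises ValueError if not a whole number
            name.strip(),
            Decimal(rate.strip()),  # exact decimal; raises on malformed input
        ])


buffer = io.StringIO()
write_rates(SAMPLE_ROWS, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with newline="" (for example open("rates.csv", "w", newline="")), which is what the csv module documentation recommends to avoid extra blank lines on some platforms.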
