Python: Converting an HTML Table to JSON

Notice: this page is adapted from a Chinese-English translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/18544634/

Time: 2020-08-19 10:59:49  Source: igfitidea

Convert a HTML Table to JSON

python html json beautifulsoup html-table

Asked by declanjscott

I'm trying to convert a table I have extracted via BeautifulSoup into JSON.

So far I've managed to isolate all the rows, though I'm not sure how to work with the data from here. Any advice would be very much appreciated.

[<tr><td><strong>Balance</strong></td><td><strong>.30</strong></td></tr>, 
<tr><td>Card name</td><td>Name</td></tr>, 
<tr><td>Account holder</td><td>NAME</td></tr>, 
<tr><td>Card number</td><td>1234</td></tr>, 
<tr><td>Status</td><td>Active</td></tr>]

(Line breaks mine for readability)

This was my attempt:

result = []
allrows = table.tbody.findAll('tr')
for row in allrows:
    result.append([])
    allcols = row.findAll('td')
    for col in allcols:
        thestrings = [unicode(s) for s in col.findAll(text=True)]
        thetext = ''.join(thestrings)
        result[-1].append(thetext)

which gave me the following result:

[
 [u'Card balance', u'.30'],
 [u'Card name', u'NAMEn'],
 [u'Account holder', u'NAME'],
 [u'Card number', u'1234'],
 [u'Status', u'Active']
]

Accepted answer by H.D.

Probably your data is something like:

html_data = """
<table>
  <tr>
    <td>Card balance</td>
    <td>.30</td>
  </tr>
  <tr>
    <td>Card name</td>
    <td>NAMEn</td>
  </tr>
  <tr>
    <td>Account holder</td>
    <td>NAME</td>
  </tr>
  <tr>
    <td>Card number</td>
    <td>1234</td>
  </tr>
  <tr>
    <td>Status</td>
    <td>Active</td>
  </tr>
</table>
"""

From which we can get your result as a list using this code:

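from bs4 import BeautifulSoup
# Naming the parser explicitly avoids bs4's "no parser was explicitly specified" warning.
soup = BeautifulSoup(html_data, "html.parser")
# Calling a tag like a function is shorthand for find_all().
table_data = [[cell.text for cell in row("td")]
              for row in soup("tr")]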
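For readers without bs4 installed, the same row-by-row extraction can be sketched with the standard library's html.parser module. This is a swapped-in technique, not the answer's approach; the class name TableRowParser and the sample_html string are made up for illustration, and the sketch assumes a simple table with no nested tables.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <strong> still lands here.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical sample input, shaped like the table in the question.
sample_html = """
<table>
  <tr><td><strong>Card name</strong></td><td>NAME</td></tr>
  <tr><td>Status</td><td>Active</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(sample_html)
parser.close()
table_rows = parser.rows
print(table_rows)
```

Because every row here has exactly two cells, table_rows can be passed to dict() and json.dumps() in the same way as table_data.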

To convert the result to JSON, if you don't care about the order:

import json
print(json.dumps(dict(table_data)))

Result:

{
    "Status": "Active",
    "Card name": "NAMEn",
    "Account holder": "NAME",
    "Card number": "1234",
    "Card balance": ".30"
}

If you need the same order, use this:

from collections import OrderedDict
import json
print(json.dumps(OrderedDict(table_data)))

Which gives you:

{
    "Card balance": ".30",
    "Card name": "NAMEn",
    "Account holder": "NAME",
    "Card number": "1234",
    "Status": "Active"
}
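
A note for newer interpreters: since Python 3.7 a plain dict preserves insertion order, so OrderedDict is only needed on older versions. Below is a minimal sketch, assuming Python 3.7+ and a table_data list written out by hand to match the rows above. The indent=4 argument is an addition that pretty-prints the output like the listing shown.

```python
import json

# Same shape as table_data above: a list of [key, value] pairs.
table_data = [
    ["Card balance", ".30"],
    ["Card name", "NAMEn"],
    ["Account holder", "NAME"],
    ["Card number", "1234"],
    ["Status", "Active"],
]

# On Python 3.7+ dict() keeps the pairs in the order they were given,
# and json.dumps() writes keys in the dict's iteration order.
ordered_json = json.dumps(dict(table_data), indent=4)
print(ordered_json)
```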