How to convert an HTML table into a pandas DataFrame

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/16009778/

Date: 2020-09-13 20:46:15  Source: igfitidea

Tags: python, dataframe, pandas, html-table

Asked by waitingkuo

pandas provides a useful to_html() to convert a DataFrame into an HTML table. Is there any useful function to read it back into a DataFrame?

Accepted answer by waitingkuo

Use the read_html utility, which was released in pandas 0.12.
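A minimal round-trip sketch, assuming a recent pandas plus one of the HTML parser backends `read_html` relies on (lxml, or BeautifulSoup4 with html5lib) is installed. `read_html` returns a list with one DataFrame per `<table>` it finds, so the first element is taken; `index_col=0` turns the index column written by `to_html()` back into the index. The HTML is wrapped in `io.StringIO` because recent pandas versions deprecate passing a literal HTML string directly.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Sample frame with exactly representable values so the round trip is lossless
df = pd.DataFrame(np.arange(20, dtype=float).reshape(4, 5), columns=list("abcde"))

html = df.to_html()

# read_html returns a list with one DataFrame per <table> in the input;
# index_col=0 restores the index column that to_html() wrote out
df2 = pd.read_html(StringIO(html), index_col=0)[0]

print(df2)
```

Note that `to_html()` writes values using the display precision, so arbitrary random floats would come back rounded; the sample uses whole-number floats so the values compare exactly.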

Answer by elyase

In the general case this is not possible, but if you roughly know the structure of your table, you could do something like this:

```python
>>> import numpy as np
>>> from pandas import DataFrame
>>> # Create a test df:
>>> df = DataFrame(np.random.rand(4, 5), columns=list('abcde'))
>>> df
     a           b           c           d           e
0    0.675006    0.230464    0.386991    0.422778    0.657711
1    0.250519    0.184570    0.470301    0.811388    0.762004
2    0.363777    0.715686    0.272506    0.124069    0.045023
3    0.657702    0.783069    0.473232    0.592722    0.855030
```

Now parse the HTML and reconstruct the DataFrame:

```python
>>> from pyquery import PyQuery as pq
>>> d = pq(df.to_html())
>>> # Column names: text of the first header row (the blank index corner adds nothing)
>>> columns = d('thead tr').eq(0).text().split()
>>> n_rows = len(d('tbody tr'))
>>> # Only <td> cells hold values; to_html() writes the index as <th>, so it is skipped
>>> values = np.array(d('tbody tr td').text().split(), dtype=float).reshape(n_rows, len(columns))
>>> DataFrame(values, columns=columns)
     a           b           c           d           e
0    0.675006    0.230464    0.386991    0.422778    0.657711
1    0.250519    0.184570    0.470301    0.811388    0.762004
2    0.363777    0.715686    0.272506    0.124069    0.045023
3    0.657702    0.783069    0.473232    0.592722    0.855030
```

If needed, you could extend this to handle MultiIndex DataFrames, or to detect column types automatically using eval().
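A minimal sketch of the automatic type detection idea, with two substitutions: it uses the standard library's `html.parser` instead of pyquery, and `ast.literal_eval` instead of `eval()`, since `literal_eval` only accepts Python literals and will not execute arbitrary expressions found in the HTML. `TableParser`, `detect_type` and `html_to_df` are hypothetical helper names, and the parser assumes the layout `to_html()` produces with its default `index=True`: one header row whose first cell is the blank index corner, and one `<th>` index cell at the start of each body row.

```python
import ast
from html.parser import HTMLParser

import pandas as pd


class TableParser(HTMLParser):
    """Collect header and body cell texts from the table in an HTML string."""

    def __init__(self):
        super().__init__()
        self.section = None   # "thead" or "tbody" while inside one, else None
        self.cell = None      # text buffer for the current <th>/<td>, else None
        self.header_rows = []
        self.body_rows = []

    def _rows(self):
        return self.header_rows if self.section == "thead" else self.body_rows

    def handle_starttag(self, tag, attrs):
        if tag in ("thead", "tbody"):
            self.section = tag
        elif tag == "tr" and self.section:
            self._rows().append([])
        elif tag in ("th", "td") and self.section:
            self.cell = ""

    def handle_data(self, data):
        if self.cell is not None:
            self.cell += data

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self.cell is not None:
            self._rows()[-1].append(self.cell.strip())
            self.cell = None
        elif tag in ("thead", "tbody"):
            self.section = None


def detect_type(text):
    """Convert a cell to int/float/bool/etc. via ast.literal_eval; keep other text as str."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def html_to_df(html):
    parser = TableParser()
    parser.feed(html)
    # Assumption: first header cell is the blank index corner written by to_html()
    columns = parser.header_rows[0][1:]
    index = [detect_type(row[0]) for row in parser.body_rows]
    data = [[detect_type(cell) for cell in row[1:]] for row in parser.body_rows]
    return pd.DataFrame(data, index=index, columns=columns)


sample = pd.DataFrame({"a": [1, 2], "b": ["x", "y z"], "c": [0.5, 1.5]})
result = html_to_df(sample.to_html())
print(result.dtypes)
```

Because header cells are read one at a time instead of splitting on whitespace, column names containing spaces survive. Cells that are not Python literals (for example `NaN` as written by `to_html()`) are kept as strings.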

eval()如果需要,您可以将其扩展为 Multiindex dfs 或自动类型检测。