Pandas - Reading HTML

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow post: http://stackoverflow.com/questions/35241210/

Time: 2020-09-14 00:38:29  Source: igfitidea


python pandas

Asked by Martin598

I am trying to convert this table into a pandas DataFrame.

I have done the following so far:

import pandas as pd

url = 'http://www.scb.se/sv_/Hitta-statistik/Statistik-efter-amne/Befolkning/Befolkningens-sammansattning/Befolkningsstatistik/25788/25795/Helarsstatistik---Riket/26046/'

# read_html returns a list of DataFrames, one per table found on the page
df = pd.read_html(url, thousands=' ')
df2 = df[0]

My problem is that pandas does not recognize that the row at index 0 contains the headers. I also want the År column to be the index.

Lastly, I would like to plot the Folkmängd column values as Y and the År values as X in a line plot.

Thank you in advance.

Accepted answer by Padraic Cunningham

This should be close to what you want:

import pandas as pd
import matplotlib.pyplot as plt

plt.style.use('ggplot')

url = 'http://www.scb.se/sv_/Hitta-statistik/Statistik-efter-amne/Befolkning/Befolkningens-sammansattning/Befolkningsstatistik/25788/25795/Helarsstatistik---Riket/26046/'

# header=0 takes the column names from the first row;
# index_col=0 uses the first column (År) as the index.
table = pd.read_html(url, thousands=' ', header=0, index_col=0)[0]
# The index (År) becomes the X axis of the line plot.
table["Folkmängd"].plot(color='k')
plt.show()

Which should give you something like:

[image: line plot of Folkmängd against År]
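If the table has already been read without `header` and `index_col`, as in the question, the same layout can probably be recovered afterwards. The following is only a minimal sketch: `promote_header` is a hypothetical helper (not part of pandas), and the small `raw` frame uses made-up numbers as a stand-in for `df[0]`, not the real SCB figures.

```python
import pandas as pd

# Hypothetical stand-in for df[0] from the question: the header ended up as
# row 0 and every value is a string. The numbers are made up for illustration.
raw = pd.DataFrame([
    ["År", "Folkmängd"],
    ["2000", "1 000 000"],
    ["2001", "1 050 000"],
    ["2002", "1 100 000"],
])


def promote_header(frame, index_col, thousands=" "):
    """Use row 0 as column names, index_col as the index, and parse integers."""
    out = frame.iloc[1:].copy()
    out.columns = frame.iloc[0].tolist()
    out = out.set_index(index_col)
    out.index = out.index.astype(int)
    # Strip the thousands separator before converting each column to int.
    return out.apply(
        lambda col: col.str.replace(thousands, "", regex=False).astype(int)
    )


table = promote_header(raw, index_col="År")
print(table)
```

With the frame in this shape, `table["Folkmängd"].plot()` would put År on the X axis, as in the answer above. Passing `header=0, index_col=0` to `read_html` directly remains the shorter route when the page can simply be re-read.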