Pandas - Reading HTML
Original question: http://stackoverflow.com/questions/35241210/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL and author information, and attribute it to the original authors (not to this page).
Asked by Martin598
I am trying to convert this table into a pandas DataFrame.
I have done the following so far
import pandas as pd
url = 'http://www.scb.se/sv_/Hitta-statistik/Statistik-efter-amne/Befolkning/Befolkningens-sammansattning/Befolkningsstatistik/25788/25795/Helarsstatistik---Riket/26046/'
# read_html returns a list of DataFrames, one per table found on the page
df = pd.read_html(url, thousands=' ')
df2 = df[0]
My problem is that pandas does not recognize that the row at index 0 contains the headers. I also want the År column to be the index.
Lastly, I would like to draw a line plot with the Folkmängd column values as Y and the År values as X.
Thank you in advance.
Accepted answer by Padraic Cunningham
This should be close to what you want:
import pandas as pd
import matplotlib.pyplot as plt

plt.style.use('ggplot')

url = 'http://www.scb.se/sv_/Hitta-statistik/Statistik-efter-amne/Befolkning/Befolkningens-sammansattning/Befolkningsstatistik/25788/25795/Helarsstatistik---Riket/26046/'
# header=0 takes the column names from the first row; index_col=0 makes the first column (År) the index
table = pd.read_html(url, thousands=' ', header=0, index_col=0)[0]
# Plot the Folkmängd column against the index as a black line
table["Folkmängd"].plot(color='k')
plt.show()
This should give you a line plot of Folkmängd (Y axis) against År (X axis).
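As a separate note, a frame that was already parsed without `header=0` and `index_col=0` (like `df2` in the question) can probably be repaired after the fact as well. The following is only a sketch under stated assumptions: `raw` is a hand-made stand-in whose numbers are placeholders and not SCB statistics, `promote_header` is a hypothetical helper name, and the `Agg` backend is chosen only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display available)
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for a table parsed without header=0: the real column names sit in row 0.
# The values below are placeholders, NOT actual SCB data.
raw = pd.DataFrame([
    ["År", "Folkmängd"],
    ["2001", "1 000"],
    ["2002", "1 100"],
    ["2003", "1 250"],
])


def promote_header(frame, index_col):
    """Hypothetical helper: use row 0 as column labels and `index_col` as the index."""
    out = frame.iloc[1:].copy()
    out.columns = frame.iloc[0].tolist()
    return out.set_index(index_col)


table = promote_header(raw, "År")

# Remove the space used as a thousands separator, then convert to integers.
table["Folkmängd"] = table["Folkmängd"].str.replace(" ", "", regex=False).astype(int)
table.index = table.index.astype(int)

# Line plot with År on the X axis and Folkmängd on the Y axis.
ax = table["Folkmängd"].plot(color="k")
ax.set_xlabel("År")
ax.set_ylabel("Folkmängd")
```

In an interactive session, `plt.show()` would display the figure. Passing `header=0` and `index_col=0` to `read_html` directly, as in the answer, avoids this manual cleanup.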