Reshaping Pandas Data Frame

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/27803106/

Time: 2020-09-13 22:48:58  Source: igfitidea

Tags: python, python-2.7, pandas

Asked by TheOriginalBMan

I'm parsing some HTML data using Pandas like this:

rankings = pd.read_html('https://en.wikipedia.org/wiki/Rankings_of_universities_in_the_United_Kingdom')
university_guide = rankings[0]

This gives me a nice data frame (shown as an image in the original post).

What I want is to reshape this data frame so that there are only two columns (rank and university name). My current solution is to do something like this:

ug_copy = rankings[0][1:]
npa1 = ug_copy.as_matrix( columns=[0,1] )
npa2 = ug_copy.as_matrix( columns=[2,3] )
npa3 = ug_copy.as_matrix( columns=[4,5] )

npam = np.append(npa1,npa2)
npam = np.append(npam,npa3)

reshaped = npam.reshape((npam.size/2,2))

pd.DataFrame(data=reshaped)

This gives me what I want, but it doesn't seem like it could possibly be the best solution. I can't seem to find a good way to complete this all using a data frame. I've tried using stack/unstack and pivoting the data frame (as some of the other solutions here have suggested), but I haven't had any luck. I've tried doing something like this:

ug_copy.columns=['Rank','University','Rank','University','Rank','University']
ug_copy = ug_copy[1:]
ug_copy.groupby(['Rank', 'University'])

There has to be something small I'm missing!

Answered by YS-L

This is probably a bit shorter (also note that you can use the header option in read_html to save a bit of work):

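import pandas as pd

# header=0 uses the first table row as the column names
rankings = pd.read_html('https://en.wikipedia.org/wiki/Rankings_of_universities_in_the_United_Kingdom', header=0)
university_guide = rankings[0]
# (30, 2) assumes 10 data rows, each holding 3 (rank, university) pairs
df = pd.DataFrame(university_guide.values.reshape((30, 2)), columns=['Rank', 'University'])
# DataFrame.sort was removed in later pandas releases; sort_values replaces it
df = df.sort_values('Rank').reset_index(drop=True)
print(df)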
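
For anyone who wants to try the reshape idea without fetching the Wikipedia page, here is a minimal sketch on a small made-up frame. The university names and the variable names `wide` and `tidy` are hypothetical, and the layout (three side-by-side rank/university column pairs) is only assumed to match the table in the question. It uses `reshape(-1, 2)` so the row count is inferred rather than hard-coded.

```python
import pandas as pd

# Hypothetical stand-in for the parsed table: three side-by-side
# (rank, university) column pairs, with default integer column labels 0..5.
wide = pd.DataFrame(
    [[1, "Univ A", 3, "Univ C", 5, "Univ E"],
     [2, "Univ B", 4, "Univ D", 6, "Univ F"]]
)

# Row-major reshape: every consecutive pair of values becomes one
# (Rank, University) row. -1 lets NumPy work out the number of rows.
tidy = pd.DataFrame(wide.values.reshape(-1, 2), columns=["Rank", "University"])

# The reshaped array has object dtype, so cast Rank to int before sorting.
tidy["Rank"] = tidy["Rank"].astype(int)
tidy = tidy.sort_values("Rank").reset_index(drop=True)
print(tidy)
```

The sort at the end matters because the reshape walks the table row by row, so the ranks come out interleaved (1, 3, 5, 2, 4, 6 in this toy frame) rather than in order.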