Python Pandas, setting groupby() group labels as index in a new dataframe
Original question: http://stackoverflow.com/questions/34113203/
Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Asked by Okechukwu Ossai
I am a Python beginner trying to figure out how the group labels from a groupby operation can be used as the index of a new dataframe. For example,
import numpy as np
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'UK', 'China', 'Canada', 'Australia', 'UK', 'China', 'USA'],
                   'Year': [1979, 1983, 1987, 1991, 1995, 1999, 2003, 2007, 2011],
                   'Medals': [52, 30, 25, 41, 19, 17, 9, 14, 12]})
df:
Country Medals Year
0 USA 52 1979
1 USA 30 1983
2 UK 25 1987
3 China 41 1991
4 Canada 19 1995
5 Australia 17 1999
6 UK 9 2003
7 China 14 2007
8 USA 12 2011
c1 = df.groupby(df['Country'], as_index=True, sort=False, group_keys=True).size()
c1:
Country
USA 3
UK 2
China 2
Canada 1
Australia 1
I want to create a new dataframe with the above c1 results in exactly that format, but I have not been able to do so. Below is what I get:
d1 = pd.DataFrame(np.array(c1), columns=['Frequency'])
d1:
Frequency
0 3
1 2
2 2
3 1
4 1
I want the group labels as the index, not the default 0, 1, 2, 3 and 4. This is exactly what I want:
Desired Output:
Frequency
USA 3
UK 2
China 2
Canada 1
Australia 1
How can I achieve this? I guess that if I built a list of country labels and assigned it as the index, it might work. However, the original data I'm practising with has so many rows that building such a list by hand is impractical. Any ideas will be highly appreciated.
Accepted answer by Josh Rumbut
Edit: let's see how you like this one!
c1 = pd.DataFrame(c1.values, index=c1.index.values, columns=['Frequency'])
print(c1)
Frequency
USA 3
UK 2
China 2
Canada 1
Australia 1
c1.values is roughly equivalent (for our purposes) to np.array(c1) but avoids needing to import numpy.
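To illustrate the point about c1.values, here is a small sketch. It rebuilds only the Country column from the question (the shortened frame is my assumption, not part of the original answer) and checks that both expressions give the same array while the labels stay in c1.index.

```python
import numpy as np
import pandas as pd

# Only the Country column is needed to reproduce the group sizes.
df = pd.DataFrame({'Country': ['USA', 'USA', 'UK', 'China', 'Canada',
                               'Australia', 'UK', 'China', 'USA']})
c1 = df.groupby('Country', sort=False).size()

# Both expressions yield a plain NumPy array of the counts;
# neither one carries the group labels.
same = np.array_equal(c1.values, np.array(c1))

# The labels live separately in the Series index.
labels = list(c1.index)

print(same)
print(labels)
```

This is why the index has to be passed to pd.DataFrame explicitly: the array alone no longer knows which country each count belongs to.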
Original response (doesn't quite work, left for posterity): You are likely looking for the set_index method.
It should work something like this:
c1 = df.groupby(df['Country'], as_index=True, sort=False, group_keys=True).size()
c2 = c1.set_index(['Country'])
Let me know if this works for you!
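For what it's worth, the set_index idea can probably be made to work by converting the Series to a DataFrame first, since set_index is a DataFrame method. The following is only a sketch under that assumption, not the answerer's code; the name c2 is kept from the snippet above and the shortened frame is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'UK', 'China', 'Canada',
                               'Australia', 'UK', 'China', 'USA']})
c1 = df.groupby('Country', sort=False).size()

# reset_index turns the Series into a DataFrame with a 'Country' column
# and names the counts column 'Frequency'; set_index then moves
# 'Country' back into the index.
c2 = c1.reset_index(name='Frequency').set_index('Country')

print(c2)
```

Note that the index keeps the name Country here, so the printed frame shows an extra header row for the index name.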
Answer by Okechukwu Ossai
Finally, I figured out what seems to be a working solution. I realized that c1 is a Series, not a DataFrame, and that its index is accessible via c1.index. So I improved the code by specifying the index:
d1 = pd.DataFrame(np.array(c1), index=c1.index, columns=['Frequency'])
d1:
Frequency
Country
USA 3
UK 2
China 2
Canada 1
Australia 1
I don't know if this is the best solution. Better ideas are still welcome.
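One possibly shorter alternative, offered as a sketch rather than a definitive answer: because c1 is already a Series with the countries as its index, Series.to_frame can turn it into a one-column DataFrame directly. Clearing the index name is an optional extra step I added to match the desired output; the shortened frame is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'UK', 'China', 'Canada',
                               'Australia', 'UK', 'China', 'USA']})
c1 = df.groupby('Country', sort=False).size()

# to_frame keeps the existing index and names the single column.
d1 = c1.to_frame('Frequency')

# Optional: drop the index name so no 'Country' header row is printed.
d1.index.name = None

print(d1)
```

This avoids both numpy and the explicit index argument, at the cost of one extra line if the index name needs to be removed.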