Note: this page reproduces a popular StackOverflow question and its answer, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors: StackOverflow
Original: http://stackoverflow.com/questions/39323002/
Convert Pandas groupby group into columns
Asked by HonestMath
I'm trying to group a Pandas dataframe by two separate group types, A_Bucket and B_Bucket, and convert each A_Bucket group into a column. I get the groups as such:
grouped = my_new_df.groupby(['A_Bucket','B_Bucket'])
I want the A_Bucket group to be in columns and the B_Bucket group to be the indices. 'A' has about 20 values and B has about 20 values, so there are a total of about 400 groups.
When I print grouped and its type I get:
type of grouped2 = <class 'pandas.core.groupby.DataFrameGroupBy'>
A_Bucket  B_Bucket
0.100     100.0         5.418450
          120.0        18.061367
0.125     80.0          3.100920
          100.0        14.137063
          120.0        30.744823
          140.0        38.669950
          160.0        48.303129
          180.0        74.576333
          200.0       125.119950
0.150     60.0          0.003200
          80.0          2.274807
          100.0         5.350074
          120.0        23.272970
          140.0        40.131780
          160.0        47.036912
          180.0        72.438978
          200.0       117.365480
So A_Bucket group 0.100 has only 2 values, but 0.125 has 7. I want a dataframe like this:
          0.1      0.125       0.15
80        NaN    3.10092   2.274807
100   5.41845  14.137063   5.350074
120   18.0613  30.744823   23.27297
140       NaN   38.66995   40.13178
160       NaN  48.303129  47.036912
180       NaN  74.576333  72.438978
200       NaN  125.11995        NaN
I saw this question: Pandas groupby result into multiple columns
but I don't understand the syntax, and it doesn't arrange the first group into columns like I need. I also want this to work for more than one output column.
How do I do this?
Answered by Psidom
If I understand you correctly, you are trying to reshape your data frame rather than compute a per-group summary. In that case you can use set_index() and unstack():
df.set_index(["A_Bucket", "B_Bucket"]).unstack(level=0)
#                  Value
# A_Bucket         0.100       0.125       0.150
# B_Bucket
# 60.0               NaN         NaN    0.003200
# 80.0               NaN    3.100920    2.274807
# 100.0         5.418450   14.137063    5.350074
# 120.0        18.061367   30.744823   23.272970
# 140.0              NaN   38.669950   40.131780
# 160.0              NaN   48.303129   47.036912
# 180.0              NaN   74.576333   72.438978
# 200.0              NaN  125.119950  117.365480
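For anyone who wants to try the reshape end to end, here is a minimal self-contained sketch. It assumes the data is already one row per (A_Bucket, B_Bucket) pair with a single column named Value (the name shown in the output above). The rows are a subset copied from the question's printout, and the variable names wide and flat are just illustrative.

```python
import pandas as pd

# A few rows copied from the question's printout; "Value" is assumed to be
# the name of the single data column, as in the output shown above.
df = pd.DataFrame({
    "A_Bucket": [0.100, 0.100, 0.125, 0.125, 0.125, 0.150, 0.150, 0.150],
    "B_Bucket": [100.0, 120.0, 80.0, 100.0, 120.0, 60.0, 80.0, 100.0],
    "Value": [5.418450, 18.061367, 3.100920, 14.137063, 30.744823,
              0.003200, 2.274807, 5.350074],
})

# Move both bucket columns into the index, then pivot the first index
# level (A_Bucket) out into the columns. Missing pairs become NaN.
wide = df.set_index(["A_Bucket", "B_Bucket"]).unstack(level=0)

# The columns are now a two-level MultiIndex (Value, A_Bucket).
# Selecting "Value" leaves plain A_Bucket column labels.
flat = wide["Value"]
print(flat)
```

Note that unstack() raises a ValueError if the same (A_Bucket, B_Bucket) pair appears more than once, so this route only fits data that has already been reduced to one row per pair.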
If you do need a summary per group (for example the mean), you can still aggregate first and then unstack: df.groupby(['A_Bucket', 'B_Bucket']).mean().unstack(level=0)
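The question also asks about more than one output column. A rough sketch of how the groupby route handles that is below. The data is made up for illustration, and the column names Value and Other and the variable names are hypothetical, not taken from the original post.

```python
import pandas as pd

# Hypothetical raw data: several rows per (A_Bucket, B_Bucket) pair and
# two data columns. "Value" and "Other" are illustrative names only.
raw = pd.DataFrame({
    "A_Bucket": [0.100, 0.100, 0.100, 0.125, 0.125, 0.125],
    "B_Bucket": [100.0, 100.0, 120.0, 80.0, 100.0, 100.0],
    "Value": [5.0, 6.0, 18.0, 3.0, 14.0, 16.0],
    "Other": [1.0, 3.0, 2.0, 4.0, 5.0, 7.0],
})

# Aggregate to one row per pair, then pivot A_Bucket into the columns.
summary = raw.groupby(["A_Bucket", "B_Bucket"]).mean().unstack(level=0)

# Each data column gets its own block of A_Bucket columns under a
# two-level column index, so it can be pulled out by name.
value_wide = summary["Value"]
other_wide = summary["Other"]
print(summary)
```

Because the aggregation collapses duplicate pairs before the reshape, this variant works on raw data where set_index() followed by unstack() would fail.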