Python: define aggfunc for each values column in a pandas pivot table
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/20119414/
Asked by VIKASH JAISWAL
I was trying to generate a pivot table with multiple "values" columns. I know I can use aggfunc to aggregate values the way I want, but what if I don't want the sum or the mean of both columns, and instead want the sum of one column and the mean of the other? Is that possible with pandas?
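import numpy as np
import pandas as pd

# Sample data: three categorical columns and two random numeric columns
df = pd.DataFrame({
    'A': ['one', 'one', 'two', 'three'] * 6,
    'B': ['A', 'B', 'C'] * 8,
    'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
    'D': np.random.randn(24),
    'E': np.random.randn(24)
})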
This gives a pivot table with sums:
# index= replaces the rows= keyword of older pandas versions
pd.pivot_table(df, values=['D','E'], index=['B'], aggfunc=np.sum)
And this gives means:
pd.pivot_table(df, values=['D','E'], index=['B'], aggfunc=np.mean)
How can I get the sum for D and the mean for E?
I hope my question is clear enough.
Accepted answer by Roman Pekar
You can concatenate two DataFrames:
>>> df1 = pd.pivot_table(df, values=['D'], index=['B'], aggfunc=np.sum)
>>> df2 = pd.pivot_table(df, values=['E'], index=['B'], aggfunc=np.mean)
>>> pd.concat((df1, df2), axis=1)
          D         E
B
A  1.810847 -0.524178
B  2.762190 -0.443031
C  0.867519  0.078460
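The concatenation approach can be sketched as a self-contained script. This is a minimal sketch, not the answer author's exact code: the seeded random generator, the variable names `d_sum`, `e_mean`, and `combined`, and the use of string function names (`'sum'`, `'mean'`) in place of `np.sum` and `np.mean` are all assumptions made for reproducibility.

```python
import numpy as np
import pandas as pd

# Seeded random data so repeated runs give the same table (the seed is an assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'A': ['one', 'one', 'two', 'three'] * 6,
    'B': ['A', 'B', 'C'] * 8,
    'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
    'D': rng.standard_normal(24),
    'E': rng.standard_normal(24),
})

# Build one pivot table per aggregation, each with a single values column.
d_sum = pd.pivot_table(df, values=['D'], index=['B'], aggfunc='sum')
e_mean = pd.pivot_table(df, values=['E'], index=['B'], aggfunc='mean')

# Join the two tables column-wise; both share the same index (the values of B).
combined = pd.concat((d_sum, e_mean), axis=1)
print(combined)
```

Because both pivot tables are indexed by the same values of B, the column-wise concat lines the rows up without any extra alignment step.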
Or you can pass a list of functions as the aggfunc parameter and then select the columns you need:
>>> df3 = pd.pivot_table(df, values=['D','E'], index=['B'], aggfunc=[np.sum, np.mean])
>>> df3
        sum                mean
          D         E         D         E
B
A  1.810847 -4.193425  0.226356 -0.524178
B  2.762190 -3.544245  0.345274 -0.443031
C  0.867519  0.627677  0.108440  0.078460
>>> df3 = df3.loc[:, [('sum', 'D'), ('mean','E')]]
>>> df3.columns = ['D', 'E']
>>> df3
          D         E
B
A  1.810847 -0.524178
B  2.762190 -0.443031
C  0.867519  0.078460
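The same idea as a runnable sketch, under a few assumptions: the data is seeded, only the columns that matter here are kept, `.loc` is used for the column selection (the older `.ix` indexer is no longer available in current pandas), and the names `both` and `picked` are illustrative.

```python
import numpy as np
import pandas as pd

# Seeded sample data (the seed is an assumption made for reproducibility).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'B': ['A', 'B', 'C'] * 8,
    'D': rng.standard_normal(24),
    'E': rng.standard_normal(24),
})

# Passing a list of functions yields two-level columns: (function name, value column).
both = pd.pivot_table(df, values=['D', 'E'], index=['B'], aggfunc=['sum', 'mean'])

# Keep only the sum of D and the mean of E, then flatten the column labels.
picked = both.loc[:, [('sum', 'D'), ('mean', 'E')]]
picked.columns = ['D', 'E']
print(picked)
```

This computes every function for every column and discards half of the result, so it is mostly useful when the table is small or the other aggregates are wanted anyway.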
Still, it would be nice to have an option to define aggfunc for each column individually. I don't know how it could be done; maybe by passing a dict-like parameter to aggfunc, such as {'D':np.mean, 'E':np.sum}.
Update: actually, in your case you can pivot by hand:
>>> df.groupby('B').aggregate({'D':np.sum, 'E':np.mean})
          E         D
B
A -0.524178  1.810847
B -0.443031  2.762190
C  0.078460  0.867519
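A related way to pivot by hand is named aggregation, available in pandas 0.25 and later, which also fixes the output column order and names. The sketch below is a variation on the answer, not the answer's own code; the seed and the output names `D_sum` and `E_mean` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Seeded sample data (the seed is an assumption made for reproducibility).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'B': ['A', 'B', 'C'] * 8,
    'D': rng.standard_normal(24),
    'E': rng.standard_normal(24),
})

# Each keyword is an output column; each tuple is (input column, aggregation).
by_hand = df.groupby('B').agg(
    D_sum=('D', 'sum'),
    E_mean=('E', 'mean'),
)
print(by_hand)
```

The output columns appear in the order the keywords are written, which avoids the swapped E/D order seen in the dict-based output above.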
Answer by DataSwede
You can apply a specific function to a specific column by passing in a dict.
pd.pivot_table(df, values=['D','E'], index=['B'], aggfunc={'D':np.sum, 'E':np.mean})
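To check that the dict form does what the question asks, here is a minimal self-contained sketch. The seeded data, the variable name `table`, and the string function names are assumptions; the comparison against `groupby` is only there to confirm which function was applied to which column.

```python
import numpy as np
import pandas as pd

# Seeded sample data (the seed is an assumption made for reproducibility).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'B': ['A', 'B', 'C'] * 8,
    'D': rng.standard_normal(24),
    'E': rng.standard_normal(24),
})

# The dict maps each values column to its own aggregation function.
table = pd.pivot_table(
    df,
    values=['D', 'E'],
    index=['B'],
    aggfunc={'D': 'sum', 'E': 'mean'},
)
print(table)

# Cross-check against plain groupby results for each column.
print(np.allclose(table['D'], df.groupby('B')['D'].sum()))
print(np.allclose(table['E'], df.groupby('B')['E'].mean()))
```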
Answer by user10987461
table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
                       aggfunc={'D': np.mean, 'E': np.sum})
table
                  D         E
               mean       sum
A   C
bar large  5.500000  7.500000
    small  5.500000  8.500000
foo large  2.000000  4.500000
    small  2.333333  4.333333
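The dict values can also be lists of functions, in which case the result has two-level columns of the form (value column, function). The sketch below uses a small made-up DataFrame, not the data behind the output above; the column contents and the name `multi` are assumptions for illustration.

```python
import pandas as pd

# Small illustrative dataset (made up; every A/C combination occurs).
df = pd.DataFrame({
    'A': ['foo', 'foo', 'bar', 'bar'] * 3,
    'C': ['small', 'large'] * 6,
    'D': list(range(12)),
    'E': list(range(12, 24)),
})

# D gets a single function, E gets several; a list value triggers two-level columns.
multi = pd.pivot_table(
    df,
    values=['D', 'E'],
    index=['A', 'C'],
    aggfunc={'D': 'mean', 'E': ['min', 'max', 'sum']},
)
print(multi)

# Individual results are addressed with (column, function) tuples.
print(multi[('E', 'sum')])
```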

