
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/39735068/

Date: 2020-09-14 02:05:59  Source: igfitidea

Pandas crosstab, but with values from aggregation of third column

Tags: python, pandas, aggregate

Asked by user1700890

Here is my problem:


import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'two', 'one'],
                   'B': ['Ar', 'Br', 'Cr', 'Ar', 'Ar'],
                   'C': [1, 0, 0, 1, 0]})

I would like to generate something like the output of the pd.crosstab function, but the value at each intersection of row and column should come from an aggregation of the third column:

      Ar   Br   Cr
one  0.5    0    0
two    1    0    0

For example, there are two rows with 'one' and 'Ar', and the corresponding values in column 'C' are 1 and 0. We sum those values (0+1) and divide by the number of values, so we get (0+1)/2 = 0.5. Whenever a combination is not present (like 'Cr' and 'one'), we set it to zero. Any thoughts?
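As a sanity check on the arithmetic described above, here is a minimal sketch that computes just the ('one', 'Ar') cell by hand. The names `cell` and `cell_mean` are illustrative only and do not come from the original post.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'two', 'one'],
                   'B': ['Ar', 'Br', 'Cr', 'Ar', 'Ar'],
                   'C': [1, 0, 0, 1, 0]})

# Select the 'C' values for rows where A == 'one' and B == 'Ar'
cell = df.loc[(df['A'] == 'one') & (df['B'] == 'Ar'), 'C']

# Sum of the values divided by how many there are, i.e. the mean
cell_mean = cell.sum() / len(cell)
print(cell_mean)
```

In other words, the requested value for each cell is the mean of 'C' within that (A, B) group.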

Answered by MaxU

You can use the pivot_table() method, which uses aggfunc='mean' by default:

In [46]: df.pivot_table(index='A', columns='B', values='C', fill_value=0)
Out[46]:
B     Ar  Br  Cr
A
one  0.5   0   0
two  1.0   0   0
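Since the question mentions pd.crosstab specifically, it may be worth noting that crosstab itself accepts `values` and `aggfunc` arguments. The sketch below is an alternative, not part of the answer above. It assumes the same `df` as in the question, and the variable name `result` is illustrative. Missing combinations come back as NaN, so they are filled with 0 afterwards.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'two', 'one'],
                   'B': ['Ar', 'Br', 'Cr', 'Ar', 'Ar'],
                   'C': [1, 0, 0, 1, 0]})

# Rows come from A, columns from B, and each cell is the mean of C
# for that (A, B) combination; absent combinations are NaN until filled.
result = pd.crosstab(df['A'], df['B'], values=df['C'], aggfunc='mean').fillna(0)
print(result)
```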

Answered by piRSquared

I like groupby and unstack:

df.groupby(['A', 'B']).C.mean().unstack(fill_value=0)
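For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the question's `df` and compares the groupby/unstack result with the pivot_table approach. The names `by_group` and `by_pivot` are illustrative and not from the original answers.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'two', 'one'],
                   'B': ['Ar', 'Br', 'Cr', 'Ar', 'Ar'],
                   'C': [1, 0, 0, 1, 0]})

# Mean of C per (A, B) pair, then move B from the index into columns;
# combinations that never occur are filled with 0.
by_group = df.groupby(['A', 'B'])['C'].mean().unstack(fill_value=0)

# The pivot_table approach, for comparison
by_pivot = df.pivot_table(index='A', columns='B', values='C', fill_value=0)

print(by_group)
print((by_group.values == by_pivot.values).all())
```

Both approaches should give the same table for this data, so the choice between them is mostly a matter of taste.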
