Pandas crosstab, but with values from aggregation of a third column

Question

Asked by user1700890

Here is my problem:

import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'two', 'one'],
                   'B': ['Ar', 'Br', 'Cr', 'Ar', 'Ar'],
                   'C': [1, 0, 0, 1, 0]})

I would like to generate something like the output of the pd.crosstab function, but the values at the intersection of each row and column should come from an aggregation of a third column:

      Ar   Br   Cr
one  0.5    0    0
two    1    0    0

For example, 'one' and 'Ar' occur together in two rows, where the corresponding values in column 'C' are 1 and 0. We sum those values (1 + 0) and divide by the number of values, which gives (1 + 0)/2 = 0.5. Whenever a combination is not present (like 'Cr' and 'one'), we set it to zero. Any thoughts?
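
The arithmetic for a single cell can be checked by hand. The following is only a sketch of that check on the sample data; the variable names (cell, cell_mean, missing, missing_value) are made up for illustration and are not part of the question.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'two', 'one'],
                   'B': ['Ar', 'Br', 'Cr', 'Ar', 'Ar'],
                   'C': [1, 0, 0, 1, 0]})

# Rows where A == 'one' and B == 'Ar': the C values are 1 and 0.
cell = df.loc[(df['A'] == 'one') & (df['B'] == 'Ar'), 'C']
cell_mean = cell.sum() / len(cell)  # (1 + 0) / 2

# A combination that never occurs, e.g. ('one', 'Cr'), has no rows
# to average, so it is set to 0 as the question asks.
missing = df.loc[(df['A'] == 'one') & (df['B'] == 'Cr'), 'C']
missing_value = missing.mean() if len(missing) else 0

print(cell_mean, missing_value)
```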

Answer 1

Answered by MaxU

You can use the pivot_table() method, which uses aggfunc='mean' by default:

In [46]: df.pivot_table(index='A', columns='B', values='C', fill_value=0)
Out[46]:
B     Ar  Br  Cr
A
one  0.5   0   0
two  1.0   0   0
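
As a self-contained sketch of the same call, the version below spells out aggfunc='mean' rather than relying on the default, which may make the intent easier to read. The variable name table is an assumption added for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'two', 'one'],
                   'B': ['Ar', 'Br', 'Cr', 'Ar', 'Ar'],
                   'C': [1, 0, 0, 1, 0]})

# Rows come from A, columns from B, and each cell is the mean of C
# for that (A, B) pair. fill_value=0 replaces pairs that never occur.
table = df.pivot_table(index='A', columns='B', values='C',
                       aggfunc='mean', fill_value=0)
print(table)
```

Note that a filled-in 0 looks the same as a genuine mean of 0 (for example 'one'/'Br'), so drop fill_value if the two cases need to be told apart.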

Answer 2

Answered by piRSquared

I like groupby and unstack:

df.groupby(['A', 'B']).C.mean().unstack(fill_value=0)
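
Since the question mentions pd.crosstab, it may help to see that crosstab itself accepts values and aggfunc arguments. The sketch below runs the groupby/unstack approach next to a crosstab call on the sample data; the names by_groupby and by_crosstab are made up for illustration, and the crosstab variant is an addition, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'two', 'one'],
                   'B': ['Ar', 'Br', 'Cr', 'Ar', 'Ar'],
                   'C': [1, 0, 0, 1, 0]})

# Mean of C per (A, B) pair, then move B from the index to the columns.
by_groupby = df.groupby(['A', 'B'])['C'].mean().unstack(fill_value=0)

# crosstab with values/aggfunc aggregates C instead of counting rows.
# Missing pairs come back as NaN, so they are filled with 0 afterwards.
by_crosstab = pd.crosstab(df['A'], df['B'],
                          values=df['C'], aggfunc='mean').fillna(0)

print(by_groupby)
print(by_crosstab)
```

On this data both tables hold the same values; which one to use is mostly a matter of taste.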
