Pandas groupby and value_counts

Question

Asked by Susensio

I want to count distinct values per column (with pd.value_counts, I guess), grouping the data by some level of a MultiIndex. The MultiIndex is taken care of by the groupby(level=...) parameter, but apply raises a ValueError.

Original dataframe:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.choice(list('ABC'), size=(10,5)),
...                   columns=['c1','c2','c3','c4','c5'],
...                   index=pd.MultiIndex.from_product([['foo', 'bar'],
...                                                     ['w','y','x','y','z']]))



      c1 c2 c3 c4 c5
foo w  C  C  B  A  A
    y  A  A  C  B  A
    x  A  B  C  C  C
    y  A  B  C  C  C
    z  A  C  B  C  B
bar w  B  C  C  A  C
    y  A  A  C  A  A
    x  A  B  B  B  A
    y  A  A  C  A  B
    z  A  B  B  C  B

What I want:

       c1  c2  c3  c4  c5
foo A   4   2   0   3   2
    B   1   2   2   1   2
    C   0   1   3   1   1
bar A   4   1   0   1   2
    B   0   2   2   1   1
    C   1   2   3   3   2

What I tried:

>>> df.groupby(level=0).apply(pd.value_counts)

ValueError: could not broadcast input array from shape (5,5) into shape (5)

I can do it manually, but I think there must be a more obvious way.

groups = [g.apply(pd.value_counts).fillna(0) for n, g in df.groupby(level=0)]
index = df.index.get_level_values(0).unique()
correct_result = pd.concat(groups, keys=index)   # THIS WORKS AS EXPECTED

This isn't long to write, but I feel like I'm reinventing the wheel. Isn't this kind of operation something the groupby function should handle?

Is there a more straightforward way of doing this, other than doing the split-apply-combine myself?

Answer 1

Answered by jezrael

Use stack to get a Series with a MultiIndex, then SeriesGroupBy.value_counts, and finally unstack to get back a DataFrame:

import numpy as np
import pandas as pd

np.random.seed(123)

df = pd.DataFrame(np.random.choice(list('ABC'), size=(10,5)),
                 columns=['c1','c2','c3','c4','c5'], 
                 index=pd.MultiIndex.from_product([['foo', 'bar'], 
                                                   ['w','y','x','y','z']]))
print (df)
      c1 c2 c3 c4 c5
foo w  C  B  C  C  A
    y  C  C  B  C  B
    x  C  B  A  B  C
    y  B  A  C  A  B
    z  C  B  A  A  A
bar w  A  B  C  A  C
    y  A  A  B  A  B
    x  A  A  A  C  B
    y  B  C  C  C  B
    z  A  A  C  B  A

df1 = df.stack().groupby(level=[0,2]).value_counts().unstack(1, fill_value=0)
print (df1)
       c1  c2  c3  c4  c5
bar A   4   3   1   2   1
    B   1   1   1   1   3
    C   0   1   3   2   1
foo A   0   1   2   2   2
    B   1   3   1   1   2
    C   4   1   2   2   1
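
To see what each step of that one-liner contributes, the chain can be split into its three stages. The following is only a sketch: the small hand-written frame and the names stacked, counts and result are made up for illustration (they are not part of the answer above), and the data is fixed rather than random so the counts can be checked by eye.

```python
import pandas as pd

# Hypothetical, hand-written data (not the random frame used above).
idx = pd.MultiIndex.from_product([["foo", "bar"], ["w", "x", "y"]])
df = pd.DataFrame(
    {
        "c1": ["A", "A", "B", "C", "C", "A"],
        "c2": ["B", "C", "C", "A", "B", "A"],
    },
    index=idx,
)

# Step 1: stack moves the column labels into a third index level,
# giving a Series indexed by (outer level, inner level, column name).
stacked = df.stack()

# Step 2: group by the outer level (0) and the column name (2),
# then count the values inside each group.
# The result is indexed by (outer level, column name, value).
counts = stacked.groupby(level=[0, 2]).value_counts()

# Step 3: move the column-name level (position 1) back into columns.
# Value/column combinations that never occur are filled with 0.
result = counts.unstack(1, fill_value=0)
print(result)
```

The level numbers matter here: after stack, level 1 is the inner index level (w, x, y), which is deliberately left out of the grouping so that rows are pooled per outer key.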
