Pandas: Very Simple Percent of Total Size from Group By

Question

Asked by horatio1701d

I'm having trouble with a seemingly very easy operation. What is the most succinct way to get a percent of total from a group-by operation such as df.groupby('col1').size()? My DF after grouping looks like this, and I just want a percent of total. I remember using a variation of this statement in the past but cannot get it to work now: percent = totals.div(totals.sum(1), axis=0)

Original DF:

       A   B   C
    0  77   3  98
    1  77  52  99
    2  77  58  61
    3  77   3  93
    4  77  31  99
    5  77  53  51
    6  77   2   9
    7  72  25  78
    8  34  41  34
    9  44  95  27

Result:

df1.groupby('A').size() / df1.groupby('A').size().sum()

    A
    34    0.1
    44    0.1
    72    0.1
    77    0.7

Here is what I have come up with so far, which seems like a pretty reasonable way to do this:

df.groupby('col1').size().apply(lambda x: float(x) / df.groupby('col1').size().sum()*100)
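The lambda above re-runs the groupby once for every group. A minimal sketch that computes the group sizes once and divides by their sum, using the sample data from the question (the names `sizes` and `percent` are illustrative and not from the original post):

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'A': [77, 77, 77, 77, 77, 77, 77, 72, 34, 44],
    'B': [3, 52, 58, 3, 31, 53, 2, 25, 41, 95],
    'C': [98, 99, 61, 93, 99, 51, 9, 78, 34, 27],
})

# Compute the group sizes once, then divide by their total.
sizes = df.groupby('A').size()
percent = sizes / sizes.sum() * 100
print(percent)
```

Dividing the whole Series at once is vectorized, so no per-group Python function call is needed.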

Answer 1

Accepted answer by horatio1701d

I am getting good performance (3.73 s) on a DF of shape (3e6, 59) by using: df.groupby('col1').size().apply(lambda x: float(x) / df.groupby('col1').size().sum()*100)

Answer 2

Answered by Roman Pekar

I don't know if I'm missing something, but it looks like you could do something like this:

df.groupby('A').size() * 100 / len(df)

or

df.groupby('A').size() * 100 / df.shape[0]
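As a quick check of the two expressions above, here is a sketch on column A of the question's sample data. It assumes the grouping column has no missing values, in which case the group sizes add up to `len(df)`. The helper name `group_percent` is made up for illustration.

```python
import pandas as pd

def group_percent(frame, column):
    """Return each group's share of all rows in `frame`, in percent."""
    return frame.groupby(column).size() * 100 / len(frame)

# Column A from the question's sample data (no missing values).
df = pd.DataFrame({'A': [77, 77, 77, 77, 77, 77, 77, 72, 34, 44]})

result = group_percent(df, 'A')
print(result)
```

If the grouping column does contain NaN, `groupby` drops those rows by default while `len(frame)` still counts them, so the percentages would then sum to less than 100.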

Answer 3

Answered by Alexander

How about:

df = pd.DataFrame({'A': {0: 77, 1: 77, 2: 77, 3: 77, 4: 77, 5: 77, 6: 77, 7: 72, 8: 34, 9: None},
                   'B': {0: 3, 1: 52, 2: 58, 3: 3, 4: 31, 5: 53, 6: 2, 7: 25, 8: 41, 9: 95},
                   'C': {0: 98, 1: 99, 2: 61, 3: 93, 4: 99, 5: 51, 6: 9, 7: 78, 8: 34, 9: 27}})

>>> df.groupby('A').size().divide(sum(df['A'].notnull()))
A
34    0.111111
72    0.111111
77    0.777778
dtype: float64

>>> df
    A   B   C
0  77   3  98
1  77  52  99
2  77  58  61
3  77   3  93
4  77  31  99
5  77  53  51
6  77   2   9
7  72  25  78
8  34  41  34
9 NaN  95  27
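
An alternative sketch, not part of the original answer: `Series.value_counts(normalize=True)` drops missing values by default and returns each value's share of the non-null rows, so it should agree with the division above. Only column A of the answer's data is reproduced here, and the variable names are illustrative.

```python
import pandas as pd

# Column A from the answer above, including the missing value in row 9.
df = pd.DataFrame({'A': [77, 77, 77, 77, 77, 77, 77, 72, 34, None]})

# Shares computed as in the answer: group sizes over the non-null count.
by_groupby = df.groupby('A').size() / df['A'].notnull().sum()

# value_counts(normalize=True) excludes NaN by default (dropna=True).
by_value_counts = df['A'].value_counts(normalize=True).sort_index()

print(by_groupby)
print(by_value_counts)
```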
