Pandas cumulative count

Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/40900195/

Time: 2020-09-14 02:33:18  Source: igfitidea

Pandas cumulative count

python pandas

Asked by jincept

I have a data frame like this:

idx      time   object
0        04:10  obj1
1        04:10  obj1
2        04:11  obj1
3        04:12  obj2
4        04:12  obj2
5        04:12  obj1
6        04:13  obj2

I want to get a running (cumulative) count for each object, like this:

idx      time   object   obj1_count   obj2_count 
0        04:10  obj1        1             0
1        04:10  obj1        2             0
2        04:11  obj1        3             0
3        04:12  obj2        3             1
4        04:12  obj2        3             2
5        04:12  obj1        4             2
6        04:13  obj2        4             3

I tried playing with cumsum, but I'm not sure that is the right way. Any suggestions?

Answered by Alex Glinsky

There is a dedicated function for this kind of operation: cumcount. It numbers the rows within each group, starting at 0.

>>> df = pd.DataFrame([['a'], ['a'], ['a'], ['b'], ['b'], ['a']], columns=['A'])
>>> df
   A
0  a
1  a
2  a
3  b
4  b
5  a
>>> df.groupby('A').cumcount()
0    0
1    1
2    2
3    0
4    1
5    3
dtype: int64
>>> df.groupby('A').cumcount(ascending=False)
0    3
1    2
2    1
3    1
4    0
5    0
dtype: int64
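
Applied to the data from the question, cumcount can be sketched as follows. This is only an illustration: the column name own_count is a hypothetical choice, and the result is the running count of each row's own object, not the one-column-per-object layout the question asks for.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "time": ["04:10", "04:10", "04:11", "04:12", "04:12", "04:12", "04:13"],
    "object": ["obj1", "obj1", "obj1", "obj2", "obj2", "obj1", "obj2"],
})

# cumcount numbers rows within each group starting at 0, so add 1
# to get a 1-based running count of the row's own object.
# "own_count" is a hypothetical column name used for illustration.
df["own_count"] = df.groupby("object").cumcount() + 1
print(df)
```

Each row only learns how many times its own object has appeared so far, so the other answers are a better fit when every object needs its own count column.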

Answered by EdChum

You can just compare the column against the value of interest and call cumsum:

In [12]:
df['obj1_count'] = (df['object'] == 'obj1').cumsum()
df['obj2_count'] = (df['object'] == 'obj2').cumsum()
df

Out[12]:
      time object  obj1_count  obj2_count
idx                                      
0    04:10   obj1           1           0
1    04:10   obj1           2           0
2    04:11   obj1           3           0
3    04:12   obj2           3           1
4    04:12   obj2           3           2
5    04:12   obj1           4           2
6    04:13   obj2           4           3

Here the comparison will produce a boolean series:

In [13]:
df['object'] == 'obj1'

Out[13]:
idx
0     True
1     True
2     True
3    False
4    False
5     True
6    False
Name: object, dtype: bool

When you call cumsum on the above, the True values are converted to 1 and the False values to 0, and they are summed cumulatively.
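
The boolean-to-integer step can be made explicit in a small sketch. The names mask, as_int and running are illustrative, and the loop at the end is a hypothetical generalization that is not part of the answer itself.

```python
import pandas as pd

# The object column from the question.
df = pd.DataFrame(
    {"object": ["obj1", "obj1", "obj1", "obj2", "obj2", "obj1", "obj2"]}
)

mask = df["object"] == "obj1"   # boolean Series
as_int = mask.astype(int)       # True -> 1, False -> 0
running = mask.cumsum()         # cumulative sum, same values as as_int.cumsum()

# Hypothetical generalization: build one count column per distinct value.
for name in df["object"].unique():
    df[f"{name}_count"] = (df["object"] == name).cumsum()

print(df)
```

The loop avoids writing one line per object, at the cost of one pass over the column for each distinct value.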

Answered by root

You can generalize this process by taking the cumsum of pd.get_dummies. This should work for an arbitrary number of objects you want to count, without needing to specify each one individually:

# Get the cumulative counts.
counts = pd.get_dummies(df['object']).cumsum()

# Rename the count columns as appropriate.
counts = counts.rename(columns=lambda col: col+'_count')

# Join the counts to the original df.
df = df.join(counts)

The resulting output:

    time object  obj1_count  obj2_count
0  04:10   obj1           1           0
1  04:10   obj1           2           0
2  04:11   obj1           3           0
3  04:12   obj2           3           1
4  04:12   obj2           3           2
5  04:12   obj1           4           2
6  04:13   obj2           4           3

You can omit the rename step if it's acceptable to use count as a prefix instead of a suffix, i.e. 'count_obj1' instead of 'obj1_count'. Simply use the prefix parameter of pd.get_dummies:

counts = pd.get_dummies(df['object'], prefix='count').cumsum()
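
Both variants can be run end to end on the question's data with a sketch like the one below. Passing dtype=int is an assumption added here so that the indicator columns are integers whatever the installed pandas version uses by default. The names suffixed, prefixed and result are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "time": ["04:10", "04:10", "04:11", "04:12", "04:12", "04:12", "04:13"],
    "object": ["obj1", "obj1", "obj1", "obj2", "obj2", "obj1", "obj2"],
})

# Variant 1: one-hot encode, accumulate, then rename with a suffix.
# dtype=int is an assumption made for this sketch (see the text above).
suffixed = pd.get_dummies(df["object"], dtype=int).cumsum()
suffixed = suffixed.rename(columns=lambda col: col + "_count")

# Variant 2: let get_dummies add the prefix, so no rename is needed.
prefixed = pd.get_dummies(df["object"], prefix="count", dtype=int).cumsum()

result = df.join(suffixed)
print(result)
print(list(prefixed.columns))
```

The two variants hold the same numbers and differ only in how the count columns are named.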

Answered by piRSquared

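Here's a way using numpy:

import numpy as np
import pandas as pd

# u holds the sorted unique objects; iv maps each row to its position in u.
u, iv = np.unique(
    df['object'].values,
    return_inverse=True
)

# Compare each row's code with every unique code, then sum down the rows.
objcount = pd.DataFrame(
    (iv[:, None] == np.arange(len(u))).cumsum(0),
    df.index, u
)
pd.concat([df, objcount], axis=1)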
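
To make the broadcasting step easier to follow, here is a self-contained sketch of the same idea on the question's data. The intermediate name one_hot is introduced only for illustration. Note that the count columns are named after the objects themselves (obj1, obj2) rather than obj1_count and obj2_count.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "time": ["04:10", "04:10", "04:11", "04:12", "04:12", "04:12", "04:13"],
    "object": ["obj1", "obj1", "obj1", "obj2", "obj2", "obj1", "obj2"],
})

# u: sorted distinct values; iv: for each row, the position of its value in u.
u, iv = np.unique(df["object"].to_numpy(), return_inverse=True)

# iv[:, None] has shape (n_rows, 1). Comparing it with arange(len(u))
# broadcasts to an (n_rows, n_unique) boolean one-hot matrix.
one_hot = iv[:, None] == np.arange(len(u))

# Summing down the rows turns the one-hot matrix into running counts.
objcount = pd.DataFrame(one_hot.cumsum(axis=0), index=df.index, columns=u)
result = pd.concat([df, objcount], axis=1)
print(result)
```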
