pandas cumulative count

Question

Asked by jincept

I have a data frame like this:

idx      time   object
0        04:10  obj1
1        04:10  obj1
2        04:11  obj1
3        04:12  obj2
4        04:12  obj2
5        04:12  obj1
6        04:13  obj2

I want to get a cumulative count for each object, like this:

idx      time   object   obj1_count   obj2_count 
0        04:10  obj1        1             0
1        04:10  obj1        2             0
2        04:11  obj1        3             0
3        04:12  obj2        3             1
4        04:12  obj2        3             2
5        04:12  obj1        4             2
6        04:13  obj2        4             3

I tried playing with cumsum, but I'm not sure that is the right approach. Any suggestions?

Answer 1

Answered by Alex Glinsky

There is a dedicated groupby method for this kind of operation: cumcount

>>> df = pd.DataFrame([['a'], ['a'], ['a'], ['b'], ['b'], ['a']], columns=['A'])
>>> df
   A
0  a
1  a
2  a
3  b
4  b
5  a
>>> df.groupby('A').cumcount()
0    0
1    1
2    2
3    0
4    1
5    3
dtype: int64
>>> df.groupby('A').cumcount(ascending=False)
0    3
1    2
2    1
3    1
4    0
5    0
dtype: int64
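
Note that cumcount numbers the rows within each group starting at 0, so it yields one running count per row rather than one column per object. A minimal sketch of how it might be applied to the question's data (the frame is retyped from the question, and the column name own_count is made up for illustration):

```python
import pandas as pd

# Data retyped from the question.
df = pd.DataFrame({
    "time": ["04:10", "04:10", "04:11", "04:12", "04:12", "04:12", "04:13"],
    "object": ["obj1", "obj1", "obj1", "obj2", "obj2", "obj1", "obj2"],
})

# cumcount() is zero-based, so add 1 to count the current row as well.
# "own_count" is a hypothetical column name, not one from the question.
df["own_count"] = df.groupby("object").cumcount() + 1
print(df)
```

This gives the count for the row's own object only; the per-object columns asked for in the question need one of the approaches in the other answers.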

Answer 2

Answered by EdChum

You can just compare the column against the value of interest and call cumsum:

In [12]:
df['obj1_count'] = (df['object'] == 'obj1').cumsum()
df['obj2_count'] = (df['object'] == 'obj2').cumsum()
df

Out[12]:
      time object  obj1_count  obj2_count
idx                                      
0    04:10   obj1           1           0
1    04:10   obj1           2           0
2    04:11   obj1           3           0
3    04:12   obj2           3           1
4    04:12   obj2           3           2
5    04:12   obj1           4           2
6    04:13   obj2           4           3

Here the comparison will produce a boolean series:

In [13]:
df['object'] == 'obj1'

Out[13]:
idx
0     True
1     True
2     True
3    False
4    False
5     True
6    False
Name: object, dtype: bool

When you call cumsum on the above, the True values are converted to 1 and the False values to 0, and they are summed cumulatively.
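
To make the conversion step visible, here is a small sketch that spells it out by hand and checks that it matches calling cumsum on the boolean series directly. The frame is retyped from the question, and the variable names (mask, as_ints, running) are only illustrative:

```python
import pandas as pd

# Data retyped from the question.
df = pd.DataFrame({
    "time": ["04:10", "04:10", "04:11", "04:12", "04:12", "04:12", "04:13"],
    "object": ["obj1", "obj1", "obj1", "obj2", "obj2", "obj1", "obj2"],
})

mask = df["object"] == "obj1"   # boolean series: True where the row is obj1
as_ints = mask.astype(int)      # explicit conversion: True -> 1, False -> 0
running = as_ints.cumsum()      # running total of the 1s

# Calling cumsum on the boolean series gives the same numbers.
print(running.tolist())
print(mask.cumsum().tolist())
```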

Answer 3

Answered by root

You can generalize this process by taking the cumsum of pd.get_dummies. This should work for an arbitrary number of objects you want to count, without needing to specify each one individually:

# Get the cumulative counts.
counts = pd.get_dummies(df['object']).cumsum()

# Rename the count columns as appropriate.
counts = counts.rename(columns=lambda col: col+'_count')

# Join the counts to the original df.
df = df.join(counts)

The resulting output:

    time object  obj1_count  obj2_count
0  04:10   obj1           1           0
1  04:10   obj1           2           0
2  04:11   obj1           3           0
3  04:12   obj2           3           1
4  04:12   obj2           3           2
5  04:12   obj1           4           2
6  04:13   obj2           4           3

You can omit the rename step if it's acceptable to use count as a prefix instead of a suffix, i.e. 'count_obj1' instead of 'obj1_count'. Simply use the prefix parameter of pd.get_dummies:

counts = pd.get_dummies(df['object'], prefix='count').cumsum()
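
For reference, a self-contained sketch of the prefix variant on the question's data. The frame is retyped from the question. Passing dtype=int is my addition rather than part of the answer: as far as I know, recent pandas versions return boolean indicator columns by default, and asking for integers up front keeps the result independent of that default.

```python
import pandas as pd

# Data retyped from the question.
df = pd.DataFrame({
    "time": ["04:10", "04:10", "04:11", "04:12", "04:12", "04:12", "04:13"],
    "object": ["obj1", "obj1", "obj1", "obj2", "obj2", "obj1", "obj2"],
})

# One indicator column per distinct object, named count_<object>.
# dtype=int is an assumption added here so the indicators are integers.
counts = pd.get_dummies(df["object"], prefix="count", dtype=int).cumsum()

# Join the running counts back onto the original frame by index.
result = df.join(counts)
print(result)
```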

Answer 4

Answered by piRSquared

Here's a way using numpy:

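import numpy as np
import pandas as pd

# u holds the sorted unique labels; iv maps each row to its label's position in u.
u, iv = np.unique(
    df['object'].values,
    return_inverse=True
)

# Compare each row's label position against every label, then sum down the rows.
objcount = pd.DataFrame(
    (iv[:, None] == np.arange(len(u))).cumsum(0),
    df.index, u
)
pd.concat([df, objcount], axis=1)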
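
The broadcasting step is the dense part of that snippet, so here is the same idea broken into named stages. This is only a sketch: the frame is retyped from the question, and the intermediate names one_hot and result are mine, not the answer's.

```python
import numpy as np
import pandas as pd

# Data retyped from the question.
df = pd.DataFrame({
    "time": ["04:10", "04:10", "04:11", "04:12", "04:12", "04:12", "04:13"],
    "object": ["obj1", "obj1", "obj1", "obj2", "obj2", "obj1", "obj2"],
})

# u: sorted unique labels; iv: for each row, the position of its label in u.
u, iv = np.unique(df["object"].to_numpy(), return_inverse=True)

# Broadcasting an (n, 1) column against a (k,) row gives an (n, k) boolean
# matrix whose entry [i, j] is True when row i holds label u[j].
one_hot = iv[:, None] == np.arange(len(u))

# Summing cumulatively down the rows turns the indicators into running counts.
objcount = pd.DataFrame(one_hot.cumsum(axis=0), index=df.index, columns=u)
result = pd.concat([df, objcount], axis=1)
print(result)
```

Unlike the other answers, the count columns here are named after the labels themselves (obj1, obj2), with no _count suffix.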
