Pandas cumulative count
Original question: http://stackoverflow.com/questions/40900195/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by jincept
I have a data frame like this:
idx time object
0 04:10 obj1
1 04:10 obj1
2 04:11 obj1
3 04:12 obj2
4 04:12 obj2
5 04:12 obj1
6 04:13 obj2
I want to get a cumulative count for each object, like this:
idx time object obj1_count obj2_count
0 04:10 obj1 1 0
1 04:10 obj1 2 0
2 04:11 obj1 3 0
3 04:12 obj2 3 1
4 04:12 obj2 3 2
5 04:12 obj1 4 2
6 04:13 obj2 4 3
I tried playing with cumsum, but I'm not sure that is the right way. Any suggestions?
Answered by Alex Glinsky
There is a dedicated function for this kind of operation: cumcount. It numbers the rows within each group, starting from 0.
>>> import pandas as pd
>>> df = pd.DataFrame([['a'], ['a'], ['a'], ['b'], ['b'], ['a']], columns=['A'])
>>> df
A
0 a
1 a
2 a
3 b
4 b
5 a
>>> df.groupby('A').cumcount()
0 0
1 1
2 2
3 0
4 1
5 3
dtype: int64
>>> df.groupby('A').cumcount(ascending=False)
0 3
1 2
2 1
3 1
4 0
5 0
dtype: int64
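Applied to the data from the question, cumcount numbers each row within its own object group, so adding 1 gives the running count of the object that appears in that row. The following is a minimal sketch, assuming the column names time and object from the desired output; the column name running_count is made up for illustration.

```python
import pandas as pd

# Data from the question; the column names are assumed from the desired output.
df = pd.DataFrame({
    "time": ["04:10", "04:10", "04:11", "04:12", "04:12", "04:12", "04:13"],
    "object": ["obj1", "obj1", "obj1", "obj2", "obj2", "obj1", "obj2"],
})

# cumcount() is zero-based within each group, so add 1 for a running count.
# "running_count" is a hypothetical column name used only for this sketch.
df["running_count"] = df.groupby("object").cumcount() + 1
print(df)
```

Note that this yields a single column holding the count for the current row's object, not one column per object as the question asks for.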
Answered by EdChum
You can just compare the column against the value of interest and call cumsum:
In [12]:
df['obj1_count'] = (df['object'] == 'obj1').cumsum()
df['obj2_count'] = (df['object'] == 'obj2').cumsum()
df
Out[12]:
time object obj1_count obj2_count
idx
0 04:10 obj1 1 0
1 04:10 obj1 2 0
2 04:11 obj1 3 0
3 04:12 obj2 3 1
4 04:12 obj2 3 2
5 04:12 obj1 4 2
6 04:13 obj2 4 3
Here the comparison will produce a boolean series:
In [13]:
df['object'] == 'obj1'
Out[13]:
idx
0 True
1 True
2 True
3 False
4 False
5 True
6 False
Name: object, dtype: bool
When you call cumsum on the above, the True values are converted to 1 and the False values to 0, and they are summed cumulatively.
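The conversion can be made explicit, and the same comparison can be repeated for each distinct value in a loop. This is only a sketch under the assumption that the data matches the question; the variable names mask and as_int are chosen for illustration.

```python
import pandas as pd

# Data from the question; the column names are assumed from the desired output.
df = pd.DataFrame({
    "time": ["04:10", "04:10", "04:11", "04:12", "04:12", "04:12", "04:13"],
    "object": ["obj1", "obj1", "obj1", "obj2", "obj2", "obj1", "obj2"],
})

# Boolean mask: True where the row holds obj1.
mask = df["object"] == "obj1"

# Writing the True -> 1, False -> 0 conversion out explicitly
# gives the same values as calling cumsum on the mask directly.
as_int = mask.astype(int)
assert as_int.cumsum().tolist() == mask.cumsum().tolist()

# Repeat the comparison for every distinct value in the column.
for name in df["object"].unique():
    df[f"{name}_count"] = (df["object"] == name).cumsum()

print(df)
```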
Answered by root
You can generalize this process by taking the cumsum of pd.get_dummies. This should work for an arbitrary number of objects you want to count, without needing to specify each one individually:
# Get the cumulative counts.
counts = pd.get_dummies(df['object']).cumsum()
# Rename the count columns as appropriate.
counts = counts.rename(columns=lambda col: col+'_count')
# Join the counts to the original df.
df = df.join(counts)
The resulting output:
time object obj1_count obj2_count
0 04:10 obj1 1 0
1 04:10 obj1 2 0
2 04:11 obj1 3 0
3 04:12 obj2 3 1
4 04:12 obj2 3 2
5 04:12 obj1 4 2
6 04:13 obj2 4 3
You can omit the rename step if it's acceptable to use count as a prefix instead of a suffix, i.e. 'count_obj1' instead of 'obj1_count'. Simply use the prefix parameter of pd.get_dummies:
counts = pd.get_dummies(df['object'], prefix='count').cumsum()
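For reference, here is a self-contained sketch of the prefix variant, assuming the data from the question. Passing dtype=int is an addition not present in the answer above; it requests integer indicator columns so the cumulative sums are integers regardless of the default indicator dtype of the installed pandas version.

```python
import pandas as pd

# Data from the question; the column names are assumed from the desired output.
df = pd.DataFrame({
    "time": ["04:10", "04:10", "04:11", "04:12", "04:12", "04:12", "04:13"],
    "object": ["obj1", "obj1", "obj1", "obj2", "obj2", "obj1", "obj2"],
})

# One indicator column per distinct object, named count_<object>.
# dtype=int is an assumption added for this sketch (see the note above).
counts = pd.get_dummies(df["object"], prefix="count", dtype=int).cumsum()

# Join the cumulative counts back onto the original frame by index.
result = df.join(counts)
print(result)
```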


