Pandas cumulative count
Original question: http://stackoverflow.com/questions/40900195/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by jincept
I have a data frame like this:
0 04:10 obj1
1 04:10 obj1
2 04:11 obj1
3 04:12 obj2
4 04:12 obj2
5 04:12 obj1
6 04:13 obj2
I want to get a cumulative count for each object, like this:
idx time object obj1_count obj2_count
0 04:10 obj1 1 0
1 04:10 obj1 2 0
2 04:11 obj1 3 0
3 04:12 obj2 3 1
4 04:12 obj2 3 2
5 04:12 obj1 4 2
6 04:13 obj2 4 3
I tried playing with cumsum, but I'm not sure that is the right way. Any suggestions?
Answer by Alex Glinsky
There is a dedicated method for this kind of operation: cumcount
>>> import pandas as pd
>>> df = pd.DataFrame([['a'], ['a'], ['a'], ['b'], ['b'], ['a']], columns=['A'])
>>> df
A
0 a
1 a
2 a
3 b
4 b
5 a
>>> df.groupby('A').cumcount()
0 0
1 1
2 2
3 0
4 1
5 3
dtype: int64
>>> df.groupby('A').cumcount(ascending=False)
0 3
1 2
2 1
3 1
4 0
5 0
dtype: int64
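Applied to the question's data, cumcount gives a single running count per row rather than one column per object. The following is a minimal sketch, assuming the columns are named time and object as in the desired output; the column name seen_so_far is made up for illustration:

```python
import pandas as pd

# Rebuild the data frame from the question
# (column names are assumed from its desired output).
df = pd.DataFrame({
    "time": ["04:10", "04:10", "04:11", "04:12", "04:12", "04:12", "04:13"],
    "object": ["obj1", "obj1", "obj1", "obj2", "obj2", "obj1", "obj2"],
})

# cumcount numbers the rows within each group starting at 0,
# so add 1 to get how many times this row's object has appeared so far.
# "seen_so_far" is a hypothetical column name.
df["seen_so_far"] = df.groupby("object").cumcount() + 1
print(df)
```

This yields one column holding the count for the row's own object, which is not the same as the separate obj1_count and obj2_count columns the question asks for.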
Answer by EdChum
You can just compare the column against the value of interest and call cumsum:
In [12]:
df['obj1_count'] = (df['object'] == 'obj1').cumsum()
df['obj2_count'] = (df['object'] == 'obj2').cumsum()
df
Out[12]:
time object obj1_count obj2_count
idx
0 04:10 obj1 1 0
1 04:10 obj1 2 0
2 04:11 obj1 3 0
3 04:12 obj2 3 1
4 04:12 obj2 3 2
5 04:12 obj1 4 2
6 04:13 obj2 4 3
Here the comparison produces a boolean Series:
In [13]:
df['object'] == 'obj1'
Out[13]:
idx
0 True
1 True
2 True
3 False
4 False
5 True
6 False
Name: object, dtype: bool
When you call cumsum on the above, the True values are converted to 1 and the False values to 0, and they are summed cumulatively.
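The boolean-to-integer step can be checked in isolation. This is a small sketch using a hand-written boolean Series that mirrors the comparison result shown above; the variable names are illustrative only:

```python
import pandas as pd

# The same True/False pattern as df['object'] == 'obj1' in the example above.
flags = pd.Series([True, True, True, False, False, True, False])

# Summing booleans cumulatively treats True as 1 and False as 0.
running = flags.cumsum()

# Converting to integers first gives the same running total.
running_from_ints = flags.astype(int).cumsum()

print(running.tolist())
```

The running total only increases on rows where the comparison is True, which is why it matches the obj1_count column.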
Answer by root
You can generalize this process by taking the cumsum of pd.get_dummies. This should work for an arbitrary number of objects you want to count, without needing to specify each one individually:
# Get the cumulative counts.
counts = pd.get_dummies(df['object']).cumsum()
# Rename the count columns as appropriate.
counts = counts.rename(columns=lambda col: col+'_count')
# Join the counts to the original df.
df = df.join(counts)
The resulting output:
time object obj1_count obj2_count
0 04:10 obj1 1 0
1 04:10 obj1 2 0
2 04:11 obj1 3 0
3 04:12 obj2 3 1
4 04:12 obj2 3 2
5 04:12 obj1 4 2
6 04:13 obj2 4 3
You can omit the rename step if it's acceptable to use count as a prefix instead of a suffix, i.e. 'count_obj1' instead of 'obj1_count'. Simply use the prefix parameter of pd.get_dummies:
counts = pd.get_dummies(df['object'], prefix='count').cumsum()
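For reference, here is a self-contained sketch of the prefix variant run against the question's data. The column names time and object are assumed from the desired output. Passing dtype=int is an addition that is not in the original answer; it keeps the dummy columns integer-valued on pandas versions where get_dummies returns booleans by default.

```python
import pandas as pd

# Rebuild the data frame from the question (column names assumed).
df = pd.DataFrame({
    "time": ["04:10", "04:10", "04:11", "04:12", "04:12", "04:12", "04:13"],
    "object": ["obj1", "obj1", "obj1", "obj2", "obj2", "obj1", "obj2"],
})

# One indicator column per distinct object, named count_<object>,
# then a cumulative sum down each column.
# dtype=int is an assumption added here, not part of the original answer.
counts = pd.get_dummies(df["object"], prefix="count", dtype=int).cumsum()

# Attach the running counts to the original rows by index.
result = df.join(counts)
print(result)
```

Because the indicator columns are derived from the data, a third object value would get its own count column without any change to the code.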