pandas: cumulative sum on a column with a condition

Question

Asked by rpeczykowski

I couldn't find an answer elsewhere, so I need to ask. That is probably because I don't know the correct name for this operation. (English is not my native language.)

I have a large data frame with a datetime index, and the time order matters. One column in df takes the values [NaN, 1, -1]. I need a fast way to compute a cumulative sum that resets whenever the value changes.

Example.

    Time                 sign    desire_value
    2014-01-24 05:00:00  NaN     NaN
    2014-01-24 06:00:00  NaN     NaN
    2014-01-24 07:00:00  NaN     NaN
    2014-01-24 08:00:00  1       1
    2014-01-24 09:00:00  1       2
    2014-01-24 10:00:00  1       3
    2014-01-24 11:00:00  -1      1
    2014-01-24 12:00:00  -1      2
    2014-01-24 13:00:00  -1      3
    2014-01-24 14:00:00  -1      4
    2014-01-24 15:00:00  -1      5
    2014-01-24 16:00:00  1       1
    2014-01-24 17:00:00  1       2
    2014-01-24 18:00:00  1       3
    2014-01-24 19:00:00  -1      1
    2014-01-24 20:00:00  -1      2
    2014-01-24 21:00:00  1       1
    2014-01-24 22:00:00  1       2

I have a working solution that loops over the rows, but it is not very efficient.

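    import pandas as pd

    # previous row's sign, used to detect where the value changes
    df['sign_1'] = df['sign'].shift(1)

    acc = 0  # length of the current run of equal signs
    for index, row in df.iterrows():
        if pd.isna(row['sign']):
            # no sign on this row: leave the result empty
            df.loc[index, 'desire_value'] = None
        elif row['sign'] == row['sign_1']:
            # same sign as the previous row: extend the run
            acc += 1
            df.loc[index, 'desire_value'] = acc
        else:
            # sign changed: start a new run
            acc = 1
            df.loc[index, 'desire_value'] = acc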

I cannot find any array-based approach. I have read that the best way to iterate efficiently in Python is to use Cython, but is there a more "Pythonic" way to solve this?

Answer 1

Answered by Jeff

see the last section here

This is an itertools-like groupby.

In [86]: v = df['value'].dropna()

The grouper splits on the group breakpoints, that is, the rows where the value differs from the previous row; cumsum over those breakpoints gives each run its own group label.

In [87]: grouper = (v!=v.shift()).cumsum()

In [88]: grouper
Out[88]: 
3     1
4     1
5     1
6     2
7     2
8     2
9     2
10    2
11    3
12    3
13    3
14    4
15    4
16    5
17    5
Name: value, dtype: int64
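
To make the grouper step reproducible, here is a minimal sketch. It assumes the question's column is stored under the name value, as in the session above, on a default integer index; the frame construction is a reconstruction of the example data, not the asker's actual data.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's column, stored under the
# name 'value' as in the session above.
df = pd.DataFrame({
    "value": [np.nan] * 3 + [1] * 3 + [-1] * 5 + [1] * 3 + [-1] * 2 + [1] * 2
})

v = df["value"].dropna()
# True wherever the value differs from the previous row. The first row
# is compared against NaN, so it is True as well.
changed = v != v.shift()
# Running count of the changes: every run of equal values shares one label.
grouper = changed.cumsum()
print(grouper.tolist())
```

Each True in changed bumps the running count by one, which is why consecutive equal values end up with the same label.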

Then just apply a simple cumsum within each group:

In [89]: df.groupby(grouper)['value'].cumsum()
Out[89]: 
0    NaN
1    NaN
2    NaN
3      1
4      2
5      3
6     -1
7     -2
8     -3
9     -4
10    -5
11     1
12     2
13     3
14    -1
15    -2
16     1
17     2
dtype: float64

You can certainly call .abs() on the above if you do in fact want the absolute values.
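
Putting the steps together, the sketch below goes from the sign column to the question's desire_value. The helper name signed_run_cumsum and the reconstructed sign series are assumptions for illustration. It groups the NaN-free series v rather than df, so the grouper and the grouped data share the same index, then reindexes so the dropped rows come back as NaN.

```python
import numpy as np
import pandas as pd


def signed_run_cumsum(values: pd.Series) -> pd.Series:
    """Cumulative sum that restarts whenever the value changes.

    Hypothetical helper name; NaN rows are skipped and come back as NaN.
    """
    v = values.dropna()
    # Label each run of consecutive equal values with its own integer.
    grouper = (v != v.shift()).cumsum()
    # Sum within each run, then realign to the original index so the
    # dropped NaN rows reappear as NaN.
    return v.groupby(grouper).cumsum().reindex(values.index)


# Assumed reconstruction of the question's 'sign' column.
sign = pd.Series([np.nan] * 3 + [1] * 3 + [-1] * 5 + [1] * 3 + [-1] * 2 + [1] * 2)

signed = signed_run_cumsum(sign)
desire_value = signed.abs()
print(pd.DataFrame({"sign": sign, "signed": signed, "desire_value": desire_value}))
```

Because the values here are only 1 and -1, the absolute value of the run sum equals the position within the run. For columns holding other values, counting positions with groupby(grouper).cumcount() + 1 is probably the closer match to what the question asks for.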
