pandas cumulative sum reset at NaN

Question

Asked by working4coins

If I have a pandas.core.series.Series named ts of either 1's or NaN's like this:

3382   NaN
3381   NaN
...
3369   NaN
3368   NaN
...
15     1
10   NaN
11     1
12     1
13     1
9    NaN
8    NaN
7    NaN
6    NaN
3    NaN
4      1
5      1
2    NaN
1    NaN
0    NaN

I would like to calculate the cumsum of this series, but it should be reset (set to zero) at the location of each NaN.

Ideally, I would like a vectorized solution!

I once saw a similar question about Matlab: Matlab cumsum reset at NaN?

but I don't know how to translate this line: d = diff([0 c(n)]);

Answer 1

Accepted answer by emprice

A simple Numpy translation of your Matlab code is this:

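import numpy as np

v = np.array([1., 1., 1., np.nan, 1., 1., 1., 1., np.nan, 1.])
n = np.isnan(v)   # True where v is NaN
a = ~n            # True where v holds a value
c = np.cumsum(a)  # running count of non-NaN entries
d = np.diff(np.concatenate(([0.], c[n])))  # length of the run before each NaN
v[n] = -d         # put the negated run length at each NaN (modifies v in place)
np.cumsum(v)      # the running sum now drops back to 0 at each NaN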

Executing this code returns the result array([ 1., 2., 3., 0., 1., 2., 3., 4., 0., 1.]). This solution will only be as valid as the original one, but maybe it will help you come up with something better if it isn't sufficient for your purposes.
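
The steps above can be wrapped in a small helper so the input array is not modified in place. This is only a sketch: the function name cumsum_reset_on_nan is made up for illustration, and like the snippet above it assumes the input contains only 1's and NaN's.

```python
import numpy as np

def cumsum_reset_on_nan(values):
    """Running count of 1's that restarts at 0 on every NaN.

    Hypothetical helper; assumes `values` holds only 1's and NaN's.
    """
    v = np.array(values, dtype=float)  # np.array copies, so the caller's data is untouched
    n = np.isnan(v)
    c = np.cumsum(~n)                  # running count of non-NaN entries
    d = np.diff(np.concatenate(([0.0], c[n])))  # run length before each NaN
    v[n] = -d                          # cancel the preceding run at each NaN
    return np.cumsum(v)

v = np.array([1., 1., 1., np.nan, 1., 1., 1., 1., np.nan, 1.])
out = cumsum_reset_on_nan(v)
consecutive = cumsum_reset_on_nan([1., np.nan, np.nan, 1.])
```

For a pandas Series such as ts, one option is pd.Series(cumsum_reset_on_nan(ts.to_numpy()), index=ts.index), which keeps the original index.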

Answer 2

Answer by Phillip Cloud

Here's a slightly more pandas-onic way to do it:

import numpy as np
from numpy import nan
from pandas import Series

v = Series([1, 1, 1, nan, 1, 1, 1, 1, nan, 1], dtype=float)
n = v.isnull()
a = ~n
c = a.cumsum()
index = c[n].index  # need the index for reconstruction after the np.diff
d = Series(np.diff(np.hstack(([0.], c[n]))), index=index)
v[n] = -d
result = v.cumsum()

Note that either of these requires that you're using pandas at least at 9da899b or newer. If you aren't, then you can cast the bool dtype to an int64 or float64 dtype:

import numpy as np
from numpy import nan
from pandas import Series

v = Series([1, 1, 1, nan, 1, 1, 1, 1, nan, 1], dtype=float)
n = v.isnull()
a = ~n
c = a.astype(float).cumsum()  # explicit cast for versions that cannot cumsum a bool Series
index = c[n].index  # need the index for reconstruction after the np.diff
d = Series(np.diff(np.hstack(([0.], c[n]))), index=index)
v[n] = -d
result = v.cumsum()

Answer 3

Answer by kadee

An even more pandas-onic way to do it:

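import numpy as np
import pandas as pd

v = pd.Series([1., 3., 1., np.nan, 1., 1., 1., 1., np.nan, 1.])
cumsum = v.cumsum().ffill()  # ffill() replaces the deprecated fillna(method='pad')
reset = -cumsum[v.isnull()].diff().fillna(cumsum)  # negated sum of the run before each NaN
result = v.where(v.notnull(), reset).cumsum()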

Unlike the Matlab code, this also works for values other than 1.
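
One way to convince yourself of that is to compare the vectorized version with a plain Python loop on data that is not all 1's. This is a sketch only: both function names are made up for illustration, and ffill() is used in place of fillna(method='pad').

```python
import numpy as np
import pandas as pd

def reset_cumsum(v):
    """Vectorized cumulative sum that restarts at 0 on NaN (hypothetical name)."""
    cumsum = v.cumsum().ffill()
    reset = -cumsum[v.isnull()].diff().fillna(cumsum)
    return v.where(v.notnull(), reset).cumsum()

def reset_cumsum_loop(v):
    """Slow reference implementation, used only for comparison."""
    total, out = 0.0, []
    for x in v:
        total = 0.0 if np.isnan(x) else total + x
        out.append(total)
    return pd.Series(out, index=v.index)

v = pd.Series([1., 3., 1., np.nan, 1., 1., 1., 1., np.nan, 1.])
fast = reset_cumsum(v)
slow = reset_cumsum_loop(v)
```

On this input both versions agree. As far as I can tell, the two differ when the Series starts with a NaN: the vectorized version leaves that leading entry as NaN, while the loop gives 0.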

Answer 4

Answer by Adam Fuller

If you can accept a similar boolean Series b, try:

(b.cumsum() - b.cumsum().where(~b).ffill().fillna(0)).astype(int)

Starting from your Series ts, use either b = (ts == 1) or b = ~ts.isnull().
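
Put together on a small Series, that might look like the sketch below. The data and index values here are made up to resemble the question's ts, and ffill() stands in for fillna(method='pad').

```python
import numpy as np
import pandas as pd

# Hypothetical sample shaped like the question's ts (1's and NaN's, unordered index)
ts = pd.Series([np.nan, 1, 1, np.nan, 1, 1, 1, np.nan],
               index=[9, 4, 5, 2, 11, 12, 13, 10])

b = ts.notna()        # same as ~ts.isnull()
counts = b.cumsum()   # running count of non-NaN entries
# Subtract the count as of the most recent NaN, so each run restarts at 1
result = (counts - counts.where(~b).ffill().fillna(0)).astype(int)
```

The result keeps the index of ts, and the counting follows the order of the rows, not the sorted index.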
