Python: Is there a way in Pandas to use previous row values in dataframe.apply when the previous values are also calculated in the apply?

Question

Asked by ctrl-alt-delete

I have the following dataframe:

 Index_Date    A    B    C    D
 ===============================
 2015-01-31    10   10   NaN  10
 2015-02-01     2    3   NaN  22
 2015-02-02    10   60   NaN  280
 2015-02-03    10   100  NaN  250

Required output:

 Index_Date    A    B    C    D
 ===============================
 2015-01-31    10   10   10   10
 2015-02-01     2    3   23   22
 2015-02-02    10   60   290  280
 2015-02-03    10   100  3000 250

Column C is derived for 2015-01-31 by taking the value of D.

Then I need to take the value of C for 2015-01-31, multiply it by the value of A on 2015-02-01, and add B. The same rule applies to every later row: each C is the previous row's C times the current A, plus the current B.

I have attempted an apply and a shift using an if else, but this gives a key error.

Answer 1

Accepted answer by Stefan

First, create the derived value:

df.loc[0, 'C'] = df.loc[0, 'D']

Then iterate through the remaining rows and fill the calculated values:

for i in range(1, len(df)):
    df.loc[i, 'C'] = df.loc[i-1, 'C'] * df.loc[i, 'A'] + df.loc[i, 'B']


  Index_Date   A   B    C    D
0 2015-01-31  10  10   10   10
1 2015-02-01   2   3   23   22
2 2015-02-02  10  60  290  280
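
The loop above looks rows up by the integer labels 0, 1, 2, so it assumes a default integer index. If the dates are the index, as the table in the question suggests, the same loop can be written with positional access instead. A minimal sketch under that assumption, with the frame rebuilt here from the question's values:

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame with the dates as the index (an assumption).
idx = pd.date_range("2015-01-31", periods=4)
df = pd.DataFrame(
    {"A": [10, 2, 10, 10], "B": [10, 3, 60, 100], "C": np.nan, "D": [10, 22, 280, 250]},
    index=idx,
)

# Column positions, so that .iloc can be used regardless of the index labels.
a, b, c, d = (df.columns.get_loc(name) for name in "ABCD")

# Seed the first row from D, then carry the previous C forward row by row.
df.iloc[0, c] = df.iloc[0, d]
for i in range(1, len(df)):
    df.iloc[i, c] = df.iloc[i - 1, c] * df.iloc[i, a] + df.iloc[i, b]

print(df)
```

Positional access keeps the i - 1 lookup valid whatever the index holds, which avoids date arithmetic on the labels.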

Answer 2

Answer by Stefan

Applying the recursion to the underlying numpy arrays will be faster than the row-by-row df.loc assignment in the accepted answer.

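import numpy as np
import pandas as pd

# Five rows with A == B == D == 1..5, matching the output shown below
df = pd.DataFrame(np.repeat(np.arange(1, 6), 3).reshape(5, 3), columns=['A', 'B', 'D'])
new = [df.D.values[0]]
for i in range(1, len(df.index)):
    # previous result times the current A, plus the current B
    new.append(new[i-1]*df.A.values[i]+df.B.values[i])
df['C'] = new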

Output

      A  B  D    C
   0  1  1  1    1
   1  2  2  2    4
   2  3  3  3   15
   3  4  4  4   64
   4  5  5  5  325
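
The same running calculation can also be expressed without an explicit index loop. The sketch below is one possible variant, not part of the original answer: it uses itertools.accumulate from the standard library, whose initial argument needs Python 3.8 or later, and applies it to the values from the question.

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({"A": [10, 2, 10, 10], "B": [10, 3, 60, 100], "D": [10, 22, 280, 250]})

a = df["A"].tolist()
b = df["B"].tolist()
first = df["D"].tolist()[0]  # the first C is taken from D

# accumulate feeds each result back in as `prev`; every step computes
# prev * A[i] + B[i] for rows 1..n-1, and `initial` supplies row 0.
df["C"] = list(
    accumulate(zip(a[1:], b[1:]), lambda prev, ab: prev * ab[0] + ab[1], initial=first)
)

print(df)
```

Working on plain Python lists keeps pandas indexing out of the inner loop, which is the same idea as reading from .values above.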

Answer 3

Answer by kztd

Given a column of numbers:

lst = []
cols = ['A']
for a in range(100, 105):
    lst.append([a])
df = pd.DataFrame(lst, columns=cols, index=range(5))
df

    A
0   100
1   101
2   102
3   103
4   104

You can reference the previous row with shift:

df['Change'] = df.A - df.A.shift(1)
df

    A   Change
0   100 NaN
1   101 1.0
2   102 1.0
3   103 1.0
4   104 1.0
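
Note that shift reads whatever the column holds at the time of the call, so on its own it does not cover the case in the question, where each C depends on the C computed one row earlier. A small sketch with the question's values (the frame is rebuilt here for illustration) shows that a single shifted pass only fills the second row:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [10, 2, 10, 10], "B": [10, 3, 60, 100], "D": [10, 22, 280, 250]})
df["C"] = np.nan
df.loc[0, "C"] = df.loc[0, "D"]  # only the first C is known up front

# One vectorised pass: the shifted C is NaN everywhere except for row 1,
# because rows 2 and 3 would need C values that have not been computed yet.
one_pass = df["C"].shift(1) * df["A"] + df["B"]

print(one_pass)
```

Repeating the pass once per row would eventually fill the column, but that is just a slower way of writing the loop.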

Answer 4

Answer by iipr

Although it has been a while since this question was asked, I will post my answer hoping it helps somebody.

Disclaimer: I know this solution is not standard, but I think it works well.

import pandas as pd
import numpy as np

data = np.array([[10, 2, 10, 10],
                 [10, 3, 60, 100],
                 [np.nan] * 4,
                 [10, 22, 280, 250]]).T
idx = pd.date_range('20150131', end='20150203')
df = pd.DataFrame(data=data, columns=list('ABCD'), index=idx)
df
               A    B     C    D
 =================================
 2015-01-31    10   10    NaN  10
 2015-02-01    2    3     NaN  22 
 2015-02-02    10   60    NaN  280
 2015-02-03    10   100   NaN  250

def calculate(mul, add):
    global value
    value = value * mul + add
    return value

value = df.loc['2015-01-31', 'D']
df.loc['2015-01-31', 'C'] = value
df.loc['2015-02-01':, 'C'] = df.loc['2015-02-01':].apply(lambda row: calculate(*row[['A', 'B']]), axis=1)
df
               A    B     C     D
 =================================
 2015-01-31    10   10    10    10
 2015-02-01    2    3     23    22 
 2015-02-02    10   60    290   280
 2015-02-03    10   100   3000  250

So basically we use apply from pandas, with the help of a global variable that keeps track of the previously calculated value.
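
If the global variable is a concern, the running value can be kept in a closure instead. The sketch below is an illustrative variant rather than the method from this answer: make_calculator and step are hypothetical names, and the rows are fed through a list comprehension over zip so that the function is called exactly once per row.

```python
import numpy as np
import pandas as pd


def make_calculator(start):
    """Return a function that remembers the last value it produced."""
    value = start

    def step(mul, add):
        nonlocal value  # update the enclosed state instead of a global
        value = value * mul + add
        return value

    return step


idx = pd.date_range("2015-01-31", periods=4)
df = pd.DataFrame(
    {"A": [10, 2, 10, 10], "B": [10, 3, 60, 100], "C": np.nan, "D": [10, 22, 280, 250]},
    index=idx,
)

# Seed the first row from D and start the running value there.
first = df.index[0]
df.loc[first, "C"] = df.loc[first, "D"]
step = make_calculator(df.loc[first, "C"])

# Feed the remaining rows through the stateful function in order.
rest = df.iloc[1:]
df.loc[rest.index, "C"] = [step(a, b) for a, b in zip(rest["A"], rest["B"])]

print(df)
```

Each call to make_calculator creates its own independent state, so two frames can be processed without one run leaking into the other.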

Time comparison with a for loop:

data = np.random.random(size=(1000, 4))
idx = pd.date_range('20150131', end='20171026')
df = pd.DataFrame(data=data, columns=list('ABCD'), index=idx)
df.C = np.nan

df.loc['2015-01-31', 'C'] = df.loc['2015-01-31', 'D']

%%timeit
for i in df.loc['2015-02-01':].index.date:
    df.loc[i, 'C'] = df.loc[(i - pd.DateOffset(days=1)).date(), 'C'] * df.loc[i, 'A'] + df.loc[i, 'B']

3.2 s ± 114 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

data = np.random.random(size=(1000, 4))
idx = pd.date_range('20150131', end='20171026')
df = pd.DataFrame(data=data, columns=list('ABCD'), index=idx)
df.C = np.nan

def calculate(mul, add):
    global value
    value = value * mul + add
    return value

value = df.loc['2015-01-31', 'D']
df.loc['2015-01-31', 'C'] = value

%%timeit
df.loc['2015-02-01':, 'C'] = df.loc['2015-02-01':].apply(lambda row: calculate(*row[['A', 'B']]), axis=1)

1.82 s ± 64.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So the apply version takes about 0.57 times as long as the for loop on average, which makes it roughly 1.75 times faster.

Answer 5

Answer by jpp

`numba`

For recursive calculations which are not vectorisable, numba, which uses JIT compilation and works with lower-level objects, often yields large performance improvements. You need only define a regular for loop and use the decorator @njit or (for older versions) @jit(nopython=True):

import numpy as np
from numba import jit

@jit(nopython=True)
def calculator_nb(a, b, d):
    # res[0] is seeded from d; every later value depends on the one before it
    res = np.empty(d.shape)
    res[0] = d[0]
    for i in range(1, res.shape[0]):
        res[i] = res[i-1] * a[i] + b[i]
    return res

# pass columns A, B and D as separate numpy arrays
df['C'] = calculator_nb(*df[list('ABD')].values.T)

For a reasonably sized dataframe, this gives a ~30x performance improvement versus a regular for loop:

n = 10**5
df = pd.concat([df]*n, ignore_index=True)

# benchmarking on Python 3.6.0, Pandas 0.19.2, NumPy 1.11.3, Numba 0.30.1
# calculator() is same as calculator_nb() but without @jit decorator
%timeit calculator_nb(*df[list('ABD')].values.T)  # 14.1 ms per loop
%timeit calculator(*df[list('ABD')].values.T)     # 444 ms per loop
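
Before relying on the compiled version, it can be worth checking that it returns the same values as the plain Python loop. The sketch below does that on the question's data. The fallback to an identity decorator when numba is not installed is an assumption added here so the snippet still runs; in that case the check compares the function with itself.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:  # assumption: run uncompiled if numba is unavailable
    def njit(func):
        return func


def calculator(a, b, d):
    # Plain Python reference: seed from d[0], then carry the result forward.
    res = np.empty(d.shape)
    res[0] = d[0]
    for i in range(1, res.shape[0]):
        res[i] = res[i - 1] * a[i] + b[i]
    return res


# Compile the same function; `calculator` itself stays uncompiled.
calculator_nb = njit(calculator)

df = pd.DataFrame({"A": [10, 2, 10, 10], "B": [10, 3, 60, 100], "D": [10, 22, 280, 250]})
a, b, d = (df[col].to_numpy(dtype=float) for col in "ABD")

expected = calculator(a, b, d)
result = calculator_nb(a, b, d)
assert np.allclose(result, expected)

df["C"] = result
print(df)
```

The first call to a numba-compiled function includes compilation time, so timings are more representative when taken from a second call.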
