Python Pandas DataFrame: use the previous row's value in a complicated "if" condition to determine the current value

Question

Asked by user5025141

Is there a faster way to do the following loop? Maybe apply or a rolling apply function could do it. Basically, I need to access the previous row's value to determine the current cell's value.

import numpy as np

# So is a threshold defined elsewhere in the asker's code.
# .ix has been removed from pandas, so positional access uses .iloc.
df.iloc[0] = (np.abs(df.iloc[0]) >= So) * np.sign(df.iloc[0])
for i in range(1, len(df)):
    for j in range(df.shape[1]):
        if (df.iloc[i, j] > 1.25) and (df.iloc[i - 1, j] == 0):
            df.iloc[i, j] = 1
        elif (df.iloc[i, j] < -1.25) and (df.iloc[i - 1, j] == 0):
            df.iloc[i, j] = -1
        elif ((df.iloc[i, j] <= -0.75) and (df.iloc[i - 1, j] < 0)) or ((df.iloc[i, j] >= 0.5) and (df.iloc[i - 1, j] > 0)):
            df.iloc[i, j] = df.iloc[i - 1, j]
        else:
            df.iloc[i, j] = 0

As you can see, the loop updates the DataFrame as it goes, and each step needs the most recently updated previous row, so using shift will not work.

For example: Input:

A      B     C
1.3  -1.5   0.7
1.1  -1.4   0.6
1.0  -1.3   0.5
0.4   1.4   0.4

Output:

 A      B     C
1     -1      0
1     -1      0
1     -1      0
0      1      0

Answer 1

Answered by MaxU

You can use the .shift() function to access previous or next values:

Previous value for the col column:

df['col'].shift()

Next value for the col column:

df['col'].shift(-1)

Example:

In [38]: df
Out[38]:
   a  b  c
0  1  0  5
1  9  9  2
2  2  2  8
3  6  3  0
4  6  1  7

In [39]: df['prev_a'] = df['a'].shift()

In [40]: df
Out[40]:
   a  b  c  prev_a
0  1  0  5     NaN
1  9  9  2     1.0
2  2  2  8     9.0
3  6  3  0     2.0
4  6  1  7     6.0

In [43]: df['next_a'] = df['a'].shift(-1)

In [44]: df
Out[44]:
   a  b  c  prev_a  next_a
0  1  0  5     NaN     9.0
1  9  9  2     1.0     2.0
2  2  2  8     9.0     6.0
3  6  3  0     2.0     6.0
4  6  1  7     6.0     NaN
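
A small sketch of why shift() alone does not cover the question's case: shift() returns the previous input value, while the question's rules depend on the previous output value. The `state` series below is typed in by hand from the question's expected output for column A; it is an illustration, not something shift() computes.

```python
import pandas as pd

# Column A from the question, and the states the loop produces for it
# (copied by hand from the question's expected output).
raw = pd.Series([1.3, 1.1, 1.0, 0.4])
state = pd.Series([1.0, 1.0, 1.0, 0.0])

prev_raw = raw.shift()      # previous input value
prev_state = state.shift()  # previous output value

# A rule such as "enter when value > 1.25 and previous state == 0" depends on
# prev_state, which is only known once the earlier rows have been processed.
print(pd.DataFrame({"raw": raw, "prev_raw": prev_raw, "prev_state": prev_state}))
```

The two shifted columns differ from the second row onward, which is the gap the asker describes.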

Answer 2

Answered by CoreDump

I am surprised there isn't a native pandas solution to this as well, because shift and rolling do not get it done. I have devised a way to do this using the standard pandas syntax but I am not sure if it performs any better than your loop... My purposes just required this for consistency (not speed).

import pandas as pd

df = pd.DataFrame({'a':[0,1,2], 'b':[0,10,20]})

new_col = 'c'

def apply_func_decorator(func):
    prev_row = {}
    def wrapper(curr_row, **kwargs):
        val = func(curr_row, prev_row)
        prev_row.update(curr_row)
        prev_row[new_col] = val
        return val
    return wrapper

@apply_func_decorator
def running_total(curr_row, prev_row):
    return curr_row['a'] + curr_row['b'] + prev_row.get('c', 0)

df[new_col] = df.apply(running_total, axis=1)

print(df)
# Output will be:
#    a   b   c
# 0  0   0   0
# 1  1  10  11
# 2  2  20  33

Disclaimer: I used pandas 0.16 but with only slight modification this will work for the latest versions too.
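
For this particular running-total example, the recurrence c[i] = a[i] + b[i] + c[i-1] is a plain cumulative sum, so a sketch with cumsum() gives the same column without keeping any state. This shortcut is specific to sums and does not carry over to the if/elif rules in the question.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2], 'b': [0, 10, 20]})

# Running total of a + b, equivalent to c[i] = a[i] + b[i] + c[i-1] with c[-1] = 0.
df['c'] = (df['a'] + df['b']).cumsum()

print(df)
```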

Others had similar questions, and I posted this solution on those as well.

Answer 3

Answered by flyingmeatball

@MaxU has it right with shift. I think you can even compare DataFrames directly, something like this:

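# shift() moves rows down by one, so df_prev holds the previous row's
# original (not updated) values; shift(-1) would give the next row instead.
df_prev = df.shift()
df_out = pd.DataFrame(0.0, index=df.index, columns=df.columns)

df_out[(df > 1.25) & (df_prev == 0)] = 1
df_out[(df < -1.25) & (df_prev == 0)] = -1
df_out[(df <= -.75) & (df_prev < 0)] = df_prev
df_out[(df >= .5) & (df_prev > 0)] = df_prev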

The syntax may be off, but if you provide some test data I think this could work.

Saves you having to loop at all.

EDIT - Update based on comment below

I would try my absolute best not to loop through the DataFrame itself. You're better off going column by column: convert each column to a list, do the updating on the list, then assign the list back to the column. Something like this:

import numpy as np

# First row: sign of the value if it clears the 1.25 threshold, else 0.
# .ix has been removed from pandas, so positional access uses .iloc.
df.iloc[0] = (np.abs(df.iloc[0]) >= 1.25) * np.sign(df.iloc[0])
for col in df.columns.tolist():
    currData = df[col].tolist()
    # currData[currRow - 1] has already been updated when currRow is processed
    for currRow in range(1, len(currData)):
        if currData[currRow] > 1.25 and currData[currRow - 1] == 0:
            currData[currRow] = 1
        elif currData[currRow] < -1.25 and currData[currRow - 1] == 0:
            currData[currRow] = -1
        elif currData[currRow] <= -.75 and currData[currRow - 1] < 0:
            currData[currRow] = currData[currRow - 1]
        elif currData[currRow] >= .5 and currData[currRow - 1] > 0:
            currData[currRow] = currData[currRow - 1]
        else:
            currData[currRow] = 0
    df[col] = currData
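
The column-by-column loop above can also be packaged as a function and checked against the question's sample data. This is a minimal sketch, not the answerer's code: the function name `apply_state_rules` and its parameter names are made up for illustration, and the first-row threshold is assumed to be 1.25 as in the loop above. With the rules exactly as written, the last value of column B comes out as 0 (the previous state is -1, so no branch matches), whereas the question's listed output shows 1 there.

```python
import numpy as np
import pandas as pd


def apply_state_rules(values, enter=1.25, hold_long=0.5, hold_short=-0.75):
    """Apply the question's if/elif rules to one column (hypothetical helper).

    Each output depends on the previous *output*, so the loop is sequential.
    """
    out = np.zeros(len(values))
    if len(values) == 0:
        return out
    # First row: sign of the value if it clears the entry threshold, else 0.
    out[0] = np.sign(values[0]) if abs(values[0]) >= enter else 0.0
    for i in range(1, len(values)):
        x, prev = values[i], out[i - 1]
        if x > enter and prev == 0:
            out[i] = 1.0
        elif x < -enter and prev == 0:
            out[i] = -1.0
        elif (x <= hold_short and prev < 0) or (x >= hold_long and prev > 0):
            out[i] = prev
        else:
            out[i] = 0.0
    return out


# Sample input from the question.
df = pd.DataFrame({
    "A": [1.3, 1.1, 1.0, 0.4],
    "B": [-1.5, -1.4, -1.3, 1.4],
    "C": [0.7, 0.6, 0.5, 0.4],
})

# Build the result column by column, keeping the original index.
result = pd.DataFrame(
    {col: apply_state_rules(df[col].to_numpy()) for col in df.columns},
    index=df.index,
)
print(result)
```

Working on a plain NumPy array per column avoids repeated DataFrame indexing inside the inner loop, which is the same idea as converting each column to a list.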
