Pandas DataFrame: use previous row value for complicated 'if' conditions to determine current value
Source: http://stackoverflow.com/questions/36923494/
License: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors on Stack Overflow, citing the source address above.
Asked by user5025141
Is there a faster way to run the following loop? Perhaps apply or a rolling apply function could do it. Basically, I need to access the previous row's value to determine the current cell's value.
import numpy as np

# So is the entry threshold; it is not defined in the question (the answer by flyingmeatball uses 1.25).
df.iloc[0] = (np.abs(df.iloc[0]) >= So) * np.sign(df.iloc[0])
for i in range(1, len(df)):
    for j in range(df.shape[1]):
        cur, prev = df.iat[i, j], df.iat[i - 1, j]
        # The original condition ended with a dangling "|"; any further clause is missing.
        if cur > 1.25 and prev == 0:
            df.iat[i, j] = 1
        elif cur < -1.25 and prev == 0:
            df.iat[i, j] = -1
        elif (cur <= -0.75 and prev < 0) or (cur >= 0.5 and prev > 0):
            df.iat[i, j] = prev
        else:
            df.iat[i, j] = 0
As you can see, the loop updates the DataFrame as it goes, and each step needs the already-updated previous row, so using shift will not work.
For example, input:
A B C
1.3 -1.5 0.7
1.1 -1.4 0.6
1.0 -1.3 0.5
0.4 1.4 0.4
Output:
A B C
1 -1 0
1 -1 0
1 -1 0
0 1 0
Answer by MaxU
You can use the .shift() function to access previous or next values:
Previous value for the col column:
df['col'].shift()
Next value for the col column:
df['col'].shift(-1)
Example:
In [38]: df
Out[38]:
a b c
0 1 0 5
1 9 9 2
2 2 2 8
3 6 3 0
4 6 1 7
In [39]: df['prev_a'] = df['a'].shift()
In [40]: df
Out[40]:
a b c prev_a
0 1 0 5 NaN
1 9 9 2 1.0
2 2 2 8 9.0
3 6 3 0 2.0
4 6 1 7 6.0
In [43]: df['next_a'] = df['a'].shift(-1)
In [44]: df
Out[44]:
a b c prev_a next_a
0 1 0 5 NaN 9.0
1 9 9 2 1.0 2.0
2 2 2 8 9.0 6.0
3 6 3 0 2.0 6.0
4 6 1 7 6.0 NaN
Answer by CoreDump
I am surprised there isn't a native pandas solution to this, because shift and rolling do not get it done. I have devised a way to do this using standard pandas syntax, but I am not sure whether it performs any better than your loop. My purposes just required this for consistency (not speed).
import pandas as pd

df = pd.DataFrame({'a':[0,1,2], 'b':[0,10,20]})
new_col = 'c'

def apply_func_decorator(func):
    # prev_row persists between calls and holds the previous row plus its computed value
    prev_row = {}
    def wrapper(curr_row, **kwargs):
        val = func(curr_row, prev_row)
        prev_row.update(curr_row)
        prev_row[new_col] = val
        return val
    return wrapper

@apply_func_decorator
def running_total(curr_row, prev_row):
    return curr_row['a'] + curr_row['b'] + prev_row.get('c', 0)

df[new_col] = df.apply(running_total, axis=1)
print(df)
# Output will be:
#    a   b   c
# 0  0   0   0
# 1  1  10  11
# 2  2  20  33
Disclaimer: I used pandas 0.16 but with only slight modification this will work for the latest versions too.
Answer by flyingmeatball
@MaxU has it right with shift. I think you can even compare DataFrames directly, something like this:
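df_prev = df.shift(1)  # previous row; shift(-1) would give the next row
df_out = pd.DataFrame(0.0, index=df.index, columns=df.columns)  # 0 covers the else branch
# Note: df_prev holds the original previous values, not the values updated by earlier rows.
df_out[(df > 1.25) & (df_prev == 0)] = 1
df_out[(df < -1.25) & (df_prev == 0)] = -1
df_out[(df <= -.75) & (df_prev < 0)] = df_prev
df_out[(df >= .5) & (df_prev > 0)] = df_prev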
The syntax may be off, but if you provide some test data I think this could work.
Saves you having to loop at all.
EDIT - Update based on comment below
I would try my absolute best not to loop through the DF itself. You're better off going column by column, sending each column to a list, doing the updating there, and then importing it back again. Something like this:
import numpy as np

# .ix has been removed from pandas; .iloc selects the first row by position
df.iloc[0] = (np.abs(df.iloc[0]) >= 1.25) * np.sign(df.iloc[0])
for col in df.columns.tolist():
    currData = df[col].tolist()
    for currRow in range(1, len(currData)):
        if currData[currRow] > 1.25 and currData[currRow-1] == 0:
            currData[currRow] = 1
        elif currData[currRow] < -1.25 and currData[currRow-1] == 0:
            currData[currRow] = -1
        elif currData[currRow] <= -.75 and currData[currRow-1] < 0:
            currData[currRow] = currData[currRow-1]
        elif currData[currRow] >= .5 and currData[currRow-1] > 0:
            currData[currRow] = currData[currRow-1]
        else:
            currData[currRow] = 0
    df[col] = currData
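The column-by-column idea above can be packaged as a reusable function. The following is only a sketch under stated assumptions: the function name `hysteresis_column` and its parameters `enter`, `hold_pos` and `hold_neg` are hypothetical, the thresholds are taken from the code in this thread, and the data is the sample input from the question.

```python
import numpy as np
import pandas as pd

def hysteresis_column(values, enter=1.25, hold_pos=0.5, hold_neg=-0.75):
    """Apply the thread's rules to one column, reading the updated previous state."""
    vals = np.asarray(values, dtype=float)
    out = np.zeros(len(vals))
    if len(vals) == 0:
        return out
    # First row: sign of the value if it clears the entry threshold, else 0.
    out[0] = np.sign(vals[0]) if abs(vals[0]) >= enter else 0.0
    for i in range(1, len(vals)):
        cur, prev = vals[i], out[i - 1]  # prev is the already-updated state
        if cur > enter and prev == 0:
            out[i] = 1
        elif cur < -enter and prev == 0:
            out[i] = -1
        elif (cur <= hold_neg and prev < 0) or (cur >= hold_pos and prev > 0):
            out[i] = prev
        else:
            out[i] = 0
    return out

# Sample input from the question.
df = pd.DataFrame({"A": [1.3, 1.1, 1.0, 0.4],
                   "B": [-1.5, -1.4, -1.3, 1.4],
                   "C": [0.7, 0.6, 0.5, 0.4]})

# Process each column as a plain array, then rebuild the DataFrame once.
result = pd.DataFrame(
    {col: hysteresis_column(df[col].to_numpy()) for col in df.columns},
    index=df.index,
)
print(result)
```

With these rules the last row of B comes out as 0, whereas the output posted in the question shows 1. The first condition in the question's code is truncated, so a missing clause may account for the difference.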