Vectorize conditional assignment in a pandas DataFrame
Note: this page reproduces a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/28896769/
Asked by azuric
If I have a dataframe df with column x and want to create column y based on the values of x, following this pseudocode:
if df['x'] < -2 then df['y'] = 1
else if df['x'] > 2 then df['y'] = -1
else df['y'] = 0
How would I achieve this? I assume np.where is the best way to do this, but I am not sure how to code it correctly.
Accepted answer by EdChum
One simple method is to assign the default value first and then make two loc calls:
In [66]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'x':[0,-3,5,-1,1]})
df
Out[66]:
   x
0  0
1 -3
2  5
3 -1
4  1
In [69]:
df['y'] = 0
df.loc[df['x'] < -2, 'y'] = 1
df.loc[df['x'] > 2, 'y'] = -1
df
Out[69]:
   x  y
0  0  0
1 -3  1
2  5 -1
3 -1  0
4  1  0
If you want to use np.where, you can do it with a nested np.where:
In [77]:
df['y'] = np.where(df['x'] < -2 , 1, np.where(df['x'] > 2, -1, 0))
df
Out[77]:
   x  y
0  0  0
1 -3  1
2  5 -1
3 -1  0
4  1  0
Here the outer np.where tests the first condition: where x is less than -2 it returns 1. Otherwise it falls through to the inner np.where, which returns -1 where x is greater than 2 and 0 everywhere else.
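Nesting gets harder to read as the number of branches grows. As a sketch of a flatter alternative (np.select is not part of the original answer, it is swapped in here for illustration), the same two conditions can be listed side by side, with the final "else" expressed as the default:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [0, -3, 5, -1, 1]})

# Conditions are checked in order; the first one that matches wins.
conditions = [df['x'] < -2, df['x'] > 2]
choices = [1, -1]

# Rows matching neither condition get the default value 0.
df['y'] = np.select(conditions, choices, default=0)
print(df)
```

For this two-branch case the result matches the nested np.where; the list form mainly pays off when more branches are added.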
Timings
In [79]:
%timeit df['y'] = np.where(df['x'] < -2 , 1, np.where(df['x'] > 2, -1, 0))
1000 loops, best of 3: 1.79 ms per loop
In [81]:
%%timeit
df['y'] = 0
df.loc[df['x'] < -2, 'y'] = 1
df.loc[df['x'] > 2, 'y'] = -1
100 loops, best of 3: 3.27 ms per loop
So for this sample dataset the np.where method is nearly twice as fast (1.79 ms versus 3.27 ms per loop).
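The %timeit magics above only work inside IPython. A rough sketch of the same comparison as a plain script, using the standard timeit module, might look like the following. The frame name big, its size, the random data, and the helper names with_where and with_loc are all assumptions made for illustration, and the measured times will vary by machine and library version, so no figures are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 100,000 random integers in [-5, 5].
rng = np.random.default_rng(0)
big = pd.DataFrame({'x': rng.integers(-5, 6, size=100_000)})

def with_where(frame):
    # Nested np.where, as in the answer above.
    return np.where(frame['x'] < -2, 1, np.where(frame['x'] > 2, -1, 0))

def with_loc(frame):
    # Default value plus two loc assignments, on a copy so the input
    # frame is left untouched (the copy adds some overhead of its own).
    out = frame.copy()
    out['y'] = 0
    out.loc[out['x'] < -2, 'y'] = 1
    out.loc[out['x'] > 2, 'y'] = -1
    return out['y'].to_numpy()

# Both approaches should agree before their speed is compared.
assert (with_where(big) == with_loc(big)).all()

for fn in (with_where, with_loc):
    seconds = timeit.timeit(lambda: fn(big), number=20) / 20
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```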
Answer by Erfan
This is a good use case for pd.cut, where you define ranges (bins) and assign labels based on those ranges:
df['y'] = pd.cut(df['x'], [-np.inf, -2, 2, np.inf], labels=[1, 0, -1], right=False)
Output
   x  y
0  0  0
1 -3  1
2  5 -1
3 -1  0
4  1  0
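One caveat worth checking: the sample data contains no value exactly equal to -2 or 2. With right=False the bins are [-inf, -2), [-2, 2) and [2, inf), so x == 2 lands in the last bin and is labeled -1, whereas the strict x > 2 test in the question would give 0. The sketch below compares the two on boundary values; the frame name edge and the column names y_where and y_cut are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Values chosen to sit exactly on and around the two thresholds.
edge = pd.DataFrame({'x': [-3, -2, 0, 2, 3]})

# Reference result from the nested np.where shown earlier.
edge['y_where'] = np.where(edge['x'] < -2, 1,
                           np.where(edge['x'] > 2, -1, 0))

# pd.cut with right=False uses the bins [-inf, -2), [-2, 2), [2, inf).
edge['y_cut'] = pd.cut(edge['x'], [-np.inf, -2, 2, np.inf],
                       labels=[1, 0, -1], right=False)

print(edge)
print(edge['y_cut'].dtype)  # category, not an integer dtype
```

The two columns differ only in the row where x == 2. Note also that pd.cut returns a categorical column, so a cast such as astype(int) is needed if y will be used in arithmetic. If exact agreement at the thresholds matters, a comparison-based approach such as the nested np.where keeps the strict inequalities.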

