Python: Vectorized conditional assignment in a pandas DataFrame

Question

Asked by azuric

If I have a dataframe df with column x and want to create column y based on the values of x, following this pseudocode:

 if df['x'] < -2 then df['y'] = 1
 else if df['x'] > 2 then df['y'] = -1
 else df['y'] = 0

How would I achieve this? I assume np.where is the best way to do this, but I'm not sure how to code it correctly.

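For reference, the pseudocode maps directly onto a scalar Python function applied element by element. This is a minimal sketch (the function name classify is just an illustrative choice): it gives the intended result, but Series.apply runs a Python-level loop over the rows, which is the overhead the vectorized answers below avoid.

```python
import pandas as pd


def classify(value):
    # Direct translation of the pseudocode for a single value
    if value < -2:
        return 1
    elif value > 2:
        return -1
    else:
        return 0


df = pd.DataFrame({'x': [0, -3, 5, -1, 1]})

# Series.apply calls classify once per element: correct, but not vectorized
df['y'] = df['x'].apply(classify)
print(df)
```

It is mainly useful as a correctness baseline to compare the faster versions against.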
Answer 1

Accepted answer by EdChum

One simple method is to assign the default value first and then perform two loc calls:

In [66]:

import numpy as np
import pandas as pd
df = pd.DataFrame({'x': [0, -3, 5, -1, 1]})
df
Out[66]:
   x
0  0
1 -3
2  5
3 -1
4  1

In [69]:

df['y'] = 0
df.loc[df['x'] < -2, 'y'] = 1
df.loc[df['x'] > 2, 'y'] = -1
df
Out[69]:
   x  y
0  0  0
1 -3  1
2  5 -1
3 -1  0
4  1  0

If you want to use np.where, you can do it with a nested np.where:

In [77]:

df['y'] = np.where(df['x'] < -2, 1, np.where(df['x'] > 2, -1, 0))
df
Out[77]:
   x  y
0  0  0
1 -3  1
2  5 -1
3 -1  0
4  1  0

So here the first condition is x less than -2, which returns 1. Otherwise, the inner np.where tests whether x is greater than 2 and returns -1, and every remaining row gets 0.

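With more conditions, nested np.where calls get hard to read. A commonly used alternative (swapped in here, not part of the original answer) is np.select, which takes a list of boolean conditions, a matching list of values, and a default for the "else" branch. A minimal sketch on the same sample frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [0, -3, 5, -1, 1]})

# One boolean mask per branch of the pseudocode, in priority order
conditions = [df['x'] < -2, df['x'] > 2]
choices = [1, -1]

# default=0 plays the role of the final "else"
df['y'] = np.select(conditions, choices, default=0)
print(df)
```

np.select uses the first condition that is True for each row, so with overlapping conditions the order of the lists matters, just as the nesting order of np.where does.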
timings

In [79]:

%timeit df['y'] = np.where(df['x'] < -2, 1, np.where(df['x'] > 2, -1, 0))

1000 loops, best of 3: 1.79 ms per loop

In [81]:

%%timeit
df['y'] = 0
df.loc[df['x'] < -2, 'y'] = 1
df.loc[df['x'] > 2, 'y'] = -1

100 loops, best of 3: 3.27 ms per loop

So for this sample dataset, the np.where method is roughly twice as fast (1.79 ms vs. 3.27 ms per loop).

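These timings use the 5-row sample frame, where fixed per-call overhead probably dominates, so the ratio may not hold at realistic sizes. The sketch below re-runs the comparison on a larger frame. The frame size, value range and the helper names with_where and with_loc are arbitrary choices for illustration, and the printed numbers will depend on your machine and your pandas/NumPy versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 100,000 random integers in [-5, 5]
rng = np.random.default_rng(0)
df = pd.DataFrame({'x': rng.integers(-5, 6, size=100_000)})


def with_where(frame):
    # Nested np.where, as in the accepted answer
    frame['y'] = np.where(frame['x'] < -2, 1, np.where(frame['x'] > 2, -1, 0))


def with_loc(frame):
    # Default value first, then two masked .loc assignments
    frame['y'] = 0
    frame.loc[frame['x'] < -2, 'y'] = 1
    frame.loc[frame['x'] > 2, 'y'] = -1


# Confirm both approaches produce the same column before timing them
with_where(df)
where_result = df['y'].to_numpy().copy()
with_loc(df)
loc_result = df['y'].to_numpy().copy()
assert np.array_equal(where_result, loc_result)

for fn in (with_where, with_loc):
    # Best of 3 repeats of 20 calls each, reported per call
    best = min(timeit.repeat(lambda: fn(df), number=20, repeat=3)) / 20
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```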
Answer 2

Answer by Erfan

This is a good use case for pd.cut, where you define ranges and assign a label to each range:

df['y'] = pd.cut(df['x'], [-np.inf, -2, 2, np.inf], labels=[1, 0, -1], right=False)

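Two details of the pd.cut line are easy to miss, and the sketch below checks both using boundary values chosen for illustration. First, right=False makes every bin closed on the left: [-inf, -2), [-2, 2) and [2, inf). So x == -2 correctly gets 0, but x == 2 falls into the last bin and gets -1, whereas the pseudocode (x > 2) gives 0. Second, pd.cut returns a categorical column, so convert it with astype(int) if you need plain integers.

```python
import numpy as np
import pandas as pd

# Values chosen to hit both thresholds exactly (-2 and 2)
df = pd.DataFrame({'x': [-3, -2, 0, 2, 5]})

df['y_cut'] = pd.cut(df['x'], [-np.inf, -2, 2, np.inf],
                     labels=[1, 0, -1], right=False)

# Reference result that follows the pseudocode literally
df['y_where'] = np.where(df['x'] < -2, 1, np.where(df['x'] > 2, -1, 0))

print(df)
print(df['y_cut'].dtype)  # category, not an integer dtype

# Convert the categorical labels back to integers
df['y_cut'] = df['y_cut'].astype(int)
```

pd.cut applies a single right setting to every bin, so it cannot mix a strict "< -2" with a strict "> 2" in one call. If the exact boundary matters, np.where or loc is the safer choice.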