Original URL: http://stackoverflow.com/questions/25326649/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
Python: numpy/pandas change values on condition
Question by tryptofame
I would like to know if there is a faster and more "pythonic" way of doing the following, e.g. using some built-in methods. Given a pandas DataFrame or a numpy array of floats, if a value is less than or equal to 0.5, I need to calculate its reciprocal, multiply it by -1, and replace the old value with the newly calculated one. "Transform" is probably a bad choice of words; please tell me if you have a better/more accurate description.
Thank you for your help and support!!
Data:
import numpy as np
import pandas as pd
dicti = {"A" : np.arange(0.0, 3, 0.1),
         "B" : np.arange(0, 30, 1),
         "C" : list("ELVISLIVES")*3}
df = pd.DataFrame(dicti)
My function:
def transform_colname(df, colname):
    series = df[colname]
    newval_list = []
    for val in series:
        if val <= 0.5:
            newval = (1/val)*-1
            newval_list.append(newval)
        else:
            newval_list.append(val)
    df[colname] = newval_list
    return df
Function call:
transform_colname(df, colname="A")
**--> I'm summing up the results here, since comments don't allow posting code (or I don't know how to do it).**
Thank you all for your fast and great answers!!
Using IPython's "%timeit" with the "real" data:
My function: 10 loops, best of 3: 24.1 ms per loop
From jojo:
def transform_colname_v2(df, colname):
    series = df[colname]
    df[colname] = np.where(series <= 0.5, 1/series*-1, series)
    return df
100 loops, best of 3: 2.76 ms per loop
From FooBar:
def transform_colname_v3(df, colname):
    df.loc[df[colname] <= 0.5, colname] = - 1 / df[colname][df[colname] <= 0.5]
    return df
100 loops, best of 3: 3.32 ms per loop
From dmvianna:
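def transform_colname_v4(df, colname):
    # Series.where keeps values where the condition is True and uses `other`
    # elsewhere, so the condition must be "> 0.5" to replace the values <= 0.5
    df[colname] = df[colname].where(df[colname] > 0.5, (1/df[colname])*-1)
    return df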
100 loops, best of 3: 3.7 ms per loop
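A quick way to sanity-check the three vectorized versions is to run them on copies of the same data and compare the results. The sketch below rebuilds the example DataFrame from the question; the function names are illustrative, not from the thread. Note that `Series.where` keeps the values where the condition is True, so dmvianna's form needs `> 0.5` (not `<= 0.5`) to agree with the other two.

```python
import numpy as np
import pandas as pd

dicti = {"A": np.arange(0.0, 3, 0.1),
         "B": np.arange(0, 30, 1),
         "C": list("ELVISLIVES") * 3}
df = pd.DataFrame(dicti)

def with_np_where(df, colname):
    s = df[colname]
    # np.where picks -1/s where the condition holds, s elsewhere
    df[colname] = np.where(s <= 0.5, -1 / s, s)
    return df

def with_loc(df, colname):
    mask = df[colname] <= 0.5
    # overwrite only the selected rows
    df.loc[mask, colname] = -1 / df.loc[mask, colname]
    return df

def with_series_where(df, colname):
    s = df[colname]
    # where() keeps s where the condition holds and takes -1/s elsewhere
    df[colname] = s.where(s > 0.5, -1 / s)
    return df

# A contains 0.0, so -1/0.0 gives -inf; silence NumPy's divide warning
with np.errstate(divide="ignore"):
    r1 = with_np_where(df.copy(), "A")["A"]
    r2 = with_loc(df.copy(), "A")["A"]
    r3 = with_series_where(df.copy(), "A")["A"]

print(r1.equals(r2) and r2.equals(r3))
```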
Please tell/show me if you would implement your code in a different way!
One final QUESTION (answered): How could FooBar's and dmvianna's versions be made "generic"? I mean, I had to write the name of the column into the function (since using it as a variable didn't work). Please explain this last point! --> Thanks, jojo: ".loc" isn't the right way, but a very simple df[colname] is sufficient. I changed the functions above to be more "generic". (I also changed ">" to "<=" and updated the timings.)
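Making the column name (and, if wanted, the threshold) a real parameter comes down to indexing with `df[colname]` instead of attribute access such as `df.A`. The following is a minimal sketch. The helper name `negative_reciprocal` and its `threshold` argument are hypothetical additions. It works on a copy, so the caller's DataFrame is left unchanged:

```python
import pandas as pd

def negative_reciprocal(df, colname, threshold=0.5):
    """Hypothetical helper: return a copy of df where values <= threshold
    in df[colname] are replaced by -1/value."""
    out = df.copy()
    col = out[colname]          # bracket indexing works with a column name held in a variable
    mask = col <= threshold     # compute the boolean mask once
    out.loc[mask, colname] = -1 / col[mask]
    return out

df = pd.DataFrame({"A": [0.25, 0.5, 2.0], "B": [4.0, 0.25, 1.0]})
print(negative_reciprocal(df, "A"))
print(negative_reciprocal(df, "B", threshold=1.0))
```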
Thank you very much!!
Answer by FooBar
The typical trick is to write a general mathematical operation that applies to the whole column, and then use a boolean indicator (mask) to select the rows to which it is actually applied:
df.loc[df.A < 0.5, 'A'] = - 1 / df.A[df.A < 0.5]
In[13]: df
Out[13]:
A B C
0 -inf 0 E
1 -10.000000 1 L
2 -5.000000 2 V
3 -3.333333 3 I
4 -2.500000 4 S
5 0.500000 5 L
6 0.600000 6 I
7 0.700000 7 V
8 0.800000 8 E
9 0.900000 9 S
10 1.000000 10 E
11 1.100000 11 L
12 1.200000 12 V
13 1.300000 13 I
14 1.400000 14 S
15 1.500000 15 L
16 1.600000 16 I
17 1.700000 17 V
18 1.800000 18 E
19 1.900000 19 S
20 2.000000 20 E
21 2.100000 21 L
22 2.200000 22 V
23 2.300000 23 I
24 2.400000 24 S
25 2.500000 25 L
26 2.600000 26 I
27 2.700000 27 V
28 2.800000 28 E
29 2.900000 29 S
Answer by jojo
If we are talking about arrays:
import numpy as np
# np.float was removed in NumPy 1.24; the builtin float is equivalent here
a = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], dtype=float)
print(1 / a[a <= 0.5] * (-1))
This will, however, only return the transformed values (those less than or equal to 0.5), not the whole array.
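To keep the whole array rather than just the selected values, you can put the same boolean mask on the left-hand side of an assignment. This is the NumPy counterpart of FooBar's `.loc` approach. The sketch below uses illustrative variable names and modifies a copy of `a` in place:

```python
import numpy as np

a = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], dtype=float)

b = a.copy()
mask = b <= 0.5
# only the masked positions are recomputed and overwritten
b[mask] = -1 / b[mask]
print(b)
```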
Alternatively, use np.where:
import numpy as np
a = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], dtype=float)
print(np.where(a < 0.5, 1 / a * (-1), a))
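`np.where` receives both value arrays fully computed, so `1 / a * (-1)` is evaluated for every element, including those that are then discarded. If that extra work matters, a ufunc's `out=` and `where=` arguments can restrict the division to the selected elements; positions where the condition is False keep whatever is already in `out`. The following is a minimal sketch that assumes the question's `<= 0.5` rule:

```python
import numpy as np

a = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], dtype=float)

# Start from a copy of a so unselected positions keep their original values,
# then compute -1 / a only where the condition holds.
result = np.divide(-1.0, a, out=a.copy(), where=a <= 0.5)
print(result)
```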
Talking about a pandas DataFrame:
As in @dmvianna's answer (so give him some credit ;) ), adapted to a pd.DataFrame:
df.A = df.A.where(df.A > 0.5, (1 / df.A) * (-1))
Answer by dmvianna
As in @jojo's answer, but using pandas:
df.A = df.A.where(df.A > 0.5, (1/df.A)*-1)
or
df.A.where(df.A > 0.5, (1/df.A)*-1, inplace=True)  # this should be faster (but with pandas Copy-on-Write enabled, this chained in-place call may not modify df)
.where docstring:
Definition: df.A.where(self, cond, other=nan, inplace=False, axis=None, level=None, try_cast=False, raise_on_error=True)
Docstring: Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.
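The docstring's rule (keep `self` where `cond` is True, take `other` elsewhere) is why the condition in these answers reads `> 0.5`. `Series.mask` is the complement: it replaces values where the condition is True, so it can express the question's `<= 0.5` rule directly. The following small sketch uses a toy Series:

```python
import pandas as pd

s = pd.Series([0.25, 0.5, 2.0])

# where(): keep s where the condition is True, otherwise use -1/s
via_where = s.where(s > 0.5, -1 / s)

# mask(): replace s with -1/s where the condition is True
via_mask = s.mask(s <= 0.5, -1 / s)

print(via_where.tolist())  # [-4.0, -2.0, 2.0]
print(via_mask.tolist())   # [-4.0, -2.0, 2.0]
```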

