Python: numpy/pandas change values based on a condition

Question

Asked by tryptofame

I would like to know if there is a faster and more "pythonic" way of doing the following, e.g. using some built-in methods. Given a pandas DataFrame or numpy array of floats, if a value is less than or equal to 0.5, I need to calculate its reciprocal, multiply it by -1, and replace the old value with the newly calculated one. "Transform" is probably a bad choice of words, so please tell me if you have a better/more accurate description.

Thank you for your help and support!!

Data:

import numpy as np
import pandas as pd
dicti = {"A" : np.arange(0.0, 3, 0.1), 
         "B" : np.arange(0, 30, 1),
         "C" : list("ELVISLIVES")*3}
df = pd.DataFrame(dicti)

my function:

def transform_colname(df, colname):
    series = df[colname]    
    newval_list = []
    for val in series:
        if val <= 0.5:
            newval = (1/val)*-1
            newval_list.append(newval)
        else:
            newval_list.append(val)
    df[colname] = newval_list
    return df

function call:

transform_colname(df, colname="A")

**--> I'm summing up the results here, since comments don't allow posting code (or I don't know how to do it).**

Thank you all for your fast and great answers!!

using ipython "%timeit" with "real" data:

my function: 10 loops, best of 3: 24.1 ms per loop

from jojo:

def transform_colname_v2(df, colname):
    series = df[colname]        
    df[colname] = np.where(series <= 0.5, 1/series*-1, series)
    return df

100 loops, best of 3: 2.76 ms per loop

from FooBar:

def transform_colname_v3(df, colname):
    df.loc[df[colname] <= 0.5, colname]  = - 1 / df[colname][df[colname] <= 0.5]
    return df

100 loops, best of 3: 3.32 ms per loop

from dmvianna:

def transform_colname_v4(df, colname):
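    # Series.where keeps values where the condition is True and replaces
    # the rest, so values > 0.5 are kept and values <= 0.5 are transformed.
    df[colname] = df[colname].where(df[colname] > 0.5, (1/df[colname])*-1)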
    return df

100 loops, best of 3: 3.7 ms per loop
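
Since the three vectorized variants differ only in how the condition is expressed, it can help to check on a small column that they agree. The following is a minimal sketch, not the original benchmark: the sample values and the variable names (`via_np_where`, `via_loc`, `via_where`) are assumptions made for illustration, and zero is deliberately left out of the data.

```python
import numpy as np
import pandas as pd

# Small illustrative column; zero is left out to avoid division by zero.
df = pd.DataFrame({"A": [0.1, 0.25, 0.5, 0.75, 2.0]})
col = df["A"]
mask = col <= 0.5  # rows that should be transformed

# Variant 1: np.where takes the second argument where mask is True.
via_np_where = pd.Series(np.where(mask, -1 / col, col), index=col.index)

# Variant 2: assign through .loc on a copy, touching only the masked rows.
via_loc = col.copy()
via_loc.loc[mask] = -1 / col[mask]

# Variant 3: Series.where keeps values where its condition is True,
# so it needs the inverse of the mask.
via_where = col.where(~mask, -1 / col)

expected = [-10.0, -4.0, -2.0, 0.75, 2.0]
```

Note that `np.where` and `Series.where` read the condition in opposite directions, which is easy to get wrong when switching between them.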

Please tell/show me if you would implement your code in a different way!

One final QUESTION: (answered) How could the versions from "FooBar" and "dmvianna" be made "generic"? I mean, I had to write the name of the column into the function (since using it as a variable didn't work). Please explain this last point! --> Thanks jojo: attribute access such as df.A can't take a variable, but plain df[colname] is sufficient. I changed the functions above to be more "generic" (and also changed ">" to "<=" and updated the timings).

Thank you very much!!

Answer 1

Answered by FooBar

The typical trick is to write a general mathematical operation to apply to the whole column, but then use indicators to select rows for which we actually apply it:

df.loc[df.A < 0.5, 'A']  = - 1 / df.A[df.A < 0.5] 

In[13]: df
Out[13]: 
            A   B  C
0        -inf   0  E
1  -10.000000   1  L
2   -5.000000   2  V
3   -3.333333   3  I
4   -2.500000   4  S
5    0.500000   5  L
6    0.600000   6  I
7    0.700000   7  V
8    0.800000   8  E
9    0.900000   9  S
10   1.000000  10  E
11   1.100000  11  L
12   1.200000  12  V
13   1.300000  13  I
14   1.400000  14  S
15   1.500000  15  L
16   1.600000  16  I
17   1.700000  17  V
18   1.800000  18  E
19   1.900000  19  S
20   2.000000  20  E
21   2.100000  21  L
22   2.200000  22  V
23   2.300000  23  I
24   2.400000  24  S
25   2.500000  25  L
26   2.600000  26  I
27   2.700000  27  V
28   2.800000  28  E
29   2.900000  29  S
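
The `-inf` in row 0 comes from dividing by zero. If that is not wanted, the mask can be stored once and extended to skip zeros. This is only a sketch under assumptions: it uses the question's `<= 0.5` threshold, and leaving zeros unchanged is a hypothetical choice, not something the question specifies.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": np.arange(0.0, 1.0, 0.25)})  # 0.0, 0.25, 0.5, 0.75

# Assumption: zeros should stay as they are instead of becoming -inf.
mask = (df["A"] <= 0.5) & (df["A"] != 0)

# Compute the right-hand side only for the selected rows and write it back.
df.loc[mask, "A"] = -1 / df.loc[mask, "A"]
```

Storing the mask in a variable also avoids evaluating the comparison twice.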

Answer 2

Answered by jojo

If we are talking about arrays:

import numpy as np
a = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], dtype=float)
print(1 / a[a <= 0.5] * (-1))

This will, however, only return the transformed values for the entries less than or equal to 0.5, not the full array.

Alternatively use np.where:

import numpy as np
a = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], dtype=float)
print(np.where(a <= 0.5, 1 / a * (-1), a))
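
One thing to keep in mind is that `np.where` receives fully computed arrays, so `1 / a` is evaluated for every element, including any zeros. A possible alternative, sketched here under the assumption that zeros should simply be left unchanged, is the `where=` and `out=` parameters of `np.divide`. The sample array is made up for illustration.

```python
import numpy as np

a = np.array([0.0, 0.1, 0.2, 0.5, 0.6])

# Assumption: zeros are excluded so that no division by zero occurs.
mask = (a <= 0.5) & (a != 0)

# Start from a copy of the input; np.divide writes -1 / a only where
# mask is True and leaves the other positions of `out` unchanged.
result = np.divide(-1.0, a, out=a.copy(), where=mask)
```

Passing `out=a.copy()` matters here: without an `out` array, the positions where the mask is False would be left uninitialized.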

For a pandas DataFrame:

As in @dmvianna's answer (so give some credit to him ;) ), adapting it to pd.DataFrame:

df.a = df.a.where(df.a > 0.5, (1 / df.a) * (-1))

Answer 3

Answered by dmvianna

As in @jojo's answer, but using pandas:

df.A = df.A.where(df.A > 0.5, (1/df.A)*-1)

or

df.A.where(df.A > 0.5, (1/df.A)*-1, inplace=True) # this should be faster

.where docstring:

Definition: df.A.where(self, cond, other=nan, inplace=False, axis=None, level=None, try_cast=False, raise_on_error=True)
Docstring: Return an object of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.
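
Because `.where` keeps entries where the condition is True, the condition has to describe the values to leave alone. pandas also provides `Series.mask`, which replaces entries where the condition is True and may read more naturally for "change values that are <= 0.5". A minimal sketch on a made-up Series (the names `kept_large` and `replaced_small` are illustrative):

```python
import pandas as pd

s = pd.Series([0.1, 0.25, 0.5, 0.75, 2.0])

# where: keep values where the condition is True, replace the rest.
kept_large = s.where(s > 0.5, -1 / s)

# mask: replace values where the condition is True, keep the rest.
replaced_small = s.mask(s <= 0.5, -1 / s)
```

Both calls return a new Series and leave `s` untouched.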
