Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/43391591/

Date: 2020-08-19 22:59:51  Source: igfitidea

if else function in pandas dataframe

python, pandas, if-statement, dataframe

Asked by progster

I'm trying to apply an if condition over a dataframe, but I'm missing something (error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().)

import pandas as pd

raw_data = {'age1': [23,45,21],'age2': [10,20,50]}
df = pd.DataFrame(raw_data, columns = ['age1','age2'])

def my_fun (var1,var2,var3):
    # raises ValueError: (df[var1]-df[var2])>0 is a whole boolean Series, not one bool
    if (df[var1]-df[var2])>0 :
        df[var3]=df[var1]-df[var2]
    else:
        df[var3]=0
    print(df[var3])

my_fun('age1','age2','diff')

Answer by jezrael

You can use numpy.where:

import numpy as np

def my_fun (var1,var2,var3):
    # np.where evaluates the condition element-wise: difference where > 0, else 0
    df[var3]= np.where((df[var1]-df[var2])>0, df[var1]-df[var2], 0)
    return df

df1 = my_fun('age1','age2','diff')
print (df1)
   age1  age2  diff
0    23    10    13
1    45    20    25
2    21    50     0
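
np.where(cond, a, b) works element by element: where cond is True it takes the value from a, otherwise from b. The sketch below uses the same logic but, as an assumption, takes the DataFrame as a parameter and returns a new frame instead of mutating the global df; the name add_positive_diff is hypothetical.

```python
import numpy as np
import pandas as pd

def add_positive_diff(frame, var1, var2, var3):
    """Hypothetical helper: return a copy of frame with
    var3 = var1 - var2 where that is positive, else 0."""
    diff = frame[var1] - frame[var2]
    out = frame.copy()  # leave the caller's DataFrame unchanged
    # np.where picks diff where the condition is True, 0 elsewhere (row by row)
    out[var3] = np.where(diff > 0, diff, 0)
    return out

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})
result = add_positive_diff(df, 'age1', 'age2', 'diff')
print(result)
```

Returning a copy makes the function easier to reuse, at the cost of one extra DataFrame allocation.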

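The error occurs because if needs a single True/False value, while (df[var1]-df[var2])>0 is a boolean Series with one value per row, so pandas refuses to guess which one you mean. The error is explained in more detail here.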
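
A small sketch reproducing the error on the question's data, together with the reductions the message suggests: .any() and .all() collapse the boolean Series to one value. Which reduction is right depends on what you mean; for a per-row if/else you want a vectorised approach such as np.where instead.

```python
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})
cond = (df['age1'] - df['age2']) > 0   # boolean Series: True, True, False

error_message = None
try:
    if cond:                           # `if` calls bool(cond), which is ambiguous
        pass
except ValueError as exc:
    error_message = str(exc)

print(error_message)  # The truth value of a Series is ambiguous. ...
print(cond.any())     # True: at least one row has a positive difference
print(cond.all())     # False: not every row has a positive difference
```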

A slower solution uses apply, which needs axis=1 to process the data row by row:

def my_fun(x, var1, var2, var3):
    # x is one row (a Series), so x[var1]-x[var2] is a scalar and `if` works
    if (x[var1]-x[var2])>0 :
        x[var3]=x[var1]-x[var2]
    else:
        x[var3]=0
    return x

print (df.apply(lambda x: my_fun(x, 'age1', 'age2','diff'), axis=1))
   age1  age2  diff
0    23    10    13
1    45    20    25
2    21    50     0
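
If you stay with apply, the function can also just return the value for each row instead of adding a label to the row; with axis=1 each call receives one row as a Series, and the returned scalars form a new Series. A minimal sketch; positive_diff is a hypothetical name. It is still a Python-level loop, so expect it to be slower than np.where on large frames.

```python
import pandas as pd

def positive_diff(row, var1, var2):
    """Plain if/else on scalars: safe because row[var1] is a single value."""
    d = row[var1] - row[var2]
    if d > 0:
        return d
    else:
        return 0

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})
# axis=1: the lambda is called once per row; results are collected into a Series
df['diff'] = df.apply(lambda row: positive_diff(row, 'age1', 'age2'), axis=1)
print(df)
```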

It is also possible to use loc, but this modifies the DataFrame in place, so existing data can be overwritten:

def my_fun(x, var1, var2, var3):
    # x is the whole DataFrame here and is modified in place
    mask = (x[var1]-x[var2])>0
    x.loc[mask, var3] = x[var1]-x[var2]
    x.loc[~mask, var3] = 0
    return x

print (my_fun(df, 'age1', 'age2','diff'))
   age1  age2  diff
0    23    10  13.0
1    45    20  25.0
2    21    50   0.0
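
The float values (13.0) in the output above appear because the first .loc assignment creates the new column with NaN in the rows outside the mask, which forces a float dtype. A sketch, assuming you want to keep the original df untouched and keep integers: work on a copy and create the column with a default of 0 first, then overwrite only the masked rows. The name my_fun_copy is hypothetical.

```python
import pandas as pd

def my_fun_copy(frame, var1, var2, var3):
    x = frame.copy()            # the caller's DataFrame is not overwritten
    x[var3] = 0                 # default for rows where the condition is False
    mask = (x[var1] - x[var2]) > 0
    # only the masked rows are written; the column stays integer
    x.loc[mask, var3] = x[var1] - x[var2]
    return x

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})
out = my_fun_copy(df, 'age1', 'age2', 'diff')
print(out)
```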

Answer by piRSquared

You can use pandas.Series.where

df.assign(age3=(df.age1 - df.age2).where(df.age1 > df.age2, 0))

   age1  age2  age3
0    23    10    13
1    45    20    25
2    21    50     0
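
Series.where(cond, other) keeps each value where cond is True and replaces it with other where cond is False; Series.mask does the opposite, so the same result can be written with the condition inverted. A small sketch comparing the two on the question's data:

```python
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})
diff = df.age1 - df.age2                    # 13, 25, -29

kept = diff.where(df.age1 > df.age2, 0)     # keep where True, else 0
masked = diff.mask(df.age1 <= df.age2, 0)   # replace where True, else keep

print(kept.tolist())        # [13, 25, 0]
print(masked.equals(kept))  # True
```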


You can wrap this in a function

def my_fun(v1, v2):
    return v1.sub(v2).where(v1 > v2, 0)

df.assign(age3=my_fun(df.age1, df.age2))

   age1  age2  age3
0    23    10    13
1    45    20    25
2    21    50     0
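
To keep the question's signature (column names passed as strings, new column name chosen by the caller), the same expression can be combined with assign by unpacking a dict, because the target column name is only known at runtime. A sketch; passing the DataFrame as a parameter is an assumption, not part of the answer.

```python
import pandas as pd

def my_fun(frame, var1, var2, var3):
    v1, v2 = frame[var1], frame[var2]
    # assign(**{name: values}) lets the new column name come from a variable;
    # assign returns a new DataFrame and leaves `frame` unchanged
    return frame.assign(**{var3: v1.sub(v2).where(v1 > v2, 0)})

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})
result = my_fun(df, 'age1', 'age2', 'diff')
print(result)
```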

Answer by cardamom

There is another way, without np.where or pd.Series.where. I'm not saying it is better, but after trying to adapt this solution to a challenging problem today, I found the syntax of where not so intuitive. In the end I'm not sure whether it would have been possible with where, but the following method lets you look at the subset before you modify it, and for me it led to a solution more quickly. It of course works for the OP's problem here as well.

You deliberately set a value on a slice of a dataframe, which pandas so often warns you not to do.

This answer shows the correct way to do that.

The following gives you a slice:

df.loc[df['age1'] - df['age2'] > 0]

..which looks like:

   age1  age2
0    23    10
1    45    20

Add an extra column to the original dataframe, holding the value you want the rows outside the slice to keep after the slice is modified:

df['diff'] = 0

Now modify the slice:

df.loc[df['age1'] - df['age2'] > 0, 'diff'] = df['age1'] - df['age2']

..and the result:

   age1  age2  diff
0    23    10    13
1    45    20    25
2    21    50     0
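
The steps above can be combined into one runnable sketch. As a small variation (an assumption, not the answer's exact code), the condition is stored once in a variable, so the subset you inspect and the rows you modify are guaranteed to be the same.

```python
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})

mask = df['age1'] - df['age2'] > 0   # compute the condition once
print(df.loc[mask])                  # inspect the subset before changing anything

df['diff'] = 0                       # value kept by rows outside the subset
# the right-hand side is aligned on the index, so only masked rows are written
df.loc[mask, 'diff'] = df['age1'] - df['age2']
print(df)
```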