Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/49432081/

Time: 2020-09-14 05:21:44  Source: igfitidea

New column in Pandas dataframe based on boolean conditions

Tags: python, pandas, dataframe

Asked by TvdM

I'd like to add a new column to a Pandas dataframe, populated with True or False based on the other values in each row. My approach was to apply a function that checks boolean conditions on each row of the dataframe and fills the new column with either True or False.

This is the dataframe:

import pandas as pd

l = {'DayTime': ['2018-03-01', '2018-03-02', '2018-03-03'],
     'Pressure': [9, 10.5, 10.5], 'Feed': [9, 10.5, 11], 'Temp': [9, 10.5, 11]}

df1 = pd.DataFrame(l)

This is the function I wrote:

def ops_on(row):
   return row[('Feed' > 10)
              & ('Pressure' > 10)
              & ('Temp' > 10)
             ]

The function ops_on is used to create the new column ['ops_on']:

df1['ops_on'] = df1.apply(ops_on, axis='columns')

Unfortunately, I get this error message:

TypeError: ("'>' not supported between instances of 'str' and 'int'", 'occurred at index 0')

Thanks in advance for any help.

Answered by jpp

You should work column-wise (vectorised, efficient) rather than row-wise (inefficient, Python loop):

df1['ops_on'] = (df1['Feed'] > 10) & (df1['Pressure'] > 10) & (df1['Temp'] > 10)

The & ("and") operator is applied element-wise to Boolean Series. Each comparison must be wrapped in parentheses, because & binds more tightly than >. Any number of such conditions can be chained.
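
As a minimal, self-contained sketch of the vectorised approach, the snippet below rebuilds the dataframe from the question and chains the three comparisons. The variable name mask is only there for illustration and is not part of the original answer.

```python
import pandas as pd

# Same data as in the question
df1 = pd.DataFrame({
    'DayTime': ['2018-03-01', '2018-03-02', '2018-03-03'],
    'Pressure': [9, 10.5, 10.5],
    'Feed': [9, 10.5, 11],
    'Temp': [9, 10.5, 11],
})

# Each comparison yields a Boolean Series; & combines them row by row.
# The parentheses are required because & binds more tightly than >.
mask = (df1['Feed'] > 10) & (df1['Pressure'] > 10) & (df1['Temp'] > 10)
df1['ops_on'] = mask

print(df1['ops_on'].tolist())  # [False, True, True]
```

Only the first row is False here, since all three of its values are 9.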

Alternatively, for the special case where you are performing the same comparison multiple times:

df1['ops_on'] = df1[['Feed', 'Pressure', 'Temp']].gt(10).all(axis=1)
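
To show how this shortcut relates to the chained version, here is a small sketch on the question's data. The names cols and chained are introduced only for this example. gt(10) compares every selected cell with 10, and all(axis=1) then requires every column in a row to be True.

```python
import pandas as pd

# Same data as in the question
df1 = pd.DataFrame({
    'DayTime': ['2018-03-01', '2018-03-02', '2018-03-03'],
    'Pressure': [9, 10.5, 10.5],
    'Feed': [9, 10.5, 11],
    'Temp': [9, 10.5, 11],
})

cols = ['Feed', 'Pressure', 'Temp']  # columns that share the same threshold

# Element-wise "greater than 10", then reduce across the columns of each row
df1['ops_on'] = df1[cols].gt(10).all(axis=1)

# The chained form, for comparison
chained = (df1['Feed'] > 10) & (df1['Pressure'] > 10) & (df1['Temp'] > 10)

print(df1['ops_on'].tolist())                      # [False, True, True]
print(df1['ops_on'].tolist() == chained.tolist())  # True
```

This form only fits when every column is tested against the same value; with different thresholds per column, the chained comparisons are the more direct choice.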

Answered by YOLO

To keep your current row-wise setup, rewrite the function so that it looks up each value in the row by column name before comparing it:

def ops_on(row):
    return (row['Feed'] > 10) & (row['Pressure'] > 10) & (row['Temp'] > 10)
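
For completeness, a runnable sketch of the row-wise version on the question's data follows. It also reproduces the original error in isolation: in the question's function, 'Feed' > 10 compares the string 'Feed' itself with an integer rather than the value stored in the row. The variable message is introduced only for this illustration.

```python
import pandas as pd

# Same data as in the question
df1 = pd.DataFrame({
    'DayTime': ['2018-03-01', '2018-03-02', '2018-03-03'],
    'Pressure': [9, 10.5, 10.5],
    'Feed': [9, 10.5, 11],
    'Temp': [9, 10.5, 11],
})

# Why the original function failed: a bare string is compared with an int
try:
    'Feed' > 10
except TypeError as exc:
    message = str(exc)
print(message)  # '>' not supported between instances of 'str' and 'int'

def ops_on(row):
    # row['Feed'] is the value for this row, so the comparison is numeric
    return (row['Feed'] > 10) & (row['Pressure'] > 10) & (row['Temp'] > 10)

# apply calls ops_on once per row, which is slower than the vectorised form
df1['ops_on'] = df1.apply(ops_on, axis='columns')

print(df1['ops_on'].tolist())  # [False, True, True]
```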