Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/44067524/

Time: 2020-09-14 03:38:06  Source: igfitidea

Creating a new column depending on the equality of two other columns

python pandas dataframe

Asked by vincent75

I want to compare the values of two columns and create a new column, bin_crnn, from the result: 1 where they are equal and 0 where they are not.

# coding: utf-8
import pandas as pd

df = pd.read_csv('file.csv',sep=',')

if df['crnn_pred']==df['manual_raw_value']:
    df['bin_crnn']=1
else:
    df['bin_crnn']=0

I got the following error:

    if df['crnn_pred']==df['manual_raw_value']:
  File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/pandas/core/generic.py", line 917, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Answered by Allen

One fast approach is to use np.where:

import numpy as np
df['test'] = np.where(df['crnn_pred']==df['manual_raw_value'], 1, 0)
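As a minimal sketch of how this behaves, the snippet below applies np.where to a small made-up frame. The sample values are assumptions for illustration only, and the result is stored in bin_crnn (the name from the question) rather than test.

```python
import numpy as np
import pandas as pd

# Small made-up frame, for illustration only
df = pd.DataFrame({'crnn_pred': [1, 2, 5], 'manual_raw_value': [1, 8, 5]})

# np.where(condition, value_if_true, value_if_false) is evaluated element-wise
df['bin_crnn'] = np.where(df['crnn_pred'] == df['manual_raw_value'], 1, 0)

print(df['bin_crnn'].tolist())  # [1, 0, 1]
```

The second and third arguments can be any pair of values, so the same call also works for labels other than 1 and 0.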

Answered by jezrael

You need to cast the boolean mask to int with astype:

df['bin_crnn'] = (df['crnn_pred']==df['manual_raw_value']).astype(int)

Sample:


df = pd.DataFrame({'crnn_pred':[1,2,5], 'manual_raw_value':[1,8,5]})
print (df)
   crnn_pred  manual_raw_value
0          1                 1
1          2                 8
2          5                 5

print (df['crnn_pred']==df['manual_raw_value'])
0     True
1    False
2     True
dtype: bool

df['bin_crnn'] = (df['crnn_pred']==df['manual_raw_value']).astype(int)
print (df)
   crnn_pred  manual_raw_value  bin_crnn
0          1                 1         1
1          2                 8         0
2          5                 5         1

You get the error because comparing two columns does not produce a scalar, but a Series (array) of True and False values, and an if statement needs a single boolean.

So you need all or any to return a scalar True or False.
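The following sketch, on made-up sample data, reproduces the ambiguity error and shows what all and any return. The variable names (mask, all_equal, any_equal, error_message) are chosen here for illustration.

```python
import pandas as pd

# Small made-up frame, for illustration only
df = pd.DataFrame({'crnn_pred': [1, 2, 5], 'manual_raw_value': [1, 8, 5]})

# The comparison yields one boolean per row, not a single value
mask = df['crnn_pred'] == df['manual_raw_value']

error_message = None
try:
    if mask:  # a Series has no single truth value, so this raises
        pass
except ValueError as err:
    error_message = str(err)
    print(error_message)

all_equal = bool(mask.all())  # True only if every row matches
any_equal = bool(mask.any())  # True if at least one row matches
print(all_equal, any_equal)   # False True
```

Note that all and any collapse the whole comparison into one value, so they answer "do the columns match everywhere (or anywhere)?" rather than producing the per-row bin_crnn column; for that, the astype cast above is the relevant tool.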

I think this answer explains it better.

Answered by elPastor

There is no need for a loop or an if statement; just set the new column using a boolean mask.

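# Set 1 where the columns match (this creates the column; other rows become NaN)
df.loc[df['crnn_pred']==df['manual_raw_value'], 'bin_crnn'] = 1
# Fill the remaining rows with 0 and convert back to integers
df['bin_crnn'] = df['bin_crnn'].fillna(0).astype(int)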
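As a variation on the mask idea, here is a minimal runnable sketch that first initialises the column to 0 and then overwrites the matching rows, so no NaN values appear and no fill step is needed. The initialisation step and the sample data are assumptions made for illustration, not part of the answer above.

```python
import pandas as pd

# Small made-up frame, for illustration only
df = pd.DataFrame({'crnn_pred': [1, 2, 5], 'manual_raw_value': [1, 8, 5]})

df['bin_crnn'] = 0  # default value for every row
# Overwrite only the rows where the two columns are equal
df.loc[df['crnn_pred'] == df['manual_raw_value'], 'bin_crnn'] = 1

print(df['bin_crnn'].tolist())  # [1, 0, 1]
```

Passing both the row mask and the column label to a single .loc call avoids chained indexing, which pandas may apply to a temporary copy instead of the original frame.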

Answered by Michael Discenza

Another quick way, using only pandas and not NumPy, is:

df['columns_are_equal'] = df.apply(lambda x: int(x['column_a'] ==x['column_b']), axis=1)

Answered by Ika8

You are comparing two columns; try this:

bin_crnn = []
for index, row in df.iterrows():
    if row['crnn_pred'] == row['manual_raw_value']:
        bin_crnn.append(1)
    else:
        bin_crnn.append(0)
df['bin_crnn'] = bin_crnn
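To check that the approaches on this page agree, the sketch below runs four of them on the same made-up frame and compares the results. The variable names (via_astype, via_where, via_apply, via_loop) are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Small made-up frame, for illustration only
df = pd.DataFrame({'crnn_pred': [1, 2, 5], 'manual_raw_value': [1, 8, 5]})
mask = df['crnn_pred'] == df['manual_raw_value']

# Vectorised: cast the boolean mask to integers
via_astype = mask.astype(int).tolist()
# Vectorised: pick 1 or 0 element-wise
via_where = np.where(mask, 1, 0).tolist()
# Row-wise: call a Python function once per row
via_apply = df.apply(
    lambda row: int(row['crnn_pred'] == row['manual_raw_value']), axis=1
).tolist()
# Row-wise: explicit Python loop over the rows
via_loop = [
    int(row['crnn_pred'] == row['manual_raw_value'])
    for _, row in df.iterrows()
]

print(via_astype == via_where == via_apply == via_loop)  # True
```

All four produce the same flags here. The two vectorised forms operate on whole columns at once, so they generally scale better to large frames than the row-wise apply and iterrows versions, which run Python code for every row.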