pandas: creating a new column based on the equality of two other columns
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/44067524/
Creating a new column depending on the equality of two other columns
Asked by vincent75
I want to compare the values of two columns and create a new column, bin_crnn, from the result: 1 if they are equal, 0 if not.
# coding: utf-8
import pandas as pd
df = pd.read_csv('file.csv',sep=',')
if df['crnn_pred']==df['manual_raw_value']:
    df['bin_crnn']=1
else:
    df['bin_crnn']=0
I got the following error:
    if df['crnn_pred']==df['manual_raw_value']:
  File "/home/ahmed/anaconda3/envs/cv/lib/python2.7/site-packages/pandas/core/generic.py", line 917, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Answered by Allen
One fast approach is to use np.where:
import numpy as np
df['test'] = np.where(df['crnn_pred']==df['manual_raw_value'], 1, 0)
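As a minimal, self-contained sketch of the np.where approach: the small DataFrame below is made-up sample data standing in for file.csv, and the result is written to bin_crnn (the column name from the question) rather than test.

```python
import numpy as np
import pandas as pd

# Made-up sample data standing in for file.csv (an assumption for illustration)
df = pd.DataFrame({'crnn_pred': [1, 2, 5], 'manual_raw_value': [1, 8, 5]})

# np.where(condition, value_if_true, value_if_false) is evaluated element-wise,
# so each row gets 1 where the two columns match and 0 where they differ
df['bin_crnn'] = np.where(df['crnn_pred'] == df['manual_raw_value'], 1, 0)
print(df)
```

Because the comparison is evaluated for the whole column at once, no if statement or explicit loop is needed.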
Answered by jezrael
You need to cast the boolean mask to int with astype:
df['bin_crnn'] = (df['crnn_pred']==df['manual_raw_value']).astype(int)
Sample:
df = pd.DataFrame({'crnn_pred':[1,2,5], 'manual_raw_value':[1,8,5]})
print (df)
   crnn_pred  manual_raw_value
0          1                 1
1          2                 8
2          5                 5
print (df['crnn_pred']==df['manual_raw_value'])
0     True
1    False
2     True
dtype: bool
df['bin_crnn'] = (df['crnn_pred']==df['manual_raw_value']).astype(int)
print (df)
   crnn_pred  manual_raw_value  bin_crnn
0          1                 1         1
1          2                 8         0
2          5                 5         1
You get the error because comparing two columns does not return a scalar, but a Series (array) of True and False values.
So you need all or any to return a scalar True or False.
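A short sketch of that point, using the same sample values as above: the comparison produces a boolean Series, putting it directly in an if statement raises the ValueError from the question, and all or any reduce it to a single value. The variable names mask, all_equal and any_equal are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'crnn_pred': [1, 2, 5], 'manual_raw_value': [1, 8, 5]})

# The comparison yields one True/False per row, not a single bool
mask = df['crnn_pred'] == df['manual_raw_value']

try:
    if mask:  # asks for the truth value of a whole Series
        pass
except ValueError as err:
    print(err)

all_equal = mask.all()  # True only if every row matches
any_equal = mask.any()  # True if at least one row matches
print(all_equal, any_equal)
```

Note that all and any answer a question about the whole column; for a per-row 1/0 column, the mask itself is what should be stored.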
I think this answer explains it better.
Answered by elPastor
No need for a loop or an if statement; you just need to set a new column using a boolean mask.
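# Index the DataFrame itself with .loc: df['bin_crnn'].loc[...] fails if the column does not exist yet
df.loc[df['crnn_pred']==df['manual_raw_value'], 'bin_crnn'] = 1
df['bin_crnn'] = df['bin_crnn'].fillna(0).astype(int)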
Answered by Michael Discenza
Another quick way, using only pandas and not NumPy, is:
df['columns_are_equal'] = df.apply(lambda x: int(x['column_a'] ==x['column_b']), axis=1)
Answered by Ika8
You are comparing two columns, so try this:
bin_crnn = []
for index, row in df.iterrows():
    if row['crnn_pred'] == row['manual_raw_value']:
        bin_crnn.append(1)
    else:
        bin_crnn.append(0)
df['bin_crnn'] = bin_crnn
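As a rough check, the row-by-row loop above and the vectorized comparison from the earlier answers should give the same result. The sketch below assumes the same made-up sample values used earlier; the names looped, vectorized and same are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'crnn_pred': [1, 2, 5], 'manual_raw_value': [1, 8, 5]})

# Row-by-row loop, as in the answer above
looped = []
for index, row in df.iterrows():
    looped.append(1 if row['crnn_pred'] == row['manual_raw_value'] else 0)

# Vectorized comparison cast to int, as in the astype answer
vectorized = (df['crnn_pred'] == df['manual_raw_value']).astype(int)

# Both approaches should produce the same 1/0 values
same = looped == vectorized.tolist()
print(same)
```

On large frames the vectorized form is generally the faster of the two, since iterrows builds a Series for every row.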