Compare two column values in a pandas DataFrame
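Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also comply with CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow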
Original URL: http://stackoverflow.com/questions/40479968/
Question by Kun OuYang
I have a CSV data frame like the one below. I'd like to compare the values of two columns and generate a third column: True if the values are the same, False if they are not. How can I do this with pandas in Python?
one two
1 a
2 b
3 a
4 b
5 5
6 6
7 7
8 8
9 9
10 10
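The reason a plain comparison can fail on this data is easy to reproduce. The sketch below assumes column `two` holds strings (for example '5' rather than 5), which is what you typically get when a CSV column mixes letters and digits; `dtype=object` is used only to make that assumption explicit.

```python
import pandas as pd

# Rebuild the question's data. Assumption: column 'two' holds strings,
# as if read from a CSV column that mixes letters and digits.
df = pd.DataFrame({
    'one': list(range(1, 11)),
    'two': pd.Series(['a', 'b', 'a', 'b'] + [str(i) for i in range(5, 11)], dtype=object),
})

print(df.dtypes)

# Plain equality compares the int 5 with the string '5', which is never equal,
# so no row matches even where the values "look" the same
print((df.one == df.two).any())  # False
```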
Answer by jezrael
If the values in column two are mixed (strings and ints), a plain comparison is enough:
df['three'] = df.one == df.two
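A minimal sketch of the mixed case, assuming column `two` is an object column that holds real Python ints for the numeric rows and strings for the others (for example, data built in code rather than read from a CSV). Here `==` compares element by element, so 1 == 'a' is False and 5 == 5 is True.

```python
import pandas as pd

# Assumption: 'two' mixes genuine ints with strings in an object column
df = pd.DataFrame({
    'one': [1, 2, 5, 6],
    'two': pd.Series(['a', 'b', 5, 6], dtype=object),
})

# Element-wise equality: int vs str is False, int vs equal int is True
df['three'] = df.one == df.two
print(df)
```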
But you need to_numeric if the values are not mixed: the first column has dtype int, the second has dtype object, which here means it holds strings, and column one contains no NaN values. With the parameter errors='coerce', to_numeric returns NaN for non-numeric values:
print (pd.to_numeric(df.two, errors='coerce'))
0 NaN
1 NaN
2 NaN
3 NaN
4 5.0
5 6.0
6 7.0
7 8.0
8 9.0
9 10.0
Name: two, dtype: float64
df['three'] = df.one == pd.to_numeric(df.two, errors='coerce')
print (df)
one two three
0 1 a False
1 2 b False
2 3 a False
3 4 b False
4 5 5 True
5 6 6 True
6 7 7 True
7 8 8 True
8 9 9 True
9 10 10 True
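Why the letter rows end up False can be checked in isolation. This small sketch (with made-up sample values) shows that `errors='coerce'` turns non-numeric strings into NaN, that NaN never compares equal to anything, and that an int compares equal to the same value as a float.

```python
import numpy as np
import pandas as pd

two = pd.Series(['a', 'b', '5', '10'], dtype=object)
one = pd.Series([1, 2, 5, 10])

# Non-numeric strings become NaN; numeric strings become floats
converted = pd.to_numeric(two, errors='coerce')
print(converted.tolist())  # [nan, nan, 5.0, 10.0]

# NaN is not equal to anything, not even NaN, so those rows compare False
print(np.nan == np.nan)  # False

# int64 vs float64 compares by value, so 5 == 5.0 is True
print((one == converted).tolist())  # [False, False, True, True]
```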
A faster solution uses Series.eq, the method form of ==:
df['three'] = df.one.eq(pd.to_numeric(df.two, errors='coerce'))
print (df)
one two three
0 1 a False
1 2 b False
2 3 a False
3 4 b False
4 5 5 True
5 6 6 True
6 7 7 True
7 8 8 True
8 9 9 True
9 10 10 True
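If the same check is needed for several column pairs, the Series.eq approach can be wrapped in a function. This is only a sketch: `add_match_column` is a hypothetical helper name, and it assumes, as in the answer, that the right-hand column should be coerced to numbers.

```python
import pandas as pd


def add_match_column(df: pd.DataFrame, left: str, right: str, out: str) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with a boolean column `out`
    that is True where df[left] equals df[right] coerced to numbers."""
    result = df.copy()
    # Non-numeric entries in `right` become NaN and therefore compare False
    result[out] = result[left].eq(pd.to_numeric(result[right], errors='coerce'))
    return result


df = pd.DataFrame({
    'one': list(range(1, 11)),
    'two': pd.Series(['a', 'b', 'a', 'b'] + [str(i) for i in range(5, 11)], dtype=object),
})
out = add_match_column(df, 'one', 'two', 'three')
print(out)

# Series.eq and == produce the same boolean Series
assert out['three'].equals(df.one == pd.to_numeric(df.two, errors='coerce'))
```

Returning a copy leaves the input DataFrame unchanged; assign the result back to `df` if in-place behaviour is preferred.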