比较两个 Python Pandas 数据框的 2 列并获取公共行
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/30291032/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Comparing 2 columns of two Python Pandas dataframes and getting the common rows
提问by Mohammad Saifullah
I have 2 Dataframe as follows:
我有 2 个数据框,如下所示:
DF1=
A B C D
0 AA BA KK 0
1 AD BD LL 0
2 AF BF MM 0
DF2=
K L
0 AA BA
1 AD BF
2 AF BF
At the end what I want to get is:
最后我想得到的是:
DF1=
A B C D
0 AA BA KK 1
1 AD BD LL 0
2 AF BF MM 1
So, I want to compare two dataframe, I want to see which rows of first data frame (for column A and B) are in common of of second dataframe(Column K and L) and assign 1 on the coulmn D of first dataframe.
所以,我想比较两个数据帧,我想看看第一个数据帧(对于 A 列和 B 列)的哪些行与第二个数据帧(列 K 和 L)相同,并在第一个数据帧的 D 列上分配 1。
I can use for loop, but It will be very slow for large number of entries.
我可以使用 for 循环,但是对于大量条目,它会非常慢。
Any clue or suggestion will be appreciated.
任何线索或建议将不胜感激。
采纳答案by Mohammad Saifullah
This is how I solved it:
我是这样解决的:
df1 = pd.DataFrame({"A":['AA','AD','AD'], "B":['BA','BD','BF']})
df2 = pd.DataFrame({"A":['AA','AD'], 'B':['BA','BF']})
df1['compressed']=df1.apply(lambda x:'%s%s' % (x['A'],x['B']),axis=1)
df2['compressed']=df2.apply(lambda x:'%s%s' % (x['A'],x['B']),axis=1)
df1['Success'] = df1['compressed'].isin(df2['compressed']).astype(int)
print df1
A B compressed Success
0 AA BA AABA 1
1 AD BD ADBD 0
2 AD BF ADBF 1
回答by EdChum
This would be easier if you renamed the columns of df2
and then you can compare row-wise:
如果您重命名了 的列,df2
然后您可以逐行比较,这会更容易:
In [35]:
df2.columns = ['A', 'B']
df2
Out[35]:
A B
0 AA BA
1 AD BF
2 AF BF
In [38]:
df1['D'] = (df1[['A', 'B']] == df2).all(axis=1).astype(int)
df1
Out[38]:
A B C D
0 AA BA KK 1
1 AD BD LL 0
2 AF BF MM 1
回答by PiotrKu
DF1.merge(right=DF2,left_on=[DF1.A,DF1.B],right_on=[DF2.K,DF2.L], indicator=True, how='left')
DF1.merge(right=DF2,left_on=[DF1.A,DF1.B],right_on=[DF2.K,DF2.L], indicator=True, how='left')
gives:
给出:
A B C D K L _merge
0 AA BA KK 0 AA BA both
1 AD BD LL 0 NaN NaN left_only
2 AF BF MM 0 AF BF both
A B C D K L _merge
0 AA BA KK 0 AA BA both
1 AD BD LL 0 NaN NaN left_only
2 AF BF MM 0 AF BF both
So, as above, indicator does the job.
因此,如上所述,指标可以完成这项工作。
Peter
彼得
回答by Vipul Saxena
df1['ColumnName'].isin(df2['ColumnName']).value_counts()