如何比较 Pandas 中两个数据框的值?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/38650273/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to Compare Values of two Dataframes in Pandas?
提问by MEhsan
I have two dataframes df
and df2
like this
我有两个数据框df
,df2
就像这样
id initials
0 100 J
1 200 S
2 300 Y
name initials
0 John J
1 Smith S
2 Nathan N
I want to compare the values in the initials
columns found in (df
and df2
) and copy the name (in df2
) which its initial is matching to the initial in the first dataframe (df
)
我想比较initials
(df
和df2
) 中找到的列中的值,并复制df2
其首字母与第一个数据帧 ( df
) 中首字母匹配的名称 (in )
import pandas as pd
for i in df.initials:
for j in df2.initials:
if i == j:
# copy the name value of this particular initial to df
The output should be like this:
输出应该是这样的:
id name
0 100 Johon
1 200 Smith
2 300
Any idea how to solve this problem?
知道如何解决这个问题吗?
回答by mechanical_meat
How about?:
怎么样?:
df3 = df.merge(df2,on='initials',
how='outer').drop(['initials'],axis=1).dropna(subset=['id'])
>>> df3
id name
0 100.0 John
1 200.0 Smith
2 300.0 NaN
So the 'initials' column is dropped and so is anything with np.nan
in the 'id' column.
所以'initials'列被删除,np.nan
'id'列中的任何内容也被删除。
If you don't want the np.nan
in there tack on a .fillna()
:
如果你不想np.nan
在那里大头钉.fillna()
:
df3 = df.merge(df2,on='initials',
how='outer').drop(['initials'],axis=1).dropna(subset=['id']).fillna('')
>>> df3
id name
0 100.0 John
1 200.0 Smith
2 300.0
回答by Vidhya G
df1
id initials
0 100 J
1 200 S
2 300 Y
df2
name initials
0 John J
1 Smith S
2 Nathan N
Use Boolean masks: df2.initials==df1.initials
will tell you which values in the two initials
columns are the same.
使用布尔掩码:df2.initials==df1.initials
将告诉您两initials
列中哪些值相同。
0 True
1 True
2 False
Use this mask to create a new column:
使用此掩码创建新列:
df1['name'] = df2.name[df2.initials==df1.initials]
Remove the initials
column in df1
:
删除中的initials
列df1
:
df1.drop('initials', axis=1)
Replace the NaN
using fillna(' ')
替换NaN
使用fillna(' ')
df1.fillna('', inplace=True) #inplace to avoid creating a copy
id name
0 100 John
1 200 Smith
2 300