如何比较 Pandas 中两个数据框的值？

Question

提问by MEhsan

I have two dataframes dfand df2like this

我有两个数据框df，df2就像这样

    id  initials
0   100 J
1   200 S
2   300 Y

    name  initials
0   John   J
1   Smith  S
2   Nathan N

I want to compare the values in the initialscolumns found in (dfand df2) and copy the name (in df2) which its initial is matching to the initial in the first dataframe (df)

我想比较initials(df和df2) 中找到的列中的值，并复制df2其首字母与第一个数据帧 ( df) 中首字母匹配的名称 (in )

import pandas as pd

for i in df.initials:
    for j in df2.initials:
        if i == j:
        # copy the name value of this particular initial to df

The output should be like this:

输出应该是这样的：

     id name
 0   100 Johon
 1   200 Smith
 2   300

Any idea how to solve this problem?

知道如何解决这个问题吗？

Answer 1

回答by mechanical_meat

How about?:

怎么样？：

df3 = df.merge(df2,on='initials',
                   how='outer').drop(['initials'],axis=1).dropna(subset=['id'])
>>> df3
      id    name
0  100.0    John
1  200.0   Smith
2  300.0     NaN

So the 'initials' column is dropped and so is anything with np.nanin the 'id' column.

所以'initials'列被删除，np.nan'id'列中的任何内容也被删除。

If you don't want the np.nanin there tack on a .fillna():

如果你不想np.nan在那里大头钉.fillna()：

df3 = df.merge(df2,on='initials',
                   how='outer').drop(['initials'],axis=1).dropna(subset=['id']).fillna('')
>>> df3
      id   name
0  100.0   John
1  200.0  Smith
2  300.0

Answer 2

回答by Vidhya G

df1 
    id initials                                                                                                                                                               
0  100        J                                                                                                                                                               
1  200        S                                                                                                                                                               
2  300        Y 

df2
     name initials                                                                                                                                                            
0    John        J                                                                                                                                                            
1   Smith        S                                                                                                                                                            
2  Nathan        N

Use Boolean masks: df2.initials==df1.initialswill tell you which values in the two initialscolumns are the same.

使用布尔掩码：df2.initials==df1.initials将告诉您两initials列中哪些值相同。

0     True                                                                                                                                                                    
1     True                                                                                                                                                                    
2    False

Use this mask to create a new column:

使用此掩码创建新列：

df1['name'] = df2.name[df2.initials==df1.initials]

Remove the initialscolumn in df1:

删除中的initials列df1：

df1.drop('initials', axis=1)

Replace the NaNusing fillna(' ')

替换NaN使用fillna(' ')

df1.fillna('', inplace=True) #inplace to avoid creating a copy

    id   name                                                                                                                                                                 
0  100   John                                                                                                                                                                 
1  200  Smith                                                                                                                                                                 
2  300

如何比较 Pandas 中两个数据框的值？

提问by MEhsan

回答by mechanical_meat

回答by Vidhya G

相关推荐

最近更新

标签

如何比较 Pandas 中两个数据框的值？

提问by MEhsan

回答by mechanical_meat

回答by Vidhya G

相关推荐

pandas 如何在 sklearn 中使用 OneHotEncoder 的输出？

pandas 谁能解释这个错误 [AttributeError: 'DataFrame' object has no attribute 'to_numeric']

pandas ValueError：无法将 DatetimeIndex 转换为 dtype datetime64[us]

Python Pandas，创建指定列数据类型的空 DataFrame

相关推荐

最近更新

标签