Elegant way to replace values in a pandas.DataFrame from another DataFrame

Question

Asked by iboboboru

I have a DataFrame in which I want to replace the values in one column with values from another DataFrame.

import pandas as pd

df = pd.DataFrame({'id1': [1001,1002,1001,1003,1004,1005,1002,1006],
                   'value1': ["a","b","c","d","e","f","g","h"],
                   'value3': ["yes","no","yes","no","no","no","yes","no"]})

dfReplace = pd.DataFrame({'id2': [1001,1002],
                          'value2': ["rep1","rep2"]})

My current solution groups the replacement table by the common key and loops over the groups. Is there a more elegant (and faster) way to do this, e.g. with .map() or .apply()? I initially wanted to use DataFrame.update(), but that doesn't seem to be the right tool.

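# Group the replacement table by its key column (a plain string, so each key is a scalar)
groups = dfReplace.groupby('id2')

for key, group in groups:
    # Write the (single) replacement value into every df row whose id1 matches the key
    df.loc[df['id1'] == key, 'value1'] = group['value2'].iloc[0]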

Output

df
    id1   value1 value3
0   1001  rep1   yes
1   1002  rep2   no
2   1001  rep1   yes
3   1003  d      no
4   1004  e      no
5   1005  f      no
6   1002  rep2   yes
7   1006  h      no
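The .map() idea from the question can replace the loop with a single vectorized assignment. A minimal sketch, assuming id2 is unique in dfReplace; the names lookup and mask are illustrative, not from the original post:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'id1': [1001, 1002, 1001, 1003, 1004, 1005, 1002, 1006],
                   'value1': ["a", "b", "c", "d", "e", "f", "g", "h"],
                   'value3': ["yes", "no", "yes", "no", "no", "no", "yes", "no"]})
dfReplace = pd.DataFrame({'id2': [1001, 1002],
                          'value2': ["rep1", "rep2"]})

# Lookup Series: index = id2, values = replacement text (assumes id2 is unique)
lookup = dfReplace.set_index('id2')['value2']

# Boolean mask of the df rows whose id1 has a replacement
mask = df['id1'].isin(lookup.index)

# Map those ids to their replacement values in one step; the right-hand side
# keeps df's row labels, so .loc aligns it back onto the masked rows
df.loc[mask, 'value1'] = df.loc[mask, 'id1'].map(lookup)
print(df)
```

Rows without a match are never touched, so their original value1 survives without any NaN filling.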

Answer 1

Answered by MaxU

Try merge():

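merge = df.merge(dfReplace, left_on='id1', right_on='id2', how='left')
print(merge)

# .ix was removed in pandas 1.0; .loc performs the same boolean-mask assignment
merge.loc[merge.id1 == merge.id2, 'value1'] = merge.value2
print(merge)

# Drop the helper columns that came from dfReplace
del merge['id2']
del merge['value2']
print(merge)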

Output:

    id1 value1 value3   id2 value2
0  1001      a    yes  1001   rep1
1  1002      b     no  1002   rep2
2  1001      c    yes  1001   rep1
3  1003      d     no   NaN    NaN
4  1004      e     no   NaN    NaN
5  1005      f     no   NaN    NaN
6  1002      g    yes  1002   rep2
7  1006      h     no   NaN    NaN

    id1 value1 value3   id2 value2
0  1001   rep1    yes  1001   rep1
1  1002   rep2     no  1002   rep2
2  1001   rep1    yes  1001   rep1
3  1003      d     no   NaN    NaN
4  1004      e     no   NaN    NaN
5  1005      f     no   NaN    NaN
6  1002   rep2    yes  1002   rep2
7  1006      h     no   NaN    NaN

    id1 value1 value3
0  1001   rep1    yes
1  1002   rep2     no
2  1001   rep1    yes
3  1003      d     no
4  1004      e     no
5  1005      f     no
6  1002   rep2    yes
7  1006      h     no
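The same merge idea can be written a little more compactly by filling from the merged column instead of assigning through a mask. This is a variant sketch, not MaxU's original code; it assumes id2 is unique in dfReplace (duplicate keys would duplicate rows in the left merge):

```python
import pandas as pd

df = pd.DataFrame({'id1': [1001, 1002, 1001, 1003, 1004, 1005, 1002, 1006],
                   'value1': ["a", "b", "c", "d", "e", "f", "g", "h"],
                   'value3': ["yes", "no", "yes", "no", "no", "no", "yes", "no"]})
dfReplace = pd.DataFrame({'id2': [1001, 1002],
                          'value2': ["rep1", "rep2"]})

# Left merge keeps every row of df, in its original order
merged = df.merge(dfReplace, left_on='id1', right_on='id2', how='left')

# Take value2 where a match exists, otherwise fall back to the original value1
merged['value1'] = merged['value2'].fillna(merged['value1'])

# Drop the helper columns contributed by dfReplace
result = merged.drop(columns=['id2', 'value2'])
print(result)
```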

Answer 2

Answered by JohnE

This is a little cleaner if you already have the indexes set to the id columns, but if not, you can still do it in one line:

>>> (dfReplace.set_index('id2').rename( columns = {'value2':'value1'} )
                               .combine_first(df.set_index('id1')))

     value1 value3
1001   rep1    yes
1001   rep1    yes
1002   rep2     no
1002   rep2    yes
1003      d     no
1004      e     no
1005      f     no
1006      h     no

If you split this into three lines and do the renaming and re-indexing separately, you can see that combine_first() by itself is actually very simple:

>>> df = df.set_index('id1')
>>> dfReplace = dfReplace.set_index('id2').rename( columns={'value2':'value1'} )

>>> dfReplace.combine_first(df)
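The rule combine_first() applies is: keep the caller's values, and fill whatever the caller lacks (missing row labels, missing columns, or NaN cells) from the other frame, aligned on the union of both indexes. A toy sketch with unique indexes to make that visible; the frame names replacements and original are hypothetical:

```python
import pandas as pd

# Caller: replacement values, indexed by id
replacements = pd.DataFrame({'value1': ['rep1', 'rep2']}, index=[1001, 1002])

# Other: original data, indexed by id (kept unique here for simplicity)
original = pd.DataFrame({'value1': ['a', 'b', 'd'],
                         'value3': ['yes', 'no', 'no']},
                        index=[1001, 1002, 1003])

# Caller's values win; anything the caller lacks (row 1003, column value3)
# is filled in from `original`
out = replacements.combine_first(original)
print(out)
```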
