Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/49928463/

Date: 2020-08-19 19:18:44  Source: igfitidea

Python Pandas update a dataframe value from another dataframe

Tags: python, pandas, dataframe

Asked by ProgSky

I have two dataframes in Python. I want to update rows in the first dataframe using matching values from the second dataframe. The second dataframe serves as an override.

Here is an example with sample data and code:

DataFrame 1:

   Code      Name  Value
0     1  Company1    200
1     2  Company2    300
2     3  Company3    400

DataFrame 2:

   Code      Name  Value
0     2  Company2   1000

I want to update DataFrame 1 based on matching Code and Name. In this example, DataFrame 1 should be updated as below:

   Code      Name  Value
0     1  Company1    200
1     2  Company2   1000
2     3  Company3    400

Note: the row with Code = 2 and Name = Company2 is updated with the value 1000 (coming from DataFrame 2).

import pandas as pd

data1 = {
    'Code': [1, 2, 3],
    'Name': ['Company1', 'Company2', 'Company3'],
    'Value': [200, 300, 400],
}
df1 = pd.DataFrame(data1, columns=['Code', 'Name', 'Value'])

data2 = {
    'Code': [2],
    'Name': ['Company2'],
    'Value': [1000],
}
df2 = pd.DataFrame(data2, columns=['Code', 'Name', 'Value'])

Any pointers or hints?

Accepted answer by YOBEN_S

You can use concat + drop_duplicates:

pd.concat([df1,df2]).drop_duplicates(['Code','Name'],keep='last').sort_values('Code')
Out[1280]: 
   Code      Name  Value
0     1  Company1    200
0     2  Company2   1000
2     3  Company3    400
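
For reference, here is a self-contained sketch of the concat + drop_duplicates approach that rebuilds the question's sample frames. The reset_index(drop=True) step is not part of the answer above; it is assumed here only to tidy the repeated index labels (0, 0, 2) visible in the output.

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
df2 = pd.DataFrame({'Code': [2], 'Name': ['Company2'], 'Value': [1000]})

# Stack df2 under df1 so that, for each (Code, Name) pair, the df2 row comes last.
stacked = pd.concat([df1, df2])

# Keep only the last row per (Code, Name), i.e. the df2 row when one exists.
result = (stacked
          .drop_duplicates(subset=['Code', 'Name'], keep='last')
          .sort_values('Code')
          .reset_index(drop=True))  # assumption: a fresh 0..n-1 index is wanted

print(result)
```

Since this approach never introduces missing values, Value stays an integer column, unlike the float results shown in some of the other answers.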

Answer by Nic

Using DataFrame.update, which aligns on indices (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.update.html):

>>> df1.set_index('Code', inplace=True)
>>> df1.update(df2.set_index('Code'))
>>> df1.reset_index()  # to recover the initial structure

   Code      Name   Value
0     1  Company1   200.0
1     2  Company2  1000.0
2     3  Company3   400.0
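
The snippet above aligns on Code alone. A hedged variant, assuming both Code and Name should match as the question asks, sets both columns as the index before calling update. The cast back to int at the end is an extra step assumed here, since the output above shows Value as float.

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
df2 = pd.DataFrame({'Code': [2], 'Name': ['Company2'], 'Value': [1000]})

keys = ['Code', 'Name']

# set_index returns a new frame, so df1 keeps its original columns.
updated = df1.set_index(keys)

# update works in place and overwrites only where df2 has a non-null value
# for the same (Code, Name) index label.
updated.update(df2.set_index(keys))

# Restore the flat layout; the int cast is an assumption (no NaN in Value).
updated = updated.reset_index().astype({'Value': int})

print(updated)
```

Rows of df2 whose keys do not appear in df1 are ignored by update, so this only overrides existing rows and never adds new ones.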

Answer by jpp

You can align indices and then use combine_first:

res = df2.set_index(['Code', 'Name'])\
         .combine_first(df1.set_index(['Code', 'Name']))\
         .reset_index()

print(res)

#    Code      Name   Value
# 0     1  Company1   200.0
# 1     2  Company2  1000.0
# 2     3  Company3   400.0
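
To make the precedence explicit: combine_first keeps the calling frame's non-null values and fills the remaining labels from its argument, which is why df2 is the caller above. A small sketch, using hypothetical frames named override and base that are indexed by Code only:

```python
import pandas as pd

# Hypothetical frames for illustration, indexed by Code.
base = pd.DataFrame({'Value': [200, 300, 400]},
                    index=pd.Index([1, 2, 3], name='Code'))
override = pd.DataFrame({'Value': [1000]},
                        index=pd.Index([2], name='Code'))

# The caller (override) wins wherever it has a value;
# every other label is filled in from base.
res = override.combine_first(base)

print(res)
```

Swapping the two, as in base.combine_first(override), would leave base unchanged here, because base has no missing values to fill.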

Answer by Bubble Bubble Bubble Gut

You can merge the data first and then use numpy.where to pick the new value wherever one exists:

import numpy as np
updated = df1.merge(df2, how='left', on=['Code', 'Name'], suffixes=('', '_new'))
# Prefer the value from df2 where the merge found one, else keep the original
updated['Value'] = np.where(pd.notnull(updated['Value_new']), updated['Value_new'], updated['Value'])
updated.drop('Value_new', axis=1, inplace=True)

   Code      Name   Value
0     1  Company1   200.0
1     2  Company2  1000.0
2     3  Company3   400.0

Answer by Ami Tavory

You can use pd.Series.where on the result of left-joining df1 and df2:

merged = df1.merge(df2, on=['Code', 'Name'], how='left')
df1.Value = merged.Value_y.where(~merged.Value_y.isnull(), df1.Value)
>>> df1
   Code      Name   Value
0     1  Company1   200.0
1     2  Company2  1000.0
2     3  Company3   400.0

You can change the line to

df1.Value = merged.Value_y.where(~merged.Value_y.isnull(), df1.Value).astype(int)

in order to convert the values back to integers.

Answer by ALollz

Assuming the company name and the code are redundant identifiers, you can also do:

import pandas as pd
vdic = pd.Series(df2.Value.values, index=df2.Name).to_dict()

df1.loc[df1.Name.isin(vdic.keys()), 'Value'] = df1.loc[df1.Name.isin(vdic.keys()), 'Name'].map(vdic)

#   Code      Name  Value
#0     1  Company1    200
#1     2  Company2   1000
#2     3  Company3    400

Answer by muTheTechie

  1. Append the dataset
  2. Drop the duplicates by Code
  3. Sort the values

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
combined_df = pd.concat([combined_df, df2]).drop_duplicates(['Code'], keep='last').sort_values('Code')

Answer by arie64

None of the above solutions worked for my particular example, which I think is rooted in the dtype of my columns, but I eventually came to this solution

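indexes = df1.loc[df1.Code.isin(df2.Code.values)].index
# .loc is used because .at only accepts a single scalar label;
# this assumes the matching rows appear in the same order in df1 and df2
df1.loc[indexes, 'Value'] = df2['Value'].values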
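
The assignment above relies on the matching rows appearing in the same order in both dataframes. A hedged sketch that avoids this by looking each Code up in df2, assuming Code is unique in df2; the two-row df2 below is hypothetical and deliberately listed in a different order than df1:

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
# Hypothetical override frame, listed in a different order than df1.
df2 = pd.DataFrame({'Code': [3, 2],
                    'Name': ['Company3', 'Company2'],
                    'Value': [4000, 1000]})

# Lookup Series: Code -> new Value (assumes Code is unique in df2).
new_values = df2.set_index('Code')['Value']

# Rows of df1 whose Code has an override in df2.
mask = df1['Code'].isin(new_values.index)

# Map each matching Code to its override, so row order does not matter.
df1.loc[mask, 'Value'] = df1.loc[mask, 'Code'].map(new_values)

print(df1)
```

This matches on Code only; if Name must match as well, one of the merge-based answers above is probably the safer choice.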