Python Pandas: Update DataFrame values from another DataFrame

Question

Asked by ProgSky

I have two DataFrames in Python. I want to update rows in the first DataFrame using matching values from the second one; the second DataFrame serves as an override.

Here is an example with sample data and code:

DataFrame 1:

   Code      Name  Value
0     1  Company1    200
1     2  Company2    300
2     3  Company3    400

DataFrame 2:

   Code      Name  Value
0     2  Company2   1000

I want to update DataFrame 1 based on matching Code and Name. In this example, DataFrame 1 should be updated as below:

   Code      Name  Value
0     1  Company1    200
1     2  Company2   1000
2     3  Company3    400

Note: the row with Code = 2 and Name = Company2 is updated with the value 1000 (coming from DataFrame 2).

import pandas as pd

data1 = {
    'Code': [1, 2, 3],
    'Name': ['Company1', 'Company2', 'Company3'],
    'Value': [200, 300, 400],
}
df1 = pd.DataFrame(data1, columns=['Code', 'Name', 'Value'])

data2 = {
    'Code': [2],
    'Name': ['Company2'],
    'Value': [1000],
}
df2 = pd.DataFrame(data2, columns=['Code', 'Name', 'Value'])

Any pointers or hints?

Answer 1

Accepted answer by YOBEN_S

You can use concat + drop_duplicates:

pd.concat([df1,df2]).drop_duplicates(['Code','Name'],keep='last').sort_values('Code')
Out[1280]: 
   Code      Name  Value
0     1  Company1    200
0     2  Company2   1000
2     3  Company3    400
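
A minimal, self-contained version of the same concat + drop_duplicates approach, rebuilding df1 and df2 from the question. The trailing reset_index(drop=True) is an addition that replaces the duplicated 0, 0, 2 labels with a clean 0, 1, 2 index. Because the frames are stacked, any (Code, Name) pair that exists only in df2 would also be appended, so this behaves like an upsert rather than a pure update.

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
df2 = pd.DataFrame({'Code': [2], 'Name': ['Company2'], 'Value': [1000]})

# df2 comes second, so for a duplicated (Code, Name) pair its row is the
# "last" occurrence and is the one drop_duplicates keeps.
result = (
    pd.concat([df1, df2])
    .drop_duplicates(subset=['Code', 'Name'], keep='last')
    .sort_values('Code')
    .reset_index(drop=True)  # relabel rows 0, 1, 2
)
print(result)
```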

Answer 2

Answer by Nic

Using DataFrame.update, which aligns on indices (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.update.html):

>>> df1.set_index('Code', inplace=True)
>>> df1.update(df2.set_index('Code'))
>>> df1.reset_index()  # to recover the initial structure

   Code      Name   Value
0     1  Company1   200.0
1     2  Company2  1000.0
2     3  Company3   400.0
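
The snippet above aligns on Code alone. A minimal sketch that aligns on both Code and Name through a MultiIndex, and works on a new frame so the original df1 is left untouched (the names keys, target and result are illustrative). Depending on the pandas version, Value may come back as float64, as in the output above.

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
df2 = pd.DataFrame({'Code': [2], 'Name': ['Company2'], 'Value': [1000]})

keys = ['Code', 'Name']
# set_index returns a new frame, so df1 itself is not modified.
target = df1.set_index(keys)
# update() overwrites cells whose (Code, Name) label matches, in place, and returns None.
target.update(df2.set_index(keys))
result = target.reset_index()
print(result)
```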

Answer 3

Answer by jpp

You can align indices and then use combine_first:

res = df2.set_index(['Code', 'Name'])\
         .combine_first(df1.set_index(['Code', 'Name']))\
         .reset_index()

print(res)

#    Code      Name   Value
# 0     1  Company1   200.0
# 1     2  Company2  1000.0
# 2     3  Company3   400.0
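
combine_first keeps the union of both indexes, so it also inserts (Code, Name) pairs that exist only in df2. The sketch below uses a hypothetical df2 with an extra Company4 row to show this, and uses reindex to restrict the result back to df1's keys when only existing rows should change.

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
# Hypothetical overrides: (2, Company2) exists in df1, (4, Company4) does not.
df2 = pd.DataFrame({'Code': [2, 4],
                    'Name': ['Company2', 'Company4'],
                    'Value': [1000, 500]})

keys = ['Code', 'Name']
# df2's values win; gaps are filled from df1; the index is the union of both.
combined = df2.set_index(keys).combine_first(df1.set_index(keys))

upserted = combined.reset_index()  # includes the Company4 row
# Keep only the keys that already exist in df1, in df1's row order.
updated_only = combined.reindex(df1.set_index(keys).index).reset_index()
print(upserted)
print(updated_only)
```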

Answer 4

Answer by Bubble Bubble Bubble Gut

You can merge the data first and then use numpy.where:

import numpy as np

updated = df1.merge(df2, how='left', on=['Code', 'Name'], suffixes=('', '_new'))
# Take the override where one exists, otherwise keep the original value
updated['Value'] = np.where(pd.notnull(updated['Value_new']), updated['Value_new'], updated['Value'])
updated.drop('Value_new', axis=1, inplace=True)

   Code      Name   Value
0     1  Company1   200.0
1     2  Company2  1000.0
2     3  Company3   400.0
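
As a sketch of an alternative to np.where, Series.fillna accepts another Series and fills NaN entries from it element-wise (aligned on the index), which expresses "take the override if present, else keep the original" without NumPy. The final astype(int) is an addition: the left join leaves NaN in Value_new for rows without an override, which makes the column float64.

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
df2 = pd.DataFrame({'Code': [2], 'Name': ['Company2'], 'Value': [1000]})

updated = df1.merge(df2, how='left', on=['Code', 'Name'], suffixes=('', '_new'))
# Fill the NaN holes in Value_new (rows with no override) from the original Value.
updated['Value'] = updated['Value_new'].fillna(updated['Value'])
updated = updated.drop(columns='Value_new')
# Value_new was float64 because of the NaN holes; restore integers.
updated['Value'] = updated['Value'].astype(int)
print(updated)
```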

Answer 5

Answer by Ami Tavory

You can use pd.Series.where on the result of left-joining df1 and df2:

merged = df1.merge(df2, on=['Code', 'Name'], how='left')
df1.Value = merged.Value_y.where(~merged.Value_y.isnull(), df1.Value)
>>> df1
   Code      Name   Value
0     1  Company1   200.0
1     2  Company2  1000.0
2     3  Company3   400.0

You can change the line to

df1.Value = merged.Value_y.where(~merged.Value_y.isnull(), df1.Value).astype(int)

in order to convert the values back to integers.
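
Assigning merged.Value_y back to df1 relies on the left merge returning exactly one row per df1 row, in df1's order. A sketch that makes this assumption explicit with merge's validate parameter: validate='m:1' raises pandas.errors.MergeError if df2 contains duplicate (Code, Name) keys. The duplicated frame dup is hypothetical and only there to trigger the check.

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
df2 = pd.DataFrame({'Code': [2], 'Name': ['Company2'], 'Value': [1000]})

# validate='m:1' checks that (Code, Name) is unique in df2, so the left merge
# yields one row per df1 row and can be aligned with df1 by position.
merged = df1.merge(df2, on=['Code', 'Name'], how='left', validate='m:1')
assert len(merged) == len(df1)

# Value_y is NaN (hence float64) where df2 has no override.
df1['Value'] = merged['Value_y'].where(merged['Value_y'].notna(), df1['Value']).astype(int)
print(df1)

# Hypothetical duplicated overrides: the same merge now fails loudly.
dup = pd.concat([df2, df2], ignore_index=True)
try:
    df1.merge(dup, on=['Code', 'Name'], how='left', validate='m:1')
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
print('duplicate keys detected:', duplicate_detected)
```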

Answer 6

Answer by ALollz

Assuming Name (the company) and Code are redundant identifiers, you can also do:

import pandas as pd
vdic = pd.Series(df2.Value.values, index=df2.Name).to_dict()

df1.loc[df1.Name.isin(vdic.keys()), 'Value'] = df1.loc[df1.Name.isin(vdic.keys()), 'Name'].map(vdic)

#   Code      Name  Value
#0     1  Company1    200
#1     2  Company2   1000
#2     3  Company3    400
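
If Name and Code are not redundant, the same lookup idea can key on the (Code, Name) pair instead of Name alone. A sketch, assuming the pairs in df2 are unique (reindex raises on a duplicated index); the intermediate names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
df2 = pd.DataFrame({'Code': [2], 'Name': ['Company2'], 'Value': [1000]})

keys = ['Code', 'Name']
# Override values keyed by the (Code, Name) pair.
overrides = df2.set_index(keys)['Value']
# Look up each df1 row's pair; pairs missing from df2 come back as NaN.
looked_up = overrides.reindex(pd.MultiIndex.from_frame(df1[keys]))
new_values = pd.Series(looked_up.to_numpy(), index=df1.index)
# Keep the original Value where there is no override, then restore integers.
df1['Value'] = new_values.fillna(df1['Value']).astype(int)
print(df1)
```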

Answer 7

Answer by muTheTechie

Append df2 to df1
Drop duplicates by Code, keeping the last occurrence
Sort by Code

# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
combined_df = pd.concat([df1, df2]).drop_duplicates(['Code'], keep='last').sort_values('Code')
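
Deduplicating on Code alone means a df2 row replaces the entire matching df1 row, including Name. The sketch below uses a hypothetical df2 whose Name differs from df1's ('Company2 Ltd') to contrast subset ['Code'] with subset ['Code', 'Name'].

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
# Hypothetical override whose Name differs from df1's row with the same Code.
df2 = pd.DataFrame({'Code': [2], 'Name': ['Company2 Ltd'], 'Value': [1000]})

stacked = pd.concat([df1, df2], ignore_index=True)

# On Code alone, the df2 row replaces the whole df1 row, Name included.
by_code = stacked.drop_duplicates(['Code'], keep='last').sort_values('Code')
# On (Code, Name) the pairs differ, so both Code 2 rows survive.
by_code_and_name = stacked.drop_duplicates(['Code', 'Name'], keep='last')

print(by_code)
print(len(by_code_and_name))
```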

Answer 8

Answer by arie64

None of the above solutions worked for my particular example, which I think was due to the dtypes of my columns, but I eventually came up with this solution:

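# Row labels in df1 whose Code also appears in df2
indexes = df1.loc[df1.Code.isin(df2.Code.values)].index
# Positional assignment: assumes df2's rows are in the same order as the matching df1 rows
df1.loc[indexes, 'Value'] = df2['Value'].values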
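
The positional assignment above assumes df2's rows appear in the same order as the matching df1 rows; with the hypothetical two-row df2 below, listed in reverse order, it would assign 450 to Code 2. A sketch of a label-based alternative with Series.map, which looks each Code up by value so row order no longer matters. It also assumes each Code appears at most once in df2.

```python
import pandas as pd

df1 = pd.DataFrame({'Code': [1, 2, 3],
                    'Name': ['Company1', 'Company2', 'Company3'],
                    'Value': [200, 300, 400]})
# Hypothetical overrides, deliberately not in df1's row order.
df2 = pd.DataFrame({'Code': [3, 2],
                    'Name': ['Company3', 'Company2'],
                    'Value': [450, 1000]})

# Code -> override Value (requires each Code to be unique in df2).
override = df2.set_index('Code')['Value']
# map() looks each Code up by label; Codes without an override give NaN,
# which fillna replaces with the original Value.
df1['Value'] = (
    df1['Code'].map(override)
    .fillna(df1['Value'])
    .astype(df1['Value'].dtype)
)
print(df1)
```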
