Original question: http://stackoverflow.com/questions/49928463/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
Python Pandas update a dataframe value from another dataframe
Asked by ProgSky
I have two DataFrames in Python. I want to update rows in the first DataFrame using matching values from another DataFrame. The second DataFrame serves as an override.
Here is an example with sample data and code:
DataFrame 1:
   Code      Name  Value
0     1  Company1    200
1     2  Company2    300
2     3  Company3    400
DataFrame 2:
   Code      Name  Value
0     2  Company2   1000
I want to update DataFrame 1 based on matching Code and Name. In this example, DataFrame 1 should be updated as below:
   Code      Name  Value
0     1  Company1    200
1     2  Company2   1000
2     3  Company3    400
Note: the row with Code = 2 and Name = Company2 is updated with the value 1000 (coming from DataFrame 2).
import pandas as pd

data1 = {
    'Code': [1, 2, 3],
    'Name': ['Company1', 'Company2', 'Company3'],
    'Value': [200, 300, 400],
}
df1 = pd.DataFrame(data1, columns=['Code', 'Name', 'Value'])

data2 = {
    'Code': [2],
    'Name': ['Company2'],
    'Value': [1000],
}
df2 = pd.DataFrame(data2, columns=['Code', 'Name', 'Value'])
Any pointers or hints?
Accepted answer by YOBEN_S
You can use concat + drop_duplicates:
pd.concat([df1,df2]).drop_duplicates(['Code','Name'],keep='last').sort_values('Code')
Out[1280]:
   Code      Name  Value
0     1  Company1    200
0     2  Company2   1000
2     3  Company3    400
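The same idea as a self-contained, minimal sketch. It assumes the df1 and df2 from the question; the variable name result and the final reset_index(drop=True) are additions here, used only to replace the repeated 0 label in the output above with a clean 0..n-1 index.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Code': [1, 2, 3],
    'Name': ['Company1', 'Company2', 'Company3'],
    'Value': [200, 300, 400],
})
df2 = pd.DataFrame({'Code': [2], 'Name': ['Company2'], 'Value': [1000]})

# Stack df2 below df1 so that, for a duplicated (Code, Name) pair,
# the df2 row comes last and is the one kept by keep='last'.
result = (
    pd.concat([df1, df2])
    .drop_duplicates(['Code', 'Name'], keep='last')
    .sort_values('Code')
    .reset_index(drop=True)  # assumption: a fresh index is wanted
)
print(result)
```

Note that this approach also keeps any df2 rows that have no match in df1, so it behaves like an "upsert" rather than a pure update.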
Answer by Nic
Using DataFrame.update, which aligns on indices (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.update.html):
>>> df1.set_index('Code', inplace=True)
>>> df1.update(df2.set_index('Code'))
>>> df1.reset_index() # to recover the initial structure
   Code      Name   Value
0     1  Company1   200.0
1     2  Company2  1000.0
2     3  Company3   400.0
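The snippet above aligns on Code only. Since the question asks to match on both Code and Name, a possible variant is to put both columns in the index before calling update. This is a minimal sketch, assuming the df1 and df2 from the question; the names keys, left and result are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Code': [1, 2, 3],
    'Name': ['Company1', 'Company2', 'Company3'],
    'Value': [200, 300, 400],
})
df2 = pd.DataFrame({'Code': [2], 'Name': ['Company2'], 'Value': [1000]})

keys = ['Code', 'Name']  # assumption: these two columns together identify a row

# Work on an indexed copy so df1 itself is left untouched.
left = df1.set_index(keys)

# update() modifies `left` in place and only overwrites with non-NaN values
# from rows of df2 whose (Code, Name) index entry matches.
left.update(df2.set_index(keys))

result = left.reset_index()
print(result)
```

Depending on the pandas version, Value may come back as float rather than int; rows of df2 with no match in df1 are ignored rather than appended.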
Answer by jpp
You can align indices and then use combine_first:
res = df2.set_index(['Code', 'Name'])\
         .combine_first(df1.set_index(['Code', 'Name']))\
         .reset_index()
print(res)
#    Code      Name   Value
# 0     1  Company1   200.0
# 1     2  Company2  1000.0
# 2     3  Company3   400.0
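The output above shows Value as float. If an integer column is wanted, one option is to cast back after combining. This is a sketch under the assumption that no NaN remains in Value after combine_first (true for the question's data); the cast to 'int64' is an addition here, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Code': [1, 2, 3],
    'Name': ['Company1', 'Company2', 'Company3'],
    'Value': [200, 300, 400],
})
df2 = pd.DataFrame({'Code': [2], 'Name': ['Company2'], 'Value': [1000]})

keys = ['Code', 'Name']

res = (
    df2.set_index(keys)
    # df2 values take priority; gaps are filled from df1 on the union of both indexes
    .combine_first(df1.set_index(keys))
    .reset_index()
    # assumption: Value has no missing entries, so the cast cannot fail
    .astype({'Value': 'int64'})
)
print(res)
```

Because combine_first works on the union of the two indexes, rows that exist only in df2 would also appear in the result.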
Answer by Bubble Bubble Bubble Gut
You can merge the data first and then use numpy.where to pick the new value wherever one exists:
import numpy as np

updated = df1.merge(df2, how='left', on=['Code', 'Name'], suffixes=('', '_new'))
# Take the value from df2 where a match exists, otherwise keep the original
updated['Value'] = np.where(pd.notnull(updated['Value_new']), updated['Value_new'], updated['Value'])
updated.drop('Value_new', axis=1, inplace=True)
print(updated)
   Code      Name   Value
0     1  Company1   200.0
1     2  Company2  1000.0
2     3  Company3   400.0
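A pandas-only variant of the same merge-then-choose idea, as a minimal sketch. It assumes the df1 and df2 from the question and swaps numpy.where for Series.fillna; the final cast back to the original dtype is an addition here and assumes every row ends up with a value.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Code': [1, 2, 3],
    'Name': ['Company1', 'Company2', 'Company3'],
    'Value': [200, 300, 400],
})
df2 = pd.DataFrame({'Code': [2], 'Name': ['Company2'], 'Value': [1000]})

# The left join keeps every row of df1; unmatched rows get NaN in Value_new.
updated = df1.merge(df2, how='left', on=['Code', 'Name'], suffixes=('', '_new'))

# Prefer the override, fall back to the original value, then restore the dtype.
updated['Value'] = (
    updated['Value_new']
    .fillna(updated['Value'])
    .astype(df1['Value'].dtype)
)
updated = updated.drop(columns='Value_new')
print(updated)
```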
Answer by Ami Tavory
You can use pd.Series.where on the result of left-joining df1 and df2:
merged = df1.merge(df2, on=['Code', 'Name'], how='left')
df1.Value = merged.Value_y.where(~merged.Value_y.isnull(), df1.Value)
>>> df1
   Code      Name   Value
0     1  Company1   200.0
1     2  Company2  1000.0
2     3  Company3   400.0
To get Value back as an integer column, you can change that line to
df1.Value = merged.Value_y.where(~merged.Value_y.isnull(), df1.Value).astype(int)
Answer by ALollz
Assuming Name and Code are redundant identifiers, you can also do
import pandas as pd
vdic = pd.Series(df2.Value.values, index=df2.Name).to_dict()
df1.loc[df1.Name.isin(vdic.keys()), 'Value'] = df1.loc[df1.Name.isin(vdic.keys()), 'Name'].map(vdic)
#    Code      Name  Value
# 0     1  Company1    200
# 1     2  Company2   1000
# 2     3  Company3    400
Answer by muTheTechie
- Append the dataset
- Drop the duplicates by Code
- Sort the values
# DataFrame.append was removed in pandas 2.0, so pd.concat is used to append df2 to df1
combined_df = pd.concat([df1, df2]).drop_duplicates(['Code'], keep='last').sort_values('Code')
Answer by arie64
None of the above solutions worked for my particular example, which I think is rooted in the dtype of my columns, but I eventually came to this solution:
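indexes = df1.loc[df1.Code.isin(df2.Code.values)].index
# .loc is used because .at is meant for a single scalar; this assumes the matched rows of df1 are in the same order as the rows of df2
df1.loc[indexes, 'Value'] = df2['Value'].values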
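The positional assignment above relies on the matched rows of df1 and the rows of df2 being in the same order. A possible order-independent variant looks each Code up in a mapping built from df2. This is a minimal sketch, assuming the question's data and that Code is unique in df2; the names new_values and mask are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Code': [1, 2, 3],
    'Name': ['Company1', 'Company2', 'Company3'],
    'Value': [200, 300, 400],
})
df2 = pd.DataFrame({'Code': [2], 'Name': ['Company2'], 'Value': [1000]})

# Map each Code in df1 to its override; Codes missing from df2 become NaN.
new_values = df1['Code'].map(df2.set_index('Code')['Value'])

# Overwrite only the rows that actually have an override.
mask = new_values.notna()
df1.loc[mask, 'Value'] = new_values[mask].astype(df1['Value'].dtype)
print(df1)
```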