Python: replace column values in one dataframe with the values of another dataframe

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original address, and attribute it to the original authors (not me): StackOverflow, original address: http://stackoverflow.com/questions/36413993/

Time: 2020-08-19 17:50:36  Source: igfitidea

replace column values in one dataframe by values of another dataframe

python search pandas dataframe

Asked by Amanda

I have two dataframes; the first one has 1000 rows and looks like:

Date            Group         Family       Bonus
2011-06-09      tri23_1       Laavin       456
2011-07-09      hsg?_T2       Grendy       679
2011-09-10      bbbj-1Y_jn    Fantol       431
2011-11-02      hsg?_T2       Gondow       569

The column Group has different values, sometimes repeated, but in general about 50 unique values.

The second dataframe contains all 50 of these unique values (50 rows) along with the hotels associated with them:

Group             Hotel
tri23_1           Jamel
hsg?_T2           Frank
bbbj-1Y_jn        Luxy
mlkl_781          Grand Hotel
vchs_94           Vancouver

My goal is to replace the values in the column Group of the first dataframe with the corresponding values of the column Hotel of the second dataframe, or alternatively to create a column Hotel with the corresponding values. When I try to do it by plain assignment, like

df1.loc[(df1.Group == df2.Group), 'Hotel'] = df2.Hotel

I get an error saying that the dataframes are not of equal size, so the comparison is not possible.

Answered by EdChum

If you set the index of the other df to its 'Group' column, you can then replace the values using map on the 'Group' column of your original df (in the code below, df is the original dataframe and df1 is the one holding the hotels):

In [36]:
df['Group'] = df['Group'].map(df1.set_index('Group')['Hotel'])
df

Out[36]:
         Date  Group  Family  Bonus
0  2011-06-09  Jamel  Laavin    456
1  2011-07-09  Frank  Grendy    679
2  2011-09-10   Luxy  Fantol    431
3  2011-11-02  Frank  Gondow    569
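
A self-contained sketch of the same idea, rebuilding the sample frames from the question. The name lookup and the extra 'unknown_9' row are assumptions added for illustration. They show that map returns NaN for a key that is missing from the lookup, and that fillna can restore the original value in that case.

```python
import pandas as pd

# Sample rows from the question, plus one hypothetical row ('unknown_9')
# whose Group has no entry in the lookup table.
df = pd.DataFrame({
    'Date': ['2011-06-09', '2011-07-09', '2011-09-10', '2011-11-02', '2011-12-01'],
    'Group': ['tri23_1', 'hsg?_T2', 'bbbj-1Y_jn', 'hsg?_T2', 'unknown_9'],
})

lookup = pd.DataFrame({
    'Group': ['tri23_1', 'hsg?_T2', 'bbbj-1Y_jn', 'mlkl_781', 'vchs_94'],
    'Hotel': ['Jamel', 'Frank', 'Luxy', 'Grand Hotel', 'Vancouver'],
})

# A Series indexed by Group acts as the mapping passed to map().
group_to_hotel = lookup.set_index('Group')['Hotel']
mapped = df['Group'].map(group_to_hotel)

# Option 1: keep Group and add a separate Hotel column (NaN where unmatched).
df['Hotel'] = mapped

# Option 2: overwrite Group, falling back to the original value where unmatched.
df['Group'] = mapped.fillna(df['Group'])

print(df)
```

Whether an unmatched key should become NaN or keep its original value depends on the data, so both options are shown.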

Answered by Greg Friedman

You could also create a dictionary and use apply:

# Select the 'Hotel' column first so to_dict() returns a flat {Group: Hotel} mapping
hotel_dict = df2.set_index('Group')['Hotel'].to_dict()
df1['Group'] = df1['Group'].apply(lambda x: hotel_dict[x])
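
One detail worth checking when building the dictionary: calling to_dict() on a whole DataFrame returns a nested dictionary keyed by column name, while calling it on a single column returns the flat Group-to-Hotel mapping that the lookup needs. A small sketch with the sample data (the names nested and flat are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'Group': ['tri23_1', 'hsg?_T2', 'bbbj-1Y_jn', 'hsg?_T2']})
df2 = pd.DataFrame({
    'Group': ['tri23_1', 'hsg?_T2', 'bbbj-1Y_jn', 'mlkl_781', 'vchs_94'],
    'Hotel': ['Jamel', 'Frank', 'Luxy', 'Grand Hotel', 'Vancouver'],
})

# DataFrame.to_dict() nests by column name: {'Hotel': {'tri23_1': 'Jamel', ...}}
nested = df2.set_index('Group').to_dict()

# Series.to_dict() is flat: {'tri23_1': 'Jamel', ...}
flat = df2.set_index('Group')['Hotel'].to_dict()

# The flat dictionary can be indexed directly by a Group value.
df1['Group'] = df1['Group'].apply(lambda x: flat[x])
print(df1)
```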

Answered by 176coding

Just use pandas join. For details, see: http://pandas.pydata.org/pandas-docs/stable/merging.html

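# join matches df1's 'Group' column against the index of the other frame,
# so df2 needs 'Group' as its index
df1.join(df2.set_index('Group'), on='Group')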
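
As an alternative sketch, merge (a closely related pandas call, swapped in here for join) matches on a shared column directly. A left merge keeps every row of the first dataframe and adds the Hotel column. The frames below are rebuilt from the question's sample data.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Date': ['2011-06-09', '2011-07-09', '2011-09-10', '2011-11-02'],
    'Group': ['tri23_1', 'hsg?_T2', 'bbbj-1Y_jn', 'hsg?_T2'],
    'Family': ['Laavin', 'Grendy', 'Fantol', 'Gondow'],
    'Bonus': [456, 679, 431, 569],
})
df2 = pd.DataFrame({
    'Group': ['tri23_1', 'hsg?_T2', 'bbbj-1Y_jn', 'mlkl_781', 'vchs_94'],
    'Hotel': ['Jamel', 'Frank', 'Luxy', 'Grand Hotel', 'Vancouver'],
})

# how='left' keeps all rows of df1, including any with no matching hotel.
merged = df1.merge(df2, on='Group', how='left')
print(merged)
```

Like join, merge returns a new dataframe and leaves df1 unchanged.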

Answered by vkt

This is an old question, but here is another way to do it. It is not the idiomatic pandas way, but it is fast.

Reproducing dataframe 1 (the one to be updated)

df_1

         Date       Group  Family  Bonus
0  2011-06-09     tri23_1  Laavin    456
1  2011-07-09     hsg?_T2  Grendy    679
2  2011-09-10  bbbj-1Y_jn  Fantol    431
3  2011-11-02     hsg?_T2  Gondow    569

Reproducing dataframe 2 (the lookup table)

df_2

        Group        Hotel
0     tri23_1        Jamel
1     hsg?_T2        Frank
2  bbbj-1Y_jn         Luxy
3    mlkl_781  Grand Hotel
4     vchs_94    Vancouver

Get all the keys (the Group column) from dataframe 1 as a list

key_list = list(df_1['Group'])

['tri23_1', 'hsg?_T2', 'bbbj-1Y_jn', 'hsg?_T2']

Create a dictionary from the lookup dataframe, using its key column and its value column

dict_lookup = dict(zip(df_2['Group'], df_2['Hotel']))

{'bbbj-1Y_jn': 'Luxy',
 'hsg?_T2': 'Frank',
 'mlkl_781': 'Grand Hotel',
 'tri23_1': 'Jamel',
 'vchs_94': 'Vancouver'}

Replace the values by building a list of looked-up values and assigning it to the dataframe 1 column

df_1['Group'] = [dict_lookup[item] for item in key_list]

Updated dataframe 1

         Date  Group  Family  Bonus
0  2011-06-09  Jamel  Laavin    456
1  2011-07-09  Frank  Grendy    679
2  2011-09-10   Luxy  Fantol    431
3  2011-11-02  Frank  Gondow    569
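
The list comprehension above raises KeyError if a key in dataframe 1 is missing from the lookup dictionary. A hedged variant, assuming you would rather keep the original value in that case, uses dict.get with a default. The key 'not_in_lookup' is a made-up value for illustration.

```python
import pandas as pd

# Two real keys from the question plus one hypothetical key with no hotel.
df_1 = pd.DataFrame({'Group': ['tri23_1', 'hsg?_T2', 'not_in_lookup']})
df_2 = pd.DataFrame({
    'Group': ['tri23_1', 'hsg?_T2', 'bbbj-1Y_jn', 'mlkl_781', 'vchs_94'],
    'Hotel': ['Jamel', 'Frank', 'Luxy', 'Grand Hotel', 'Vancouver'],
})

dict_lookup = dict(zip(df_2['Group'], df_2['Hotel']))

# dict.get(item, item) returns the hotel when the key exists,
# and the unchanged key otherwise.
df_1['Group'] = [dict_lookup.get(item, item) for item in df_1['Group']]
print(df_1)
```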

Answered by Erik Christiansen

Columns in pandas DataFrames are just Series. Make the DataFrames (or DataFrame and Series, as shown here) share the same index so that assignment can occur from the Series to the DataFrame:

**In:**

import pandas as pd

df = pd.DataFrame(
    data={'date': ['2011-06-09', '2011-07-09', '2011-09-10', '2011-11-02'],
          'family': ['Laavin', 'Grendy', 'Fantol', 'Gondow'],
          'bonus': ['456', '679', '431', '569']},
    index=pd.Index(name='Group', data=['tri23_1', 'hsg?_T2', 'bbbj-1Y_jn', 'hsg?_T2']))

**Out:**
                  date  family bonus
Group
tri23_1     2011-06-09  Laavin   456
hsg?_T2     2011-07-09  Grendy   679
bbbj-1Y_jn  2011-09-10  Fantol   431
hsg?_T2     2011-11-02  Gondow   569

**In:**

hotel_groups = pd.Series(['Jamel', 'Frank', 'Luxy', 'Grand Hotel', 'Vancouver'], 
index=pd.Index(name='Group', data=['tri23_1', 'hsg?_T2', 'bbbj-1Y_jn', 'mlkl_781', 'vchs_94']))

**Out:**

Group
tri23_1             Jamel
hsg?_T2             Frank
bbbj-1Y_jn           Luxy
mlkl_781      Grand Hotel
vchs_94         Vancouver
dtype: object

**In:**

df['hotel'] = hotel_groups

**Out:**

                  date  family bonus  hotel
Group
tri23_1     2011-06-09  Laavin   456  Jamel
hsg?_T2     2011-07-09  Grendy   679  Frank
bbbj-1Y_jn  2011-09-10  Fantol   431   Luxy
hsg?_T2     2011-11-02  Gondow   569  Frank

Notice that the index of both is 'Group', which allows the assignment.

If you assign a like-indexed Series to a DataFrame column, the assignment works. Notice that this works despite there being duplicate group values in df. It would not work if there were duplicate index values (with different corresponding data values) in the hotel_groups Series that is being assigned to df['hotel']: for example, if there were two entries for index value hsg?_T2, the first with data value Frank and the second with data value Luxy (not that this would ever occur in your example). That fails because there would be no way to know which value to assign to the like-indexed DataFrame column.
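
A minimal sketch of both cases, using a cut-down version of the frames above. The names unique_lookup, duplicate_lookup, and assignment_failed are introduced here for illustration. In recent pandas versions the duplicate case raises ValueError, because the Series cannot be reindexed unambiguously.

```python
import pandas as pd

df = pd.DataFrame(
    {'family': ['Laavin', 'Grendy', 'Fantol', 'Gondow']},
    index=pd.Index(['tri23_1', 'hsg?_T2', 'bbbj-1Y_jn', 'hsg?_T2'], name='Group'),
)

# Unique index on the Series: assignment aligns by label, and the
# duplicate 'hsg?_T2' rows in df both receive 'Frank'.
unique_lookup = pd.Series(
    ['Jamel', 'Frank', 'Luxy'],
    index=pd.Index(['tri23_1', 'hsg?_T2', 'bbbj-1Y_jn'], name='Group'),
)
df['hotel'] = unique_lookup

# Duplicate index on the Series: pandas cannot tell whether 'hsg?_T2'
# should map to 'Frank' or 'Luxy', so the assignment fails.
duplicate_lookup = pd.Series(
    ['Frank', 'Luxy'],
    index=pd.Index(['hsg?_T2', 'hsg?_T2'], name='Group'),
)
try:
    df['hotel_dup'] = duplicate_lookup
    assignment_failed = False
except ValueError:
    assignment_failed = True

print(df)
print('duplicate assignment failed:', assignment_failed)
```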