Pandas: replace column values based on match from another column

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/38466682/

Time: 2020-09-14 01:37:06  Source: igfitidea

Tags: python, python-2.7, pandas, dataframe

Asked by Anil_M

I have a column in the first data frame, df1["ItemType"], as shown below:

Dataframe1

ItemType1
redTomato
whitePotato
yellowPotato
greenCauliflower
yellowCauliflower
yelloSquash
redOnions
YellowOnions
WhiteOnions
yellowCabbage
GreenCabbage

I need to replace those values based on a dictionary created from another data frame.

Dataframe2

ItemType2            newType
whitePotato          Potato
yellowPotato         Potato
redTomato            Tomato
yellowCabbage
GreenCabbage
yellowCauliflower    yellowCauliflower
greenCauliflower     greenCauliflower
YellowOnions         Onions
WhiteOnions          Onions
yelloSquash          Squash
redOnions            Onions

Notice that:

  • In dataframe2, some of the ItemType values are the same as the ItemType values in dataframe1.
  • Some ItemType values in dataframe2, such as yellowCabbage, have null values.
  • The ItemType values in dataframe2 are out of order with respect to ItemType in dataframe1.

I need to replace the values in the ItemType column of Dataframe1 with newType wherever there is a matching ItemType value in Dataframe2, keeping the exceptions listed in the bullet points above in mind.
If there is no match, the values need to stay as they are (no change).

This is what I have so far:

import pandas as pd

# read the second CSV file
df2 = pd.read_csv('mappings.csv',names = ["ItemType", "newType"])
# convert to dict
df2=df2.set_index('ItemType').T.to_dict('list')

The replace-on-match attempts below are not working. They insert NaN values instead of the actual values. They are based on a discussion here on SO.

df1.loc[df1['ItemType'].isin(df2['ItemType'])]=df2[['NewType']]

OR

df1['ItemType']=df2['ItemType'].map(df2)

Thanks in advance

EDIT
The column headers in the two data frames have different names. The column in dataframe1 is ItemType1, and the first column in the second data frame is ItemType2. I missed that in the first edit.

Accepted answer by piRSquared

Use map

All the logic you need:

def update_type(t1, t2, dropna=False):
    return t1.map(t2).dropna() if dropna else t1.map(t2).fillna(t1)

Let's make 'ItemType2' the index of Dataframe2:

update_type(Dataframe1.ItemType1,
            Dataframe2.set_index('ItemType2').newType)

0                Tomato
1                Potato
2                Potato
3      greenCauliflower
4     yellowCauliflower
5                Squash
6                Onions
7                Onions
8                Onions
9         yellowCabbage
10         GreenCabbage
Name: ItemType1, dtype: object


update_type(Dataframe1.ItemType1,
            Dataframe2.set_index('ItemType2').newType,
            dropna=True)

0                Tomato
1                Potato
2                Potato
3      greenCauliflower
4     yellowCauliflower
5                Squash
6                Onions
7                Onions
8                Onions
Name: ItemType1, dtype: object
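
For anyone who wants to reproduce the two calls above, here is a minimal self-contained sketch. The data frames are rebuilt by hand from the tables in the question (the empty newType cells are assumed to be missing values, written as None), and the `lookup` variable name is only for illustration.

```python
import pandas as pd

# Sample data reconstructed from the tables in the question.
Dataframe1 = pd.DataFrame({"ItemType1": [
    "redTomato", "whitePotato", "yellowPotato", "greenCauliflower",
    "yellowCauliflower", "yelloSquash", "redOnions", "YellowOnions",
    "WhiteOnions", "yellowCabbage", "GreenCabbage",
]})
Dataframe2 = pd.DataFrame({
    "ItemType2": ["whitePotato", "yellowPotato", "redTomato", "yellowCabbage",
                  "GreenCabbage", "yellowCauliflower", "greenCauliflower",
                  "YellowOnions", "WhiteOnions", "yelloSquash", "redOnions"],
    # Empty cells in the question's table are assumed to be missing values.
    "newType": ["Potato", "Potato", "Tomato", None, None,
                "yellowCauliflower", "greenCauliflower",
                "Onions", "Onions", "Squash", "Onions"],
})


def update_type(t1, t2, dropna=False):
    # map() looks each value of t1 up in the index of t2;
    # values with no usable replacement come back as missing.
    mapped = t1.map(t2)
    # Either drop those rows or fall back to the original value.
    return mapped.dropna() if dropna else mapped.fillna(t1)


# Series indexed by ItemType2, so map() can use it as a lookup table.
lookup = Dataframe2.set_index("ItemType2")["newType"]

updated = update_type(Dataframe1["ItemType1"], lookup)
dropped = update_type(Dataframe1["ItemType1"], lookup, dropna=True)

print(updated)
print(dropped)
```

Because the lookup is done through the index of `lookup`, the row order of Dataframe2 does not matter.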

Verify

updated = update_type(Dataframe1.ItemType1, Dataframe2.set_index('ItemType2').newType)

pd.concat([Dataframe1, updated], axis=1, keys=['old', 'new'])


Timing

def root(Dataframe1, Dataframe2):
    return Dataframe1['ItemType1'].replace(Dataframe2.set_index('ItemType2')['newType'].dropna())

def piRSquared(Dataframe1, Dataframe2):
    t1 = Dataframe1.ItemType1
    t2 = Dataframe2.set_index('ItemType2').newType
    return update_type(t1, t2)
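
The two functions above could be compared with a small harness like the following sketch. It uses the standard-library timeit module; the tiny data frames, the 'purpleCarrot' value, and the repetition count are made up for illustration, so any numbers it prints depend on your machine and pandas version and say nothing about the original measurements.

```python
import timeit

import pandas as pd


def update_type(t1, t2):
    # map() plus fillna() keeps the original value where no replacement exists.
    return t1.map(t2).fillna(t1)


def root(Dataframe1, Dataframe2):
    return Dataframe1['ItemType1'].replace(
        Dataframe2.set_index('ItemType2')['newType'].dropna())


def piRSquared(Dataframe1, Dataframe2):
    t1 = Dataframe1.ItemType1
    t2 = Dataframe2.set_index('ItemType2').newType
    return update_type(t1, t2)


# Hypothetical miniature data; 'purpleCarrot' has no entry in Dataframe2.
Dataframe1 = pd.DataFrame(
    {"ItemType1": ["redTomato", "whitePotato", "yellowCabbage", "purpleCarrot"]})
Dataframe2 = pd.DataFrame({
    "ItemType2": ["whitePotato", "redTomato", "yellowCabbage"],
    "newType": ["Potato", "Tomato", None],
})

for func in (root, piRSquared):
    # Total time for 100 calls, in seconds.
    seconds = timeit.timeit(lambda: func(Dataframe1, Dataframe2), number=100)
    print(f"{func.__name__}: {seconds:.4f} s for 100 calls")
```

For a meaningful comparison you would want to run this on data of realistic size.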

Answer by root

You can convert df2 into a Series indexed by 'ItemType2', and then use replace on df1:

# Make df2 a Series indexed by 'ItemType2'.
df2 = df2.set_index('ItemType2')['newType'].dropna()

# Replace values in df1.
df1['ItemType1'] = df1['ItemType1'].replace(df2)

Or in a single line, if you don't want to alter df2:

df1['ItemType1'] = df1['ItemType1'].replace(df2.set_index('ItemType2')['newType'].dropna())
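
As a rough illustration of how the replace approach treats unmatched and null entries, here is a small sketch. The sample rows are a subset of the question's data plus one made-up value ('purpleCarrot') that has no entry in df2; the `mapping` and `result` names are only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"ItemType1": ["redTomato", "whitePotato", "yellowCabbage", "purpleCarrot"]})
df2 = pd.DataFrame({
    "ItemType2": ["whitePotato", "redTomato", "yellowCabbage"],
    "newType": ["Potato", "Tomato", None],
})

# dropna() removes the yellowCabbage entry, whose newType is missing,
# so replace() never sees it and leaves that value untouched.
mapping = df2.set_index("ItemType2")["newType"].dropna()

# Values that do not appear in the mapping's index are kept as they are.
result = df1["ItemType1"].replace(mapping)
print(result.tolist())
```

Here the dropna() call is what keeps null newType entries from overwriting the original values.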

Answer by draco_alpine

This method requires you to set the matching column name to 'type' in both data frames. You can then use merge and np.where:

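import numpy as np

# A left merge keeps df1 rows with no match; fall back to 'type' where newType is null.
df3 = df1.merge(df2, how='left', on='type')[['type', 'newType']]
df3['newType'] = np.where(df3['newType'].isnull(), df3['type'], df3['newType'])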
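
If you would rather not rename the columns, a variation on the same idea is sketched below. It assumes the original column names from the question and uses the left_on and right_on parameters of merge; the sample rows and the 'purpleCarrot' value are made up for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {"ItemType1": ["redTomato", "yellowCabbage", "whitePotato", "purpleCarrot"]})
df2 = pd.DataFrame({
    "ItemType2": ["whitePotato", "redTomato", "yellowCabbage"],
    "newType": ["Potato", "Tomato", None],
})

# Left merge: every row of df1 is kept, in its original order,
# and rows without a match in df2 get a missing newType.
merged = df1.merge(df2, how="left", left_on="ItemType1", right_on="ItemType2")

# Where newType is missing (no match, or a null in df2), keep the original value.
merged["newType"] = np.where(
    merged["newType"].isnull(), merged["ItemType1"], merged["newType"])

print(merged[["ItemType1", "newType"]])
```

This relies on ItemType2 being unique in df2; duplicate keys there would duplicate rows in the merged result.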