Pandas: replace column values based on a match from another column

Question

Asked by Anil_M

I have a column in my first data frame, df1["ItemType"], as shown below:

Dataframe1

ItemType1
redTomato
whitePotato
yellowPotato
greenCauliflower
yellowCauliflower
yelloSquash
redOnions
YellowOnions
WhiteOnions
yellowCabbage
GreenCabbage

I need to replace those values based on a dictionary created from another data frame.

Dataframe2

ItemType2            newType
whitePotato          Potato
yellowPotato         Potato
redTomato            Tomato
yellowCabbage
GreenCabbage
yellowCauliflower    yellowCauliflower
greenCauliflower     greenCauliflower
YellowOnions         Onions
WhiteOnions          Onions
yelloSquash          Squash
redOnions            Onions

Notice that:

- In dataframe2, some of the ItemType values are the same as the ItemType values in dataframe1.
- Some ItemType values in dataframe2, such as yellowCabbage, have null values for newType.
- The ItemType values in dataframe2 are out of order with respect to the ItemType values in dataframe1.

I need to replace the values in the ItemType column of Dataframe1 with newType wherever there is a match in the ItemType column of Dataframe2, keeping in mind the exceptions listed in the bullet points above.
If there is no match, the values need to stay as they are (no change).

This is what I have so far:

import pandas as pd

# Read the second CSV file.
df2 = pd.read_csv('mappings.csv', names=["ItemType", "newType"])
# Convert to a dict.
df2 = df2.set_index('ItemType').T.to_dict('list')

The replace-on-match attempts below are not working: they insert NaN values instead of the actual values. They are based on a discussion here on SO.

df1.loc[df1['ItemType'].isin(df2['ItemType'])]=df2[['NewType']]

OR

df1['ItemType']=df2['ItemType'].map(df2)

Thanks in advance

EDIT
The key column headers in the two data frames have different names: the column in dataframe1 is ItemType1, and the first column in the second data frame is ItemType2. I missed that in the first edit.

Answer 1

Accepted answer by piRSquared

Use map

All the logic you need:

def update_type(t1, t2, dropna=False):
    return t1.map(t2).dropna() if dropna else t1.map(t2).fillna(t1)

Let's make 'ItemType2' the index of Dataframe2:

update_type(Dataframe1.ItemType1,
            Dataframe2.set_index('ItemType2').newType)

0                Tomato
1                Potato
2                Potato
3      greenCauliflower
4     yellowCauliflower
5                Squash
6                Onions
7                Onions
8                Onions
9         yellowCabbage
10         GreenCabbage
Name: ItemType1, dtype: object

update_type(Dataframe1.ItemType1,
            Dataframe2.set_index('ItemType2').newType,
            dropna=True)

0                Tomato
1                Potato
2                Potato
3      greenCauliflower
4     yellowCauliflower
5                Squash
6                Onions
7                Onions
8                Onions
Name: ItemType1, dtype: object
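For anyone who wants to reproduce the two outputs above, here is a minimal self-contained sketch. The frames are rebuilt by hand from the tables in the question, and the blank newType cells are assumed to be missing values (None); the variable names lookup, kept, and dropped are only illustrative.

```python
import pandas as pd

# Sample frames rebuilt from the question; blank newType cells are assumed to be None.
Dataframe1 = pd.DataFrame({
    "ItemType1": [
        "redTomato", "whitePotato", "yellowPotato", "greenCauliflower",
        "yellowCauliflower", "yelloSquash", "redOnions", "YellowOnions",
        "WhiteOnions", "yellowCabbage", "GreenCabbage",
    ]
})
Dataframe2 = pd.DataFrame({
    "ItemType2": [
        "whitePotato", "yellowPotato", "redTomato", "yellowCabbage",
        "GreenCabbage", "yellowCauliflower", "greenCauliflower",
        "YellowOnions", "WhiteOnions", "yelloSquash", "redOnions",
    ],
    "newType": [
        "Potato", "Potato", "Tomato", None, None, "yellowCauliflower",
        "greenCauliflower", "Onions", "Onions", "Squash", "Onions",
    ],
})


def update_type(t1, t2, dropna=False):
    # map() looks each value of t1 up in the index of t2; misses become NaN.
    mapped = t1.map(t2)
    # Either drop the NaN rows or fall back to the original values.
    return mapped.dropna() if dropna else mapped.fillna(t1)


# Series indexed by ItemType2, so map() can use it as a lookup table.
lookup = Dataframe2.set_index("ItemType2")["newType"]

kept = update_type(Dataframe1["ItemType1"], lookup)
dropped = update_type(Dataframe1["ItemType1"], lookup, dropna=True)

print(kept)
print(dropped)
```

Note that map needs the lookup index to be unique, which holds for the sample data.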

Verify

updated = update_type(Dataframe1.ItemType1, Dataframe2.set_index('ItemType2').newType)

pd.concat([Dataframe1, updated], axis=1, keys=['old', 'new'])

Timing

def root(Dataframe1, Dataframe2):
    return Dataframe1['ItemType1'].replace(Dataframe2.set_index('ItemType2')['newType'].dropna())

def piRSquared(Dataframe1, Dataframe2):
    t1 = Dataframe1.ItemType1
    t2 = Dataframe2.set_index('ItemType2').newType
    return update_type(t1, t2)
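No timing figures are given here, so the following is only a rough sketch of how the two approaches could be compared with the standard timeit module. The sample data, the repeat count, and the function names with_replace and with_map are assumptions made for illustration, and the replace variant passes a plain dict rather than a Series. Results will vary with data size and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical sample data: three item types repeated to get a larger column.
Dataframe1 = pd.DataFrame(
    {"ItemType1": ["redTomato", "whitePotato", "yellowCabbage"] * 1000}
)
Dataframe2 = pd.DataFrame({
    "ItemType2": ["whitePotato", "redTomato", "yellowCabbage"],
    "newType": ["Potato", "Tomato", None],
})


def with_replace(d1, d2):
    # replace-based approach: drop null targets, then replace via a dict.
    mapping = d2.set_index("ItemType2")["newType"].dropna().to_dict()
    return d1["ItemType1"].replace(mapping)


def with_map(d1, d2):
    # map-based approach: look up every value, then fill misses with the original.
    t1 = d1["ItemType1"]
    t2 = d2.set_index("ItemType2")["newType"]
    return t1.map(t2).fillna(t1)


runs = 20  # arbitrary repeat count for the sketch
replace_seconds = timeit.timeit(lambda: with_replace(Dataframe1, Dataframe2), number=runs)
map_seconds = timeit.timeit(lambda: with_map(Dataframe1, Dataframe2), number=runs)

print(f"replace: {replace_seconds:.4f} s, map: {map_seconds:.4f} s over {runs} runs each")
```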

Answer 2

Answer by root

You can convert df2 into a Series indexed by 'ItemType2', and then use replace on df1:

# Make df2 a Series indexed by 'ItemType2', dropping rows with no newType.
df2 = df2.set_index('ItemType2')['newType'].dropna()

# Replace values in df1.
df1['ItemType1'] = df1['ItemType1'].replace(df2)

Or in a single line, if you don't want to alter df2:

df1['ItemType1'] = df1['ItemType1'].replace(df2.set_index('ItemType2')['newType'].dropna())
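As a small runnable illustration of the replace approach, here is a sketch on a cut-down sample. The value "purpleKale" is a made-up item with no entry in df2, added only to show that unmatched values pass through unchanged, and the lookup is converted to a plain dict with to_dict(), which is a minor variation on the code above.

```python
import pandas as pd

# Cut-down sample; "purpleKale" is hypothetical and has no entry in df2.
df1 = pd.DataFrame(
    {"ItemType1": ["redTomato", "whitePotato", "yellowCabbage", "purpleKale"]}
)
df2 = pd.DataFrame({
    "ItemType2": ["whitePotato", "redTomato", "yellowCabbage"],
    "newType": ["Potato", "Tomato", None],
})

# dropna() removes the yellowCabbage entry, so it is not replaced with a null.
mapping = df2.set_index("ItemType2")["newType"].dropna().to_dict()

# Values that are not keys of the mapping are left as they are.
result = df1["ItemType1"].replace(mapping)
print(result.tolist())
```

The dropna() step is what keeps yellowCabbage intact: without it, the mapping would still contain that key with a missing value as its replacement.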

Answer 3

Answer by draco_alpine

This method requires you to rename the key column in both frames to 'type'; you can then use merge and np.where:

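import numpy as np

# Left merge keeps rows of df1 that have no match in df2.
df3 = df1.merge(df2, how='left', on='type')[['type', 'newType']]
# Fall back to the original 'type' where 'newType' is missing.
df3['newType'] = np.where(df3['newType'].isnull(), df3['type'], df3['newType'])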
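If renaming the columns is inconvenient, a variation that may work is to pass left_on and right_on to merge and keep the original column names. The sketch below assumes the ItemType1 and ItemType2 names from the question, a left merge so that unmatched rows survive, and unique keys in df2 so that no rows are duplicated; "purpleKale" is a made-up unmatched value.

```python
import numpy as np
import pandas as pd

# Cut-down sample; "purpleKale" is hypothetical and has no entry in df2.
df1 = pd.DataFrame(
    {"ItemType1": ["redTomato", "whitePotato", "yellowCabbage", "purpleKale"]}
)
df2 = pd.DataFrame({
    "ItemType2": ["whitePotato", "redTomato", "yellowCabbage"],
    "newType": ["Potato", "Tomato", None],
})

# Left merge on differently named key columns; unmatched rows get a null newType.
merged = df1.merge(df2, how="left", left_on="ItemType1", right_on="ItemType2")

# Use newType where it exists, otherwise keep the original value.
merged["ItemType1"] = np.where(
    merged["newType"].isnull(), merged["ItemType1"], merged["newType"]
)

result = merged["ItemType1"]
print(result.tolist())
```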
