Pandas: replace column values based on match from another column
Original question: http://stackoverflow.com/questions/38466682/
License: CC BY-SA 4.0. If you use or share this content, you must attribute it to the original authors on Stack Overflow.
Asked by Anil_M
I have a column in my first data frame, df1["ItemType"], as below:
Dataframe1
ItemType1
redTomato
whitePotato
yellowPotato
greenCauliflower
yellowCauliflower
yelloSquash
redOnions
YellowOnions
WhiteOnions
yellowCabbage
GreenCabbage
I need to replace that based on a dictionary created from another data-frame.
Dataframe2
ItemType2 newType
whitePotato Potato
yellowPotato Potato
redTomato Tomato
yellowCabbage
GreenCabbage
yellowCauliflower yellowCauliflower
greenCauliflower greenCauliflower
YellowOnions Onions
WhiteOnions Onions
yelloSquash Squash
redOnions Onions
Notice that:
- In dataframe2, some of the ItemType values are the same as ItemType in dataframe1.
- Some ItemType values in dataframe2 have null values, like yellowCabbage.
- ItemType in dataframe2 is out of order with respect to ItemType in dataframe1.
I need to replace values in the Dataframe1 ItemType column with newType wherever there is a match in the corresponding Dataframe2 ItemType column, keeping the exceptions listed in the bullet points above in mind. If there is no match, the values need to stay as they are (no change).
Here is what I have so far:
import pandas as pd
# Read the second CSV file
df2 = pd.read_csv('mappings.csv', names=["ItemType", "newType"])
# Convert to dict
df2 = df2.set_index('ItemType').T.to_dict('list')
The replace-on-match attempts below are not working. They insert NaN values instead of the actual values. These are based on a discussion here on SO.
df1.loc[df1['ItemType'].isin(df2['ItemType'])]=df2[['NewType']]
OR
df1['ItemType']=df2['ItemType'].map(df2)
Thanks in advance
EDIT
The key columns in the two data frames have different names: the dataframe1 column is ItemType1, and the first column in the second data frame is ItemType2. I missed that in the first edit.
Accepted answer by piRSquared
Use map
All the logic you need:
def update_type(t1, t2, dropna=False):
    return t1.map(t2).dropna() if dropna else t1.map(t2).fillna(t1)
Let's make 'ItemType2' the index of Dataframe2:
update_type(Dataframe1.ItemType1,
Dataframe2.set_index('ItemType2').newType)
0 Tomato
1 Potato
2 Potato
3 greenCauliflower
4 yellowCauliflower
5 Squash
6 Onions
7 Onions
8 Onions
9 yellowCabbage
10 GreenCabbage
Name: ItemType1, dtype: object
update_type(Dataframe1.ItemType1,
Dataframe2.set_index('ItemType2').newType,
dropna=True)
0 Tomato
1 Potato
2 Potato
3 greenCauliflower
4 yellowCauliflower
5 Squash
6 Onions
7 Onions
8 Onions
Name: ItemType1, dtype: object
Verify
updated = update_type(Dataframe1.ItemType1, Dataframe2.set_index('ItemType2').newType)
pd.concat([Dataframe1, updated], axis=1, keys=['old', 'new'])
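For a self-contained check, here is a minimal sketch that rebuilds the two frames from the question by hand and runs the same map-then-fillna logic. Building the frames inline is an assumption (the original data came from CSV files), and the names lookup and comparison are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data retyped from the question; the real frames were read from CSV.
Dataframe1 = pd.DataFrame({"ItemType1": [
    "redTomato", "whitePotato", "yellowPotato", "greenCauliflower",
    "yellowCauliflower", "yelloSquash", "redOnions", "YellowOnions",
    "WhiteOnions", "yellowCabbage", "GreenCabbage"]})
Dataframe2 = pd.DataFrame({
    "ItemType2": ["whitePotato", "yellowPotato", "redTomato", "yellowCabbage",
                  "GreenCabbage", "yellowCauliflower", "greenCauliflower",
                  "YellowOnions", "WhiteOnions", "yelloSquash", "redOnions"],
    "newType": ["Potato", "Potato", "Tomato", np.nan, np.nan,
                "yellowCauliflower", "greenCauliflower", "Onions", "Onions",
                "Squash", "Onions"]})

def update_type(t1, t2, dropna=False):
    # map() looks each value of t1 up in t2's index; misses and nulls become NaN.
    mapped = t1.map(t2)
    # Either drop the NaN rows or fall back to the original value.
    return mapped.dropna() if dropna else mapped.fillna(t1)

# Lookup Series: index is the old type, values are the new type.
lookup = Dataframe2.set_index("ItemType2")["newType"]
updated = update_type(Dataframe1["ItemType1"], lookup)

# Old and new values side by side for a visual check.
comparison = pd.concat([Dataframe1["ItemType1"], updated], axis=1, keys=["old", "new"])
print(comparison)
```

Rows whose newType is null (yellowCabbage, GreenCabbage) keep their original value because fillna(t1) restores it after the lookup returns NaN.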
Timing
def root(Dataframe1, Dataframe2):
    return Dataframe1['ItemType1'].replace(Dataframe2.set_index('ItemType2')['newType'].dropna())

def piRSquared(Dataframe1, Dataframe2):
    t1 = Dataframe1.ItemType1
    t2 = Dataframe2.set_index('ItemType2').newType
    return update_type(t1, t2)
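No timing figures are given here, so the following is only a sketch of how the two functions could be compared with the standard timeit module. The tiny frames are hypothetical stand-ins, "unknownItem" is an invented unmatched value, and any numbers printed will depend on frame size and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical small frames standing in for the question's data.
Dataframe1 = pd.DataFrame({"ItemType1": ["redTomato", "whitePotato", "yellowCabbage", "unknownItem"]})
Dataframe2 = pd.DataFrame({"ItemType2": ["whitePotato", "redTomato", "yellowCabbage"],
                           "newType": ["Potato", "Tomato", np.nan]})

def update_type(t1, t2, dropna=False):
    return t1.map(t2).dropna() if dropna else t1.map(t2).fillna(t1)

def root(Dataframe1, Dataframe2):
    # replace() approach: null mappings are dropped before replacing.
    return Dataframe1["ItemType1"].replace(Dataframe2.set_index("ItemType2")["newType"].dropna())

def piRSquared(Dataframe1, Dataframe2):
    # map() approach: look up, then fill misses with the original value.
    t1 = Dataframe1["ItemType1"]
    t2 = Dataframe2.set_index("ItemType2")["newType"]
    return update_type(t1, t2)

# Both approaches should agree before their speed is compared.
assert root(Dataframe1, Dataframe2).tolist() == piRSquared(Dataframe1, Dataframe2).tolist()

runs = 100
for fn in (root, piRSquared):
    seconds = timeit.timeit(lambda: fn(Dataframe1, Dataframe2), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.3f} ms per call")
```

For a meaningful comparison, the frames would need to be much larger than this stand-in data.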
Answer by root
You can convert df2 into a Series indexed by 'ItemType2', and then use replace on df1:
# Make df2 a Series indexed by 'ItemType2'.
df2 = df2.set_index('ItemType2')['newType'].dropna()
# Replace values in df1.
df1['ItemType1'] = df1['ItemType1'].replace(df2)
Or in a single line, if you don't want to alter df2:
df1['ItemType1'] = df1['ItemType1'].replace(df2.set_index('ItemType2')['newType'].dropna())
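As a minimal sketch of the replace approach on stand-in data: the frames below are a hypothetical subset of the question's data, and "purpleCarrot" is an invented item added only to show what happens when there is no match.

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the question's data, plus one unmatched item.
df1 = pd.DataFrame({"ItemType1": ["redTomato", "whitePotato", "yellowCabbage", "purpleCarrot"]})
df2 = pd.DataFrame({"ItemType2": ["whitePotato", "redTomato", "yellowCabbage"],
                    "newType": ["Potato", "Tomato", np.nan]})

# Build the lookup; dropna() removes entries that have no replacement value.
lookup = df2.set_index("ItemType2")["newType"].dropna()

# replace() only touches values that appear in the lookup's index.
df1["ItemType1"] = df1["ItemType1"].replace(lookup)
print(df1["ItemType1"].tolist())
```

Because the null mapping for yellowCabbage is dropped from the lookup, that value and the unmatched purpleCarrot both pass through unchanged.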
Answer by draco_alpine
This method requires that you rename the item-type column in both data frames to 'type'; then you can use merge and np.where:
import numpy as np
df3 = df1.merge(df2, how='inner', on='type')[['type', 'newType']]
# Fall back to the original type wherever newType is null.
df3['newType'] = np.where(df3['newType'].isnull(), df3['type'], df3['newType'])
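Here is a minimal sketch of the merge approach on hypothetical stand-in data. Note one deliberate variation: it uses how='left' rather than 'inner', on the assumption that rows of df1 with no match in df2 should be kept, as the question asks. "purpleCarrot" is an invented unmatched item.

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the question's data, with both key columns renamed to 'type'.
df1 = pd.DataFrame({"type": ["redTomato", "whitePotato", "yellowCabbage", "purpleCarrot"]})
df2 = pd.DataFrame({"type": ["whitePotato", "redTomato", "yellowCabbage"],
                    "newType": ["Potato", "Tomato", np.nan]})

# how='left' keeps every row of df1, including 'purpleCarrot', which has no match.
df3 = df1.merge(df2, how="left", on="type")[["type", "newType"]]

# Where newType is null (no match, or an empty mapping), fall back to the original type.
df3["newType"] = np.where(df3["newType"].isnull(), df3["type"], df3["newType"])
print(df3)
```

With how='inner', the purpleCarrot row would be dropped from df3 instead of being carried through unchanged.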