Source: Stack Overflow question, http://stackoverflow.com/questions/42425971/. Content is provided under the CC BY-SA 4.0 license; you are free to use and share it, but you must attribute it to the original authors and keep the same license.

Overlapping keys in dictionary when Using .replace() method on pandas dataframe

Tags: python, pandas

Asked by Nirvan

I want to replace some values in a column of a dataframe using a dictionary that maps the old codes to the new codes.

di = {"myVar": {11: 0, 204: 11}}
mydata.replace(to_replace=di, inplace=True)

But some of the new codes and old codes overlap. When I use the DataFrame's .replace method, I get the error 'Replacement not allowed with overlapping keys and values'.

My current workaround is to replace the offending keys manually and then apply the dictionary to the remaining non-overlapping cases.

mydata.loc[mydata.myVar == 11, "myVar"] = 0
di = {"myVar": {204: 11}}
mydata.replace(to_replace=di, inplace=True)

Is there a more compact way to do this?

Answered by Nirvan

I found an answer that uses the .map method on a Series in conjunction with a dictionary. Here's an example of a recoding dictionary with overlapping keys and values.

>>> import pandas as pd
>>> df = pd.DataFrame([1, 2, 3, 4, 1], columns=['Var'])
>>> df
   Var
0    1
1    2
2    3
3    4
4    1
>>> # named `recode` rather than `dict` to avoid shadowing the built-in
>>> recode = {1: 2, 2: 3, 3: 1, 4: 3}
>>> df.Var.map(recode)
0    2
1    3
2    1
3    3
4    2
Name: Var, dtype: int64

UPDATE:

With map, every value in the original series must be mapped to a new value. If the mapping dictionary does not contain all the values of the original column, the unmapped values are mapped to NaN.

>>> df = pd.DataFrame( [1,2,3,4,1], columns = ['Var'] )
>>> recode = {1: 2, 2: 3, 3: 1}  # 4 is deliberately left unmapped
>>> df.Var.map(recode)
0    2.0
1    3.0
2    1.0
3    NaN
4    2.0
Name: Var, dtype: float64
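
If the unmapped values should stay as they are instead of becoming NaN, one possible approach is to fall back to the original value whenever the dictionary has no entry for it. The sketch below is not part of the original answer. It reuses the recoding dictionary from the question, while the sample data in mydata and the names recode, kept and filled are made up for illustration.

```python
import pandas as pd

# Hypothetical data modelled on the question: a column "myVar" of old codes.
mydata = pd.DataFrame({"myVar": [11, 204, 7, 11]})

# Recoding dictionary from the question; 11 is both an old and a new code.
recode = {11: 0, 204: 11}

# Option 1: look each value up and fall back to the value itself.
kept = mydata["myVar"].map(lambda v: recode.get(v, v))

# Option 2: map first, then fill the NaN gaps from the original column.
# The intermediate NaN turns the result into floats, so cast back to the
# original dtype (this assumes the column had no NaN to begin with).
filled = (
    mydata["myVar"]
    .map(recode)
    .fillna(mydata["myVar"])
    .astype(mydata["myVar"].dtype)
)

print(kept.tolist())    # [0, 11, 7, 0]
print(filled.tolist())  # [0, 11, 7, 0]
```

Both variants look up each original value exactly once, so the overlap between old and new codes does not cause chained replacements: 204 becomes 11 and is not recoded again to 0.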