Python Pandas - Replacing column values

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/31888871/

Time: 2020-08-19 10:42:08  Source: igfitidea

Pandas - replacing column values

python, pandas

Asked by Simon

I know there are a number of topics on this question, but none of the methods worked for me, so I'm posting about my specific situation.

I have a dataframe that looks like this:

import pandas as pd

data = pd.DataFrame([[1,0],[0,1],[1,0],[0,1]], columns=["sex", "split"])
data['sex'].replace(0, 'Female')
data['sex'].replace(1, 'Male')
data

What I want to do is replace all 0's in the sex column with 'Female' and all 1's with 'Male', but the values in the dataframe don't seem to change when I use the code above.

Am I using replace() incorrectly? Or is there a better way to do conditional replacement of values?

Accepted answer by Anand S Kumar

Yes, you are using it incorrectly. Series.replace() is not an in-place operation by default; it returns the replaced DataFrame/Series, so you need to assign the result back to your DataFrame/Series for the change to take effect. Alternatively, if you need to do it in place, pass the inplace keyword argument as True. Example -

data['sex'].replace(0, 'Female',inplace=True)
data['sex'].replace(1, 'Male',inplace=True)
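
The other option described above, assigning the result back, is not shown in code in the original answer. A minimal sketch of it, reusing the question's sample frame:

```python
import pandas as pd

data = pd.DataFrame([[1, 0], [0, 1], [1, 0], [0, 1]], columns=["sex", "split"])

# replace() returns a new Series; assigning it back updates the column.
data["sex"] = data["sex"].replace(0, "Female")
data["sex"] = data["sex"].replace(1, "Male")

print(data["sex"].tolist())
```

In recent pandas releases, calling replace(..., inplace=True) on a selected column such as data['sex'] may emit a warning or leave data unchanged under Copy-on-Write, so the assign-back form is probably the safer of the two.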

Also, you can combine the above into a single replace function call by passing a list for both the to_replace argument and the value argument. Example -

data['sex'].replace([0,1],['Female','Male'],inplace=True)

Example/Demo -

In [10]: data = pd.DataFrame([[1,0],[0,1],[1,0],[0,1]], columns=["sex", "split"])

In [11]: data['sex'].replace([0,1],['Female','Male'],inplace=True)

In [12]: data
Out[12]:
      sex  split
0    Male      0
1  Female      1
2    Male      0
3  Female      1


You can also use a dictionary. Example -

In [15]: data = pd.DataFrame([[1,0],[0,1],[1,0],[0,1]], columns=["sex", "split"])

In [16]: data['sex'].replace({0:'Female',1:'Male'},inplace=True)

In [17]: data
Out[17]:
      sex  split
0    Male      0
1  Female      1
2    Male      0
3  Female      1
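
A related variant that the answer does not cover: DataFrame.replace also accepts a nested dictionary keyed by column name, which restricts the replacement to one column without selecting it first. A minimal sketch on the same sample frame:

```python
import pandas as pd

data = pd.DataFrame([[1, 0], [0, 1], [1, 0], [0, 1]], columns=["sex", "split"])

# Outer key: column name. Inner dict: old value -> new value.
# Only "sex" is touched; "split" keeps its 0/1 values.
data = data.replace({"sex": {0: "Female", 1: "Male"}})

print(data)
```

Because the call operates on the whole frame and the result is assigned back, it does not rely on inplace=True.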

Answer by student

You can also try using apply with the get method of a dictionary; it seems to be a little faster than replace:

data['sex'] = data['sex'].apply({1:'Male', 0:'Female'}.get)

Testing with timeit:

%%timeit
data['sex'].replace([0,1],['Female','Male'],inplace=True)

Result:

The slowest run took 5.83 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 510 μs per loop

Using apply:

%%timeit
data['sex'] = data['sex'].apply({1:'Male', 0:'Female'}.get)

Result:

The slowest run took 5.92 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 331 μs per loop
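
Both cells above appear to modify data on every loop, so only the first iteration operates on the original integers; that may partly explain the caching message. A rough sketch of a comparison that leaves the input untouched, using the standard timeit module outside IPython (the names original, mapping, with_replace, with_apply and the run count are illustrative assumptions, not from the answer):

```python
import timeit

import pandas as pd

mapping = {0: "Female", 1: "Male"}
original = pd.DataFrame([[1, 0], [0, 1], [1, 0], [0, 1]], columns=["sex", "split"])


def with_replace():
    # Non-inplace call, so every run sees the same integer column.
    return original["sex"].replace(mapping)


def with_apply():
    # dict.get is called once per element.
    return original["sex"].apply(mapping.get)


runs = 200  # arbitrary; raise it for steadier numbers
t_replace = timeit.timeit(with_replace, number=runs)
t_apply = timeit.timeit(with_apply, number=runs)

print(f"replace: {t_replace / runs * 1e6:.1f} us per call")
print(f"apply:   {t_apply / runs * 1e6:.1f} us per call")
```

Timings on a four-row frame are dominated by fixed overhead, so the ranking may differ on larger data and across pandas versions.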

Note: apply with a dictionary should only be used if every possible value in the column is a key in the dictionary; otherwise, values without a matching key come back as None (missing).
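
To illustrate the note, here is a small sketch with one value that is not in the dictionary. The extra code 2 and the names strict and lenient are hypothetical, added only for the demonstration:

```python
import pandas as pd

mapping = {0: "Female", 1: "Male"}
sex = pd.Series([1, 0, 2])  # 2 is a hypothetical code missing from mapping

# dict.get returns None for unknown keys, so 2 becomes a missing value.
strict = sex.apply(mapping.get)

# Passing the value itself as the default keeps unmapped entries unchanged.
lenient = sex.apply(lambda v: mapping.get(v, v))

print(strict.isna().tolist())
print(lenient.tolist())
```

The second form behaves more like replace, which also leaves values it was not asked to replace as they are.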