Python Pandas - replacing column values
Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/31888871/
Pandas - replacing column values
Asked by Simon
I know there are a number of topics on this question, but none of the methods worked for me, so I'm posting about my specific situation.
I have a dataframe that looks like this:
data = pd.DataFrame([[1,0],[0,1],[1,0],[0,1]], columns=["sex", "split"])
data['sex'].replace(0, 'Female')
data['sex'].replace(1, 'Male')
data
What I want to do is replace all 0's in the sex column with 'Female' and all 1's with 'Male', but the values within the dataframe don't seem to change when I use the code above.
Am I using replace() incorrectly? Or is there a better way to do conditional replacement of values?
Accepted answer by Anand S Kumar
Yes, you are using it incorrectly. Series.replace() is not an in-place operation by default: it returns the replaced DataFrame/Series, and you need to assign that result back to your DataFrame/Series for it to take effect. Or, if you need to do it in place, you need to specify the inplace keyword argument as True.
Example -
data['sex'].replace(0, 'Female',inplace=True)
data['sex'].replace(1, 'Male',inplace=True)
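The assign-back alternative described above can be sketched as follows. This is a minimal illustration that reuses the question's data; the name `replaced` is only for demonstration. It shows that replace() leaves the column untouched until its result is assigned back. Assigning back also avoids calling an in-place method through a column selection, which newer pandas versions may warn about or ignore when Copy-on-Write is enabled.

```python
import pandas as pd

data = pd.DataFrame([[1, 0], [0, 1], [1, 0], [0, 1]], columns=["sex", "split"])

# replace() returns a new Series; the DataFrame itself is not modified here.
replaced = data["sex"].replace(0, "Female")
print(data["sex"].tolist())  # the column still holds the original integers

# Assign the result back to the column to make the change stick.
data["sex"] = data["sex"].replace(0, "Female")
data["sex"] = data["sex"].replace(1, "Male")
print(data["sex"].tolist())
```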
Also, you can combine the above into a single replace function call by passing a list for both the to_replace argument and the value argument. Example -
data['sex'].replace([0,1],['Female','Male'],inplace=True)
Example/Demo -
In [10]: data = pd.DataFrame([[1,0],[0,1],[1,0],[0,1]], columns=["sex", "split"])
In [11]: data['sex'].replace([0,1],['Female','Male'],inplace=True)
In [12]: data
Out[12]:
      sex  split
0    Male      0
1  Female      1
2    Male      0
3  Female      1
You can also use a dictionary. Example -
In [15]: data = pd.DataFrame([[1,0],[0,1],[1,0],[0,1]], columns=["sex", "split"])
In [16]: data['sex'].replace({0:'Female',1:'Male'},inplace=True)
In [17]: data
Out[17]:
      sex  split
0    Male      0
1  Female      1
2    Male      0
3  Female      1
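A related variant, not part of the original answer, is to call replace() on the whole DataFrame with a nested dictionary of the form {column: {old: new}}. The sketch below assumes the same data as the question and assigns the result back rather than using inplace=True, so only the sex column is rewritten and split is left alone.

```python
import pandas as pd

data = pd.DataFrame([[1, 0], [0, 1], [1, 0], [0, 1]], columns=["sex", "split"])

# Nested dict: {column name: {old value: new value}}.
# Only the "sex" column is searched; "split" keeps its 0/1 values.
data = data.replace({"sex": {0: "Female", 1: "Male"}})
print(data)
```

This can be useful when the same numeric codes appear in several columns but should be relabelled in only one of them.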
Answer by student
You can also try using apply with the get method of a dictionary; it seems to be a little faster than replace:
data['sex'] = data['sex'].apply({1:'Male', 0:'Female'}.get)
Testing with timeit:
%%timeit
data['sex'].replace([0,1],['Female','Male'],inplace=True)
Result:
The slowest run took 5.83 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 510 μs per loop
Using apply:
%%timeit
data['sex'] = data['sex'].apply({1:'Male', 0:'Female'}.get)
Result:
The slowest run took 5.92 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 331 μs per loop
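One caveat with the measurements above: both cells modify data, so after the first loop the sex column may no longer hold the original integers, and later loops would then be timing a different workload. A hedged sketch of a fairer comparison with the standard timeit module is shown below. It rebuilds the data on every run; the helper names make_data, with_replace and with_apply are hypothetical, and no timings are claimed here because they depend on the machine and pandas version.

```python
import timeit

import pandas as pd

mapping = {0: "Female", 1: "Male"}


def make_data():
    # Fresh integer data for every call, so neither method sees already-replaced values.
    return pd.DataFrame([[1, 0], [0, 1], [1, 0], [0, 1]], columns=["sex", "split"])


def with_replace():
    data = make_data()
    data["sex"] = data["sex"].replace(mapping)
    return data


def with_apply():
    data = make_data()
    data["sex"] = data["sex"].apply(mapping.get)
    return data


# Both timings include the (identical) cost of building the DataFrame.
replace_time = timeit.timeit(with_replace, number=200)
apply_time = timeit.timeit(with_apply, number=200)
print(f"replace: {replace_time:.4f} s for 200 runs")
print(f"apply:   {apply_time:.4f} s for 200 runs")
```

On a four-row frame the numbers are dominated by fixed overhead, so a larger column would give a more meaningful comparison.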
Note: apply with a dictionary should be used only if all possible values of the column are defined in the dictionary; otherwise, values not defined in the dictionary will come back empty (dict.get returns None for them).
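The behaviour described in the note can be sketched as follows. The Series here is hypothetical: it contains a value 2 that has no entry in the dictionary. The second call is one possible workaround, assuming that unmapped values should simply be kept as they are.

```python
import pandas as pd

# Hypothetical data: the value 2 has no entry in the mapping.
sex = pd.Series([1, 0, 2])
mapping = {1: "Male", 0: "Female"}

# dict.get returns None for the unmapped value 2, so that row ends up missing.
with_gaps = sex.apply(mapping.get)

# Passing the value itself as the default leaves unmapped values unchanged.
with_fallback = sex.apply(lambda value: mapping.get(value, value))

print(with_gaps.tolist())
print(with_fallback.tolist())
```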