Original source: http://stackoverflow.com/questions/37962759/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to set values in a pandas DataFrame based on NaN values in another column?
Asked by Rocketq
I have a dataframe named df with original shape (4361, 15). Some of the values in the agefm column are NaN. Take a look:
> df[df.agefm.isnull() == True].agefm.shape
(2282,)
Then I create a new column and set all its values to 0:
df['nevermarr'] = 0
Now I would like to set the nevermarr value to 1 in every row where agefm is NaN:
df[df.agefm.isnull() == True].nevermarr = 1
Nothing changed:
> df['nevermarr'].sum()
0
What am I doing wrong?
Answered by jezrael
The best approach is to use numpy.where:
df['nevermarr'] = np.where(df.agefm.isnull(), 1, 0)
print (df)
   agefm  nevermarr
0    NaN          1
1    5.0          0
2    6.0          0
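As a self-contained check of the np.where line above, here is a minimal sketch using the same three agefm values as the sample in this answer. The astype(int) variant and the name flag_from_cast are not part of the original answer; they are shown only as an equivalent alternative for comparison.

```python
import numpy as np
import pandas as pd

# Three-row frame with the same agefm values as the sample in this answer.
df = pd.DataFrame({'agefm': [np.nan, 5, 6]})

# np.where(condition, value_if_true, value_if_false) builds the whole column at once.
df['nevermarr'] = np.where(df.agefm.isnull(), 1, 0)

# Alternative (an assumption, not from the original answer):
# cast the boolean NaN mask directly to integers.
flag_from_cast = df.agefm.isnull().astype(int)

print(df)
print((df['nevermarr'] == flag_from_cast).all())
```

Both forms give 1 where agefm is NaN and 0 elsewhere, so which one to use is mostly a matter of readability.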
Or use loc; the == True comparison can be omitted:
df.loc[df.agefm.isnull(), 'nevermarr'] = 1
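To illustrate why the attempt in the question had no effect while loc works, here is a small sketch on a three-row frame. The names sum_after_chained and sum_after_loc are illustrative only. The warnings filter is there because pandas may warn about assigning to a copy, depending on the version.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({'agefm': [np.nan, 5, 6]})
df['nevermarr'] = 0

# The question's pattern: df[mask] returns a new object, so the assignment
# lands on that temporary object and df itself is left untouched.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # pandas may warn about assigning to a copy
    df[df.agefm.isnull()].nevermarr = 1
sum_after_chained = int(df['nevermarr'].sum())

# loc selects the rows and the column in one call on df, so the write reaches df.
df.loc[df.agefm.isnull(), 'nevermarr'] = 1
sum_after_loc = int(df['nevermarr'].sum())

print(sum_after_chained, sum_after_loc)  # 0 1
```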
Or use mask:
df['nevermarr'] = df.nevermarr.mask(df.agefm.isnull(), 1)
print (df)
   agefm  nevermarr
0    NaN          1
1    5.0          2
2    6.0          3
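The mask output keeps 2 and 3 while the numpy.where output earlier shows 0 and 0. The following sketch, assuming the sample frame from this answer (nevermarr starting as 7, 2, 3), puts the two side by side. The names masked and rebuilt are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'agefm': [np.nan, 5, 6],
                   'nevermarr': [7, 2, 3]})

# mask replaces values only where the condition is True;
# the other rows keep whatever value they already had.
masked = df.nevermarr.mask(df.agefm.isnull(), 1)

# np.where fills every row, so the existing 2 and 3 are replaced by 0.
rebuilt = np.where(df.agefm.isnull(), 1, 0)

print(masked.tolist())   # [1, 2, 3]
print(rebuilt.tolist())  # [1, 0, 0]
```

So mask (like loc) suits a column that already holds meaningful values, while numpy.where suits building the 0/1 flag from scratch.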
Sample:
import pandas as pd
import numpy as np
# Keys are listed in the order the columns appear in the printed output.
df = pd.DataFrame({'agefm': [np.nan, 5, 6],
                   'nevermarr': [7, 2, 3]})
print (df)
   agefm  nevermarr
0    NaN          7
1    5.0          2
2    6.0          3
df.loc[df.agefm.isnull(), 'nevermarr'] = 1
print (df)
   agefm  nevermarr
0    NaN          1
1    5.0          2
2    6.0          3