Python Pandas: if the data is NaN, then change to be 0, else change to be 1 in data frame
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/38607381/
Asked by tktktk0711
I have a DataFrame, df, as follows:
row id name age url
1 e1 tom NaN http1
2 e2 john 25 NaN
3 e3 lucy NaN http3
4 e4 tick 29 NaN
I want to replace NaN with 0 and every other value with 1 in the columns age and url. My code is below, but it does not work.
import pandas as pd
df[['age', 'url']].applymap(lambda x: 0 if x=='NaN' else x)
I want to get the following result:
row id name age url
1 e1 tom 0 1
2 e2 john 1 0
3 e3 lucy 0 1
4 e4 tick 1 0
Thanks for your help!
Accepted answer by jezrael
You can use where with fillna, with the condition given by isnull:
df[['age', 'url']] = (df[['age', 'url']]
                      .where(df[['age', 'url']].isnull(), 1)
                      .fillna(0).astype(int))
print (df)
row id name age url
0 1 e1 tom 0 1
1 2 e2 john 1 0
2 3 e3 lucy 0 1
3 4 e4 tick 1 0
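The chain above can be easier to follow when split into steps. DataFrame.where keeps the cells where the condition is True and overwrites the rest, so the first call leaves only the NaN cells untouched. The sketch below is illustrative only: the DataFrame is rebuilt by hand from the table in the question (an assumption, since the post shows only the printed table), and the names cols, mask, step1 and step2 are made up for the example.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt by hand from the question's table (an assumption).
df = pd.DataFrame({
    'row': [1, 2, 3, 4],
    'id': ['e1', 'e2', 'e3', 'e4'],
    'name': ['tom', 'john', 'lucy', 'tick'],
    'age': [np.nan, 25, np.nan, 29],
    'url': ['http1', np.nan, 'http3', np.nan],
})

cols = ['age', 'url']
mask = df[cols].isnull()  # True where the cell is missing

# where() keeps cells where mask is True (the NaNs) and writes 1 elsewhere.
step1 = df[cols].where(mask, 1)

# The surviving NaNs become 0, then everything is cast to int.
step2 = step1.fillna(0).astype(int)
print(step2)
```

Note that DataFrame.where has the opposite feel to numpy.where: it takes a single replacement value for the cells that fail the condition, rather than one value per branch.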
Or numpy.where with isnull:
df[['age', 'url']] = np.where(df[['age', 'url']].isnull(), 0, 1)
print (df)
row id name age url
0 1 e1 tom 0 1
1 2 e2 john 1 0
2 3 e3 lucy 0 1
3 4 e4 tick 1 0
Fastest solution with notnull and astype:
df[['age', 'url']] = df[['age', 'url']].notnull().astype(int)
print (df)
row id name age url
0 1 e1 tom 0 1
1 2 e2 john 1 0
2 3 e3 lucy 0 1
3 4 e4 tick 1 0
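For anyone who wants to run the notnull/astype approach end to end, here is a minimal self-contained sketch. The DataFrame construction is a hypothetical reconstruction of the question's table, and the variable names cols and present are introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's frame; only the table was posted.
df = pd.DataFrame({
    'row': [1, 2, 3, 4],
    'id': ['e1', 'e2', 'e3', 'e4'],
    'name': ['tom', 'john', 'lucy', 'tick'],
    'age': [np.nan, 25, np.nan, 29],
    'url': ['http1', np.nan, 'http3', np.nan],
})

cols = ['age', 'url']
present = df[cols].notnull()    # boolean frame: True where a value exists
df[cols] = present.astype(int)  # True -> 1, False -> 0
print(df)
```

The cast works because the boolean mask already encodes the answer; no per-cell Python function is called, which is likely why this variant comes out fastest in the timings.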
EDIT:
I tried modifying your solution:
df[['age', 'url']] = df[['age', 'url']].applymap(lambda x: 0 if pd.isnull(x) else 1)
print (df)
row id name age url
0 1 e1 tom 0 1
1 2 e2 john 1 0
2 3 e3 lucy 0 1
3 4 e4 tick 1 0
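It may help to see why the comparison in the question never matched. A missing value in pandas is the float NaN, not the string 'NaN', and NaN does not compare equal to anything, including itself. The question's code also discarded the result of applymap instead of assigning it back, and its else branch returned x rather than 1. The scalar checks below are a small sketch of the first point; the variable x is just an illustrative name.

```python
import numpy as np
import pandas as pd

x = np.nan

print(x == 'NaN')    # False: NaN is a float, not the string 'NaN'
print(x == x)        # False: NaN never compares equal, even to itself
print(pd.isnull(x))  # True: the reliable test for a missing value
```

As far as I know, pandas 2.1 and later deprecate DataFrame.applymap in favour of DataFrame.map, so on a recent version the call above may emit a warning.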
Timings:
len(df)=4k:
In [127]: %timeit df[['age', 'url']] = df[['age', 'url']].applymap(lambda x: 0 if pd.isnull(x) else 1)
100 loops, best of 3: 11.2 ms per loop
In [128]: %timeit df[['age', 'url']] = np.where(df[['age', 'url']].isnull(), 0, 1)
100 loops, best of 3: 2.69 ms per loop
In [129]: %timeit df[['age', 'url']] = np.where(pd.notnull(df[['age', 'url']]), 1, 0)
100 loops, best of 3: 2.78 ms per loop
In [131]: %timeit df.loc[:, ['age', 'url']] = df[['age', 'url']].notnull() * 1
1000 loops, best of 3: 1.45 ms per loop
In [136]: %timeit df[['age', 'url']] = df[['age', 'url']].notnull().astype(int)
1000 loops, best of 3: 1.01 ms per loop
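To repeat a comparison like this outside IPython, something along these lines should work with the standard timeit module. It is only a sketch under stated assumptions: the answer does not say how the 4k-row frame was built, so the sample rows are simply repeated 1000 times here, and the helper names with_np_where and with_astype are hypothetical. Unlike the %timeit lines above, it times only the computation, not the assignment back into df, so the numbers will not match exactly.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({
    'age': [np.nan, 25, np.nan, 29],
    'url': ['http1', np.nan, 'http3', np.nan],
})

# Assumption: repeat the four sample rows 1000 times to reach 4k rows.
big = pd.concat([base] * 1000, ignore_index=True)


def with_np_where(frame):
    """0 for missing cells, 1 otherwise, as a NumPy array."""
    return np.where(frame.isnull(), 0, 1)


def with_astype(frame):
    """Same flags, produced by casting the boolean mask to int."""
    return frame.notnull().astype(int)


for fn in (with_np_where, with_astype):
    seconds = timeit.timeit(lambda: fn(big), number=100)
    print(f'{fn.__name__}: {seconds / 100 * 1e3:.3f} ms per call')
```

Absolute timings depend on the machine and the pandas version, so treat the ranking rather than the exact figures as the takeaway.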
Answer by EdChum
Use np.where with pd.notnull to replace the missing and valid elements with 0 and 1 respectively:
In [90]:
df[['age', 'url']] = np.where(pd.notnull(df[['age', 'url']]), 1, 0)
df
Out[90]:
row id name age url
0 1 e1 tom 0 1
1 2 e2 john 1 0
2 3 e3 lucy 0 1
3 4 e4 tick 1 0