Original source: http://stackoverflow.com/questions/37543647/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow
How to replace all non-NaN entries of a dataframe with 1 and all NaN with 0
Asked by Anirban De
I have a dataframe with 71 columns and 30597 rows. I want to replace all non-nan entries with 1 and the nan values with 0.
Initially I tried a for-loop over each value of the dataframe, which was taking too much time.
Then I used data_new = data.subtract(data), which was meant to subtract every value of the dataframe from itself so that all the non-null values become 0. But an error occurred because the dataframe has multiple string entries.
Answered by fmarc
You can take the return value of df.notnull(), which is False where the DataFrame contains NaN and True otherwise, and cast it to integer, giving you 0 where the DataFrame is NaN and 1 otherwise:
newdf = df.notnull().astype('int')
If you really want to write into your original DataFrame, this will work:
df.loc[~df.isnull()] = 1 # not nan
df.loc[df.isnull()] = 0 # nan
Answered by jezrael
Use notnull, casting the boolean result to int with astype:
print ((df.notnull()).astype('int'))
Sample:
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [np.nan, 4, np.nan], 'b': [1,np.nan,3]})
print (df)
a b
0 NaN 1.0
1 4.0 NaN
2 NaN 3.0
print (df.notnull())
a b
0 False True
1 True False
2 False True
print ((df.notnull()).astype('int'))
a b
0 0 1
1 1 0
2 0 1
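The question mentions that the frame contains string entries. A minimal sketch, assuming a small made-up frame (the names `mixed`, `num` and `txt` are illustrative and not from the original post), suggests the same call also works on mixed dtypes, since notnull() only tests for missing values and does no arithmetic:

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type frame: one numeric column, one string column.
mixed = pd.DataFrame({'num': [1.5, np.nan, 3.0],
                      'txt': ['x', None, 'z']})

# notnull() returns a boolean frame; astype(int) maps True -> 1 and False -> 0.
flags = mixed.notnull().astype(int)
print(flags)
```

Both NaN and None count as missing here, so each ends up as 0.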
Answered by tnknepp
I do a lot of data analysis and am interested in finding new/faster methods of carrying out operations. I had never come across jezrael's method, so I was curious to compare it with my usual method (i.e. replace by indexing). NOTE: This is not an answer to the OP's question, rather it is an illustration of the efficiency of jezrael's method. Since this is NOT an answer I will remove this post if people do not find it useful (and after being downvoted into oblivion!). Just leave a comment if you think I should remove it.
I created a moderately sized dataframe and did multiple replacements using both the df.notnull().astype(int) method and simple indexing (how I would normally do this). It turns out that the latter is slower by approximately five times. Just an fyi for anyone doing larger-scale replacements.
from __future__ import division, print_function
import numpy as np
import pandas as pd
import datetime as dt
# create dataframe with randomly placed NaNs
# (shape and sample size must be integers, hence 100 and //)
data = np.ones((100, 100))
data.ravel()[np.random.choice(data.size, data.size // 10, replace=False)] = np.nan
df = pd.DataFrame(data=data)
trials = np.arange(100)
# time the notnull().astype(int) method
d1 = dt.datetime.now()
for r in trials:
    new_df = df.notnull().astype(int)
print((dt.datetime.now() - d1).total_seconds() / trials.size)
# create a dummy copy of df. I use a dummy copy here to prevent biasing the
# time trial with dataframe copies/creations within the upcoming loop
df_dummy = df.copy()
# time replacement by boolean indexing
d1 = dt.datetime.now()
for r in trials:
    df_dummy[df.isnull()] = 0
    df_dummy[df.isnull() == False] = 1
print((dt.datetime.now() - d1).total_seconds() / trials.size)
This yields times of 0.142 s and 0.685 s respectively. It is clear who the winner is.
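For anyone who wants to repeat the comparison, here is a rough sketch using the standard-library timeit module instead of manual datetime arithmetic. The function names `by_cast` and `by_indexing` are made up for this example, and unlike the loop above the indexing variant includes a frame copy on every call, so the numbers are not directly comparable to the ones quoted. Timings depend heavily on machine and pandas version, so none are shown.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = np.ones((100, 100))
# Blank out 10% of the cells at random positions.
data.ravel()[rng.choice(data.size, data.size // 10, replace=False)] = np.nan
df = pd.DataFrame(data)


def by_cast():
    # Boolean mask cast to 0/1 integers.
    return df.notnull().astype(int)


def by_indexing():
    # Replacement by boolean indexing on a copy of the frame.
    out = df.copy()
    mask = df.isnull()
    out[mask] = 0
    out[~mask] = 1
    return out


runs = 20
t_cast = timeit.timeit(by_cast, number=runs) / runs
t_index = timeit.timeit(by_indexing, number=runs) / runs
print(t_cast, t_index)
```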
Answered by DainDwarf
There is a method .fillna() on DataFrames which does what you need. For example:
df = df.fillna(0) # Replace all NaN values with zero, returning the modified DataFrame
or
df.fillna(0, inplace=True) # Replace all NaN values with zero, updating the DataFrame directly
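Note that fillna(0) on its own covers only the NaN half of the question: values that are not NaN keep their original value rather than becoming 1. A small sketch on a made-up frame (the names `filled` and `flags` are illustrative) shows the difference next to the notnull() cast from the other answers:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, 4, np.nan], 'b': [1, np.nan, 3]})

filled = df.fillna(0)             # NaN -> 0, other values unchanged
flags = df.notnull().astype(int)  # NaN -> 0, everything else -> 1

print(filled)
print(flags)
```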
Answered by tompiler
I'd advise making a new column rather than just replacing. You can always delete the previous column if necessary, but it's always helpful to keep the source for a column populated via an operation on another.
e.g. if df['col1'] is the existing column
df['col2'] = df['col1'].apply(lambda x: 1 if not pd.isnull(x) else np.nan)
where col2 is the new column. Should also work if col1 has string entries.
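As a possible variation on the same idea, the indicator column can be built without apply. This is only a sketch on a made-up frame: it writes 0 for missing values (as the question asks) where the apply version writes NaN, and it keeps col1 untouched as the source.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; col1 mixes a string, a missing value and a number.
df = pd.DataFrame({'col1': ['x', np.nan, 3.0]})

# Keep col1 as the source and store the 0/1 indicator in a new column.
df['col2'] = df['col1'].notnull().astype(int)
print(df)
```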
Answered by Xin Niu
Regarding fmarc's answer:
df.loc[~df.isnull()] = 1 # not nan
df.loc[df.isnull()] = 0 # nan
The code above does not work for me, but the code below does.
df[~df.isnull()] = 1 # not nan
df[df.isnull()] = 0 # nan
This is with pandas 0.25.3.
And if you want to just change values in specific columns, you may need to create a temp dataframe and assign it to the columns of the original dataframe:
change_col = ['a', 'b']
tmp = df[change_col].copy()  # explicit copy avoids SettingWithCopyWarning
tmp[tmp.isnull()] = 'xxx'
df[change_col] = tmp
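For the 0/1 replacement the question asks about, the temporary frame can probably be skipped altogether. A minimal sketch, assuming a made-up frame with an extra column `c` that should stay as it is:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, 4, np.nan],
                   'b': [1, np.nan, 3],
                   'c': ['keep', None, 'me']})

change_col = ['a', 'b']
# Overwrite only the selected columns with their 0/1 indicators.
df[change_col] = df[change_col].notnull().astype(int)
print(df)
```

Column `c` is not part of `change_col`, so its values, including the missing one, are left untouched.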
Answered by afuc func
Use df.fillna(0) to fill NaN with 0.
Answered by arshad
Here is a suggestion for a particular column: where a row in that column is NaN, replace it with 0; where a value is present, replace it with 1.
The line below changes the NaN values in your column to 0:
df.YourColumnName.fillna(0,inplace=True)
Now the remaining non-NaN values are replaced by 1 with the code below:
df["YourColumnName"]=df["YourColumnName"].apply(lambda x: 1 if x!=0 else 0)
The same can be applied to the whole dataframe by not specifying the column name.
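One caveat with the two-step approach above: a value that was already 0 before the fillna call stays 0 instead of becoming 1, because the lambda cannot tell it apart from a filled NaN. A short sketch on a made-up Series (the names `s`, `two_step` and `direct` are illustrative) compares it with the notnull() cast:

```python
import numpy as np
import pandas as pd

# Hypothetical column containing a genuine 0, a NaN and another value.
s = pd.Series([0.0, np.nan, 2.5], name='YourColumnName')

# Two-step approach: fill NaN with 0, then map non-zero values to 1.
two_step = s.fillna(0).apply(lambda x: 1 if x != 0 else 0)

# Direct approach: flag non-missing values regardless of what they are.
direct = s.notnull().astype(int)

print(two_step.tolist())  # [0, 0, 1]
print(direct.tolist())    # [1, 0, 1]
```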