pandas: How to replace all non-NaN entries of a dataframe with 1 and all NaN with 0

Question

Asked by Anirban De

I have a dataframe with 71 columns and 30597 rows. I want to replace all non-nan entries with 1 and the nan values with 0.

Initially I tried a for-loop over each value of the dataframe, which took too much time.

Then I used data_new = data.subtract(data), which was meant to subtract every value of the dataframe from itself so that all the non-null values would become 0. But an error occurred because the dataframe has multiple string entries.

Answer 1

Answered by fmarc

You can take the return value of df.notnull(), which is False where the DataFrame contains NaN and True otherwise, and cast it to integer, giving you 0 where the DataFrame is NaN and 1 otherwise:

newdf = df.notnull().astype('int')

If you really want to write into your original DataFrame, this will work:

df.loc[~df.isnull()] = 1  # not nan
df.loc[df.isnull()] = 0   # nan
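
Since the question mentions string entries, here is a minimal sketch of the notnull() approach on a mixed-type frame. The column names and values are made up for illustration. It only tests for missing values and never does arithmetic on them, so the string column should not cause the error that subtract() ran into:

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type frame: one string column, one numeric column.
df = pd.DataFrame({
    "name": ["x", np.nan, "z"],
    "score": [1.5, 2.0, np.nan],
})

# notnull() returns a boolean frame; astype(int) maps True/False to 1/0.
flags = df.notnull().astype(int)
print(flags)
```

The original df is left untouched here; flags is a new DataFrame with the same shape and labels.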

Answer 2

Answered by jezrael

Use notnull and cast the boolean result to int with astype:

print ((df.notnull()).astype('int'))

Sample:

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [np.nan, 4, np.nan], 'b': [1,np.nan,3]})
print (df)
     a    b
0  NaN  1.0
1  4.0  NaN
2  NaN  3.0

print (df.notnull())
       a      b
0  False   True
1   True  False
2  False   True

print ((df.notnull()).astype('int'))
   a  b
0  0  1
1  1  0
2  0  1

Answer 3

Answered by tnknepp

I do a lot of data analysis and am interested in finding new/faster methods of carrying out operations. I had never come across jezrael's method, so I was curious to compare it with my usual method (i.e. replace by indexing). NOTE: This is not an answer to the OP's question, rather it is an illustration of the efficiency of jezrael's method. Since this is NOT an answer I will remove this post if people do not find it useful (and after being downvoted into oblivion!). Just leave a comment if you think I should remove it.

I created a moderately sized dataframe and did multiple replacements using both the df.notnull().astype(int) method and simple indexing (how I would normally do this). It turns out that the latter is slower by approximately five times. Just an fyi for anyone doing larger-scale replacements.

from __future__ import division, print_function

import numpy as np
import pandas as pd
import datetime as dt


# create dataframe with randomly placed NaNs
# (shape and sample size must be integers, so use 100 and floor division)
data = np.ones((100, 100))
data.ravel()[np.random.choice(data.size, data.size // 10, replace=False)] = np.nan

df = pd.DataFrame(data=data)

trials = np.arange(100)


d1 = dt.datetime.now()

for r in trials:
    new_df = df.notnull().astype(int)

print( (dt.datetime.now()-d1).total_seconds()/trials.size )


# create a dummy copy of df.  I use a dummy copy here to prevent biasing the 
# time trial with dataframe copies/creations within the upcoming loop
df_dummy = df.copy()

d1 = dt.datetime.now()

for r in trials:
    df_dummy[df.isnull()] = 0
    df_dummy[df.isnull()==False] = 1

print( (dt.datetime.now()-d1).total_seconds()/trials.size )

This yields times of 0.142 s and 0.685 s respectively. It is clear who the winner is.
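
For anyone who wants to repeat the comparison, the same idea can be sketched with the standard-library timeit module. This is not the script that produced the numbers above: the seed, the helper names cast_method and index_method, and the repeat count are assumptions, and here the copy of the frame is included in the timed indexing function. Absolute timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test frame: 100 x 100 ones with 10% NaN at random positions.
rng = np.random.default_rng(0)
data = np.ones((100, 100))
data.ravel()[rng.choice(data.size, data.size // 10, replace=False)] = np.nan
df = pd.DataFrame(data)


def cast_method():
    # Boolean mask cast to 0/1.
    return df.notnull().astype(int)


def index_method():
    # Replacement by boolean indexing on a copy of the frame.
    out = df.copy()
    mask = df.isnull()
    out[mask] = 0
    out[~mask] = 1
    return out


# Both approaches should agree before their timings are compared.
assert (cast_method() == index_method()).all().all()

n = 20  # number of repetitions per method
print("cast :", timeit.timeit(cast_method, number=n) / n)
print("index:", timeit.timeit(index_method, number=n) / n)
```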

Answer 4

Answered by DainDwarf

There is a .fillna() method on DataFrames that replaces the NaN values. For example:

df = df.fillna(0)  # Replace all NaN values with zero, returning the modified DataFrame

or

df.fillna(0, inplace=True)   # Replace all NaN values with zero, updating the DataFrame directly

Answer 5

Answered by tompiler

I'd advise making a new column rather than just replacing. You can always delete the previous column if necessary, but it's always helpful to keep the source for a column that was populated via an operation on another.

e.g. if df['col1'] is the existing column

df['col2'] = df['col1'].apply(lambda x: 1 if not pd.isnull(x) else np.nan)

where col2 is the new column. This should also work if col1 has string entries.
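
Note that the lambda above writes NaN, not 0, for the missing rows. If 0 is wanted, as in the question, a vectorized variant of the same new-column idea might look like the sketch below. The sample values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical source column mixing strings, numbers and missing values.
df = pd.DataFrame({"col1": ["a", np.nan, 3.0, None]})

# New indicator column: 1 where col1 holds a value, 0 where it is missing.
df["col2"] = df["col1"].notna().astype(int)
print(df)
```

col1 is kept as it was, so the source of col2 stays available.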

Answer 6

Answered by Xin Niu

Regarding fmarc's answer:

df.loc[~df.isnull()] = 1  # not nan
df.loc[df.isnull()] = 0   # nan

The code above does not work for me, but the following does:

df[~df.isnull()] = 1  # not nan
df[df.isnull()] = 0   # nan

This was with pandas 0.25.3.

And if you want to just change values in specific columns, you may need to create a temp dataframe and assign it to the columns of the original dataframe:

change_col = ['a', 'b']
tmp = df[change_col].copy()  # explicit copy avoids SettingWithCopyWarning
tmp[tmp.isnull()]='xxx'
df[change_col]=tmp
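
For the 0/1 replacement from the question, restricting it to specific columns can probably be done without the temporary frame. A minimal sketch, in which the sample frame and the untouched column c are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; only columns a and b should be converted.
df = pd.DataFrame({
    "a": [np.nan, 4.0, np.nan],
    "b": [1.0, np.nan, 3.0],
    "c": ["keep", None, "me"],
})

change_col = ["a", "b"]

# Overwrite just the selected columns with their 0/1 indicators.
df[change_col] = df[change_col].notnull().astype(int)
print(df)
```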

Answer 7

Answered by afuc func

Use: df.fillna(0)

to fill NaN with 0.

Answer 8

Answered by arshad

Here is a suggestion for handling one particular column: where a row in that column is NaN, replace it with 0, and where it holds a value, replace it with 1.

The line below replaces the NaN values in the column with 0:

df["YourColumnName"] = df["YourColumnName"].fillna(0)

Now the remaining non-NaN values are replaced with 1 by the code below:

df["YourColumnName"]=df["YourColumnName"].apply(lambda x: 1 if x!=0 else 0)

The same can be applied to the whole dataframe by not specifying a column name.
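
One caveat with the two-step approach: a value that was genuinely 0 before the fill cannot be told apart from a filled NaN afterwards. The sketch below uses a made-up Series to compare it with a direct notnull() test:

```python
import numpy as np
import pandas as pd

# Hypothetical column containing a real 0, a NaN and another value.
s = pd.Series([0.0, np.nan, 5.0], name="YourColumnName")

# Two-step approach: fill NaN with 0, then mark every non-zero value as 1.
two_step = s.fillna(0).apply(lambda x: 1 if x != 0 else 0)

# Direct approach: test for missing values only.
direct = s.notnull().astype(int)

print(pd.DataFrame({"two_step": two_step, "direct": direct}))
```

If the column can never contain a real 0, both approaches should give the same result.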
