Python: Can't drop NaN with dropna in pandas

Disclaimer: This page is a Chinese-English side-by-side translation of a popular StackOverflow question, distributed under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, credit the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/33643843/

Date: 2020-08-19 13:40:59  Source: igfitidea

Can't drop NAN with dropna in pandas

Tags: python, pandas, dataframe, missing-data

Asked by fangh

I imported pandas as pd, ran the code below, and got the following result:

Code:

import pandas as pd

traindataset = pd.read_csv('/Users/train.csv')
print(traindataset.dtypes)
print(traindataset.shape)
print(traindataset.iloc[25, 3])
traindataset.dropna(how='any')  # returns a new DataFrame, which is discarded here
print(traindataset.iloc[25, 3])
print(traindataset.shape)

Output

TripType                   int64  
VisitNumber                int64  
Weekday                   object  
Upc                      float64  
ScanCount                  int64  
DepartmentDescription     object  
FinelineNumber           float64  
dtype: object

(647054, 7)

nan  
nan

(647054, 7) 
[Finished in 2.2s]

From the result, the dropna line doesn't seem to work: the row count doesn't change and there are still NaN values in the dataframe. How can that be? This is driving me craaaazy.

Answer by BrenBarn

You need to read the documentation (emphasis added):

Return object with labels on given axis omitted

dropna returns a new DataFrame. If you want it to modify the existing DataFrame, all you have to do is read further in the documentation:

inplace: boolean, default False

If True, do operation inplace and return None.

So to modify it in place, do traindataset.dropna(how='any', inplace=True).

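To see the difference concretely, here is a minimal sketch on a small made-up DataFrame (the columns `a` and `b` are hypothetical, not the question's CSV). It shows that the plain call leaves the original untouched, while `inplace=True` mutates it and returns `None`:

```python
import numpy as np
import pandas as pd

# Hypothetical data: only row 0 has no missing values
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

# dropna() returns a NEW DataFrame; df itself is left unchanged
cleaned = df.dropna(how="any")
print(df.shape)       # (3, 2)
print(cleaned.shape)  # (1, 2)

# With inplace=True, df is modified directly and the call returns None
result = df.dropna(how="any", inplace=True)
print(result)         # None
print(df.shape)       # (1, 2)
```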

Answer by Himanshi Dixit

Alternatively, you can also use the notnull() method to select the rows that are not null.

For example, if you want to select the rows with non-null values in the columns country and variety of the dataframe reviews:

answer = reviews.loc[(reviews.country.notnull()) & (reviews.variety.notnull())]

But here we are just selecting the relevant data; to remove null values, you should use the dropna() method.

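A sketch comparing the two approaches, assuming a tiny hypothetical stand-in for `reviews` (the values and the extra `points` column are invented for illustration). Restricting `dropna` to the same two columns with its `subset` parameter should give the same rows as the `notnull()` mask:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `reviews` DataFrame mentioned above
reviews = pd.DataFrame({
    "country": ["Italy", None, "France", "Spain"],
    "variety": ["Nebbiolo", "Merlot", np.nan, "Tempranillo"],
    "points": [90, 85, 88, 87],
})

# Boolean-mask selection: keep rows where both columns are non-null
answer = reviews.loc[reviews.country.notnull() & reviews.variety.notnull()]

# The same rows via dropna, only looking at those two columns
dropped = reviews.dropna(subset=["country", "variety"])

print(answer.equals(dropped))  # True
print(list(answer.index))      # [0, 3]
```

The mask form is handy when the condition is more than "not null"; `dropna(subset=...)` states the intent more directly when it is exactly that.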

Answer by jpp

pd.DataFrame.dropna uses inplace=False by default. This is the norm with most pandas operations; exceptions do exist, e.g. update.

Therefore, you must either assign the result back to your variable or explicitly set inplace=True:

df = df.dropna(how='any')           # assign back
df.dropna(how='any', inplace=True)  # set inplace parameter

Stylistically, the former is often preferred because it supports method chaining, and the latter often yields little or no performance benefit.

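As a sketch of the chaining point, assuming a small hypothetical DataFrame: assigning the result back lets `dropna` sit in the middle of a chain. The `reset_index` and `rename` calls are only example follow-up steps, not something the answer prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 has NaN in "x", row 2 has None in "y"
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": ["a", "b", None]})

# Assigning back lets dropna() be one step in a method chain
df = (
    df.dropna(how="any")
    .reset_index(drop=True)                        # renumber rows as 0..n-1
    .rename(columns={"x": "value", "y": "label"})  # example follow-up step
)
print(df)
```

The same chain cannot be written with `inplace=True`, because `df.dropna(inplace=True)` returns `None`, leaving nothing to call the next method on.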

Answer by Robert Forderer

This is my first post. I just spent a few hours debugging this exact issue and I would like to share how I fixed this issue.

I was converting the values in my entire dataframe to strings and writing them back into the dataframe, using code similar to the following (note: the code below only converts the values to strings):

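# Convert every value in 'column_header' to a string, row by row.
# Note: str(float('nan')) == 'nan', so missing values become the text 'nan'.
for ind, row in dataf.iterrows():
    cell_value = str(row['column_header'])
    dataf.loc[ind, 'column_header'] = cell_value  # use the row's own index label
# Vectorized equivalent: dataf['column_header'] = dataf['column_header'].map(str)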

After converting the entire dataframe to strings, I then called dropna(). However, the values that were previously NaN (which pandas treats as null) had been converted to the string 'nan', so dropna() no longer recognized them as missing and did not drop them.

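The pitfall can be reproduced with a minimal sketch. The single-column DataFrame is hypothetical, and `Series.map(str)` stands in for the row-by-row `str()` call in the loop above. The final `replace` step is one possible way to recover, not something the answer itself proposes:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"column_header": [1.5, np.nan, 2.0]})

# Apply Python's str() to every value, as the loop above does;
# str(float("nan")) is the three-letter string 'nan', not a missing value
df["column_header"] = df["column_header"].map(str)
print(df["column_header"].tolist())  # ['1.5', 'nan', '2.0']

# dropna() finds no missing values, so all three rows survive
print(df.dropna().shape)             # (3, 1)

# One way to recover: turn the literal 'nan' strings back into real NaN
recovered = df.replace("nan", np.nan).dropna()
print(recovered.shape)               # (2, 1)
```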

In conclusion, drop missing (NaN) values FIRST, before you start manipulating the CSV data and converting its data types.

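Following that advice, a minimal sketch of the recommended order on the same kind of hypothetical single-column data: drop the real NaN values while pandas can still recognize them, then convert what is left to strings.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data as it might come out of read_csv
raw = pd.DataFrame({"column_header": [1.5, np.nan, 2.0]})

# Step 1: drop rows with real NaN while pandas can still recognize them
clean = raw.dropna(how="any")

# Step 2: only then convert the remaining values to strings
clean = clean.assign(column_header=clean["column_header"].map(str))
print(clean["column_header"].tolist())  # ['1.5', '2.0']
```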