Python: Trying to drop NaN indexed row in dataframe

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me), citing the original StackOverflow URL: http://stackoverflow.com/questions/19670904/

Time: 2020-08-19 14:18:48  Source: igfitidea

Trying to drop NaN indexed row in dataframe

python pandas dataframe

Asked by Alison S

I'm using python 2.7.3 and Pandas version 0.12.0.

I want to drop the row with the NaN index so that I only have valid site_id values.

print df.head()
            special_name
site_id
NaN          Banana
OMG          Apple

df.drop(df.index[0])

TypeError: 'NoneType' object is not iterable

If I try dropping a range, like this:

df.drop(df.index[0:1])

I get this error:

AttributeError: 'DataFrame' object has no attribute 'special_name'

Accepted answer by TomAugspurger

I've found that the easiest way is to reset the index, drop the NaNs, and then reset the index again.

In [26]: dfA.reset_index()
Out[26]: 
  index special_name
0   NaN        Apple
1   OMG       Banana

In [30]: df = dfA.reset_index().dropna().set_index('index')

In [31]: df
Out[31]: 
      special_name
index             
OMG         Banana
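
The reset/drop/set sequence above can be sketched as a self-contained script. The frame below is a hypothetical stand-in for the question's data, and naming the index site_id and passing subset= to dropna are assumptions added here so that only a NaN in the former index column causes a row to be dropped:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame, with an index named site_id.
dfA = pd.DataFrame(
    {"special_name": ["Banana", "Apple"]},
    index=pd.Index([np.nan, "OMG"], name="site_id"),
)

# Move the index into a regular column, drop the rows where that column
# is NaN, then turn the column back into the index.
df = dfA.reset_index().dropna(subset=["site_id"]).set_index("site_id")
print(df)
```

Without subset=, dropna() would also discard rows that have a NaN in any other column, which may or may not be what is wanted.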

Answer by Robert Muil

Edit: the following probably only applies to MultiIndexes, and is in any case obsoleted by the newer df.index.isnull() function (see other answers). I'll leave this answer just for historical interest.

For people who come to this now, one can do this directly without reindexing by relying on the fact that NaNs in the index will be represented with the label -1. So:

df = dfA[dfA.index.labels!=-1]

Even better, in Pandas>0.16.1, one can use drop() to do this in place without copying:

dfA.drop(labels=[-1], level='index', inplace=True)

NB: It's a bit misleading that the index level is called 'index': it would usually be something more use-specific, like 'date' or 'experimental_run'.
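
For readers on a recent pandas, here is a minimal sketch of the same -1 idea. It assumes pandas 0.24 or later, where the MultiIndex attribute is named codes rather than labels; the two-level index and its level names (site_id, run) are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical two-level index; the NaN sits in the first level (site_id).
idx = pd.MultiIndex.from_arrays(
    [[np.nan, "OMG", "RLY"], [1, 2, 3]], names=["site_id", "run"]
)
dfA = pd.DataFrame({"special_name": ["Banana", "Apple", "Orange"]}, index=idx)

# codes[0] holds the integer codes of the first level; a missing label
# is stored as -1, so this mask is False for the NaN row.
mask = dfA.index.codes[0] != -1
df = dfA[mask]
print(df)
```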

Answer by timdiels

With pandas version >= 0.20.0 you can:

df[df.index.notnull()]

With older versions:

df[pandas.notnull(df.index)]

To break it down:

分解:

notnull generates a boolean mask, e.g. [False, False, True], where True denotes that the value at the corresponding position is not null (i.e. it is neither numpy.nan nor None). We then select the rows whose index corresponds to a True value in the mask by using df[boolean_mask].
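
A small sketch of the mask in action, using made-up data that has both a numpy.nan and a None in the index (the frame and the names mask and kept are illustrative assumptions, not part of the question):

```python
import numpy as np
import pandas as pd

# Illustrative frame with two missing index labels: one NaN, one None.
df = pd.DataFrame(
    {"special_name": ["Banana", "Apple", "Orange"]},
    index=[np.nan, "OMG", None],
)

# True where the index label is present, False where it is NaN or None.
mask = pd.notnull(df.index)
print(mask)

# Boolean indexing keeps only the rows whose mask entry is True.
kept = df[mask]
print(kept)
```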

Answer by Mrumble

I tested this and it works:

df.reset_index(inplace=True)

df.drop(df[df['index'].isnull()].index, inplace=True)

How I checked the above

I replicated the table from the original question using df=pd.DataFrame(data=['Banana', 'Apple'], index=[np.nan, 'OMG'],columns=['Special_name'])

and then ran the two lines of code above, which I try to explain in plain language below:

  • 1st line: resets the index to integers, so the NaN is now in a column named after the original name of the index ('index' in the example above, as no name was specified). pandas does this automatically with the reset_index() command.
  • 2nd line, starting from the innermost brackets: df[df['index'].isnull()] filters the rows for which the column named 'index' holds NaN values, using the isnull() command. .index is then used to pass an unambiguous index object pointing to all 'index'=NaN rows to df.drop( in the outermost part of the expression.
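
The two steps described above can be run end to end as follows. This is a sketch on a slightly extended version of the replicated table: the extra 'Cherry' row with a second NaN label is an assumption added here to exercise the multiple-NaN case, and nan_rows is just an illustrative name:

```python
import numpy as np
import pandas as pd

# The replicated table, plus one extra (assumed) row with a second NaN label.
df = pd.DataFrame(
    data=["Banana", "Apple", "Cherry"],
    index=[np.nan, "OMG", np.nan],
    columns=["Special_name"],
)

# Step 1: the unnamed index becomes a regular column called 'index',
# and the frame gets a fresh integer index 0, 1, 2.
df.reset_index(inplace=True)

# Step 2: collect the integer labels of the rows whose 'index' column
# is NaN, then drop exactly those rows.
nan_rows = df[df["index"].isnull()].index
df.drop(nan_rows, inplace=True)
print(df)
```

Note that the surviving row keeps its integer label from step 1, and the old index stays behind as an ordinary column named 'index'.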

NB: I tested that the above commands also work with multiple NaN values in a column.

Tested with Python 3.5.1 and Pandas 0.17.1, via the 32-bit Anaconda package.

Answer by Joakim

None of the answers worked 100% for me. Here's what worked:

In [26]: print df
Out[26]:            
          site_id      special_name
0         OMG          Apple
1         NaN          Banana
2         RLY          Orange


In [27]: df.dropna(inplace=True)
Out[27]:            
          site_id      special_name
0         OMG          Apple
2         RLY          Orange

In [28]: df.reset_index(inplace=True)
Out[28]:            
          index     site_id      special_name
0         0         OMG          Apple
1         2         RLY          Orange

In [29]: df.drop('index', axis='columns', inplace=True)
Out[29]:             
          site_id      special_name
0         OMG          Apple
1         RLY          Orange
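
The same end result can probably be reached in one chained expression. This is a sketch, not the answerer's code: it assumes site_id is an ordinary column as in the session above, restricts dropna to that column via subset=, and uses reset_index(drop=True) so that no helper 'index' column is created in the first place:

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the session above (site_id is a regular column).
df = pd.DataFrame(
    {
        "site_id": ["OMG", np.nan, "RLY"],
        "special_name": ["Apple", "Banana", "Orange"],
    }
)

# Drop rows with a missing site_id, then renumber the rows from 0;
# drop=True discards the old integer index instead of keeping it as a column.
df = df.dropna(subset=["site_id"]).reset_index(drop=True)
print(df)
```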

Answer by Pietro Battiston

As of pandas 0.19, Indexes do have a .notnull() method, so the answer by timdiels can be simplified to:

df[df.index.notnull()]

which I think is (currently) the simplest you can get.

我认为这是(目前)你能得到的最简单的。