Python: trying to drop the row with a NaN index from a DataFrame

Question

Asked by Alison S

I'm using Python 2.7.3 and pandas version 0.12.0.

I want to drop the row with the NaN index so that I only have valid site_id values.

print df.head()
            special_name
site_id
NaN          Banana
OMG          Apple

df.drop(df.index[0])

TypeError: 'NoneType' object is not iterable

If I try dropping a range, like this:

df.drop(df.index[0:1])

I get this error:

AttributeError: 'DataFrame' object has no attribute 'special_name'

Answer 1

Accepted answer by TomAugspurger

I've found that the easiest way is to reset the index, drop the NaNs, and then set the index again.

In [26]: dfA.reset_index()
Out[26]: 
  index special_name
0   NaN        Apple
1   OMG       Banana

In [30]: df = dfA.reset_index().dropna().set_index('index')

In [31]: df
Out[31]: 
      special_name
index             
OMG         Banana
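
A minimal sketch of the same reset/drop/set round trip on a current pandas, assuming the index is named site_id as in the question (with an unnamed index the column would be called 'index', as above). One caveat: a bare dropna() removes rows with a NaN in any column, so passing subset= restricts it to the former index column and keeps rows that only have missing data elsewhere.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (assumption: the index is named 'site_id').
dfA = pd.DataFrame(
    {"special_name": ["Banana", "Apple"]},
    index=pd.Index([np.nan, "OMG"], name="site_id"),
)

# reset_index() moves the index into a column named after it ('site_id');
# dropna(subset=...) only looks at that column; set_index() restores it.
df = dfA.reset_index().dropna(subset=["site_id"]).set_index("site_id")
print(df)
```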

Answer 2

Answer by Robert Muil

Edit: the following probably only applies to MultiIndexes, and is in any case made obsolete by the newer df.index.isnull() function (see other answers). I'll leave this answer just for historical interest.

For anyone coming to this now: you can do this directly, without resetting the index, by relying on the fact that NaNs in the index are represented by the label -1. So:

df = dfA[dfA.index.labels!=-1]
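
For reference, a sketch of the same -1 trick on newer pandas. In pandas 0.24 MultiIndex.labels was renamed to MultiIndex.codes, and it holds one integer array per level, so you pick the level first. The two-level index (site_id, run) below is hypothetical, made up only to have a MultiIndex with a NaN in one level.

```python
import numpy as np
import pandas as pd

# Hypothetical two-level index; the 'site_id' level contains a NaN.
idx = pd.MultiIndex.from_arrays(
    [[np.nan, "OMG", "RLY"], [1, 2, 3]], names=["site_id", "run"]
)
dfA = pd.DataFrame({"special_name": ["Banana", "Apple", "Orange"]}, index=idx)

# codes[0] holds the integer codes of level 0 ('site_id');
# a missing value in that level is stored as code -1.
site_codes = dfA.index.codes[0]
df = dfA[site_codes != -1]
print(df)
```

Without touching codes at all, dfA[dfA.index.get_level_values("site_id").notna()] should select the same rows.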

Even better, in pandas > 0.16.1 you can use drop() to do this in place, without copying:

dfA.drop(labels=[-1], level='index', inplace=True)

NB: It's a bit misleading that the index level is called 'index' here; it would usually be something more use-specific, like 'date' or 'experimental_run'.

Answer 3

Answer by timdiels

With pandas version >= 0.20.0 you can:

df[df.index.notnull()]

With older versions:

df[pandas.notnull(df.index)]

To break it down:

notnull generates a boolean mask, e.g. [False, False, True], where True means the value at the corresponding position is not null (i.e. neither numpy.nan nor None). We then select the rows whose index corresponds to a True value in the mask by using df[boolean_mask].
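
A small illustrative sketch of the mask itself, using a made-up frame whose index contains both numpy.nan and None, so you can see both count as null and that the method and the top-level function give the same result:

```python
import numpy as np
import pandas as pd

# Illustrative frame: the index holds both numpy.nan and None.
df = pd.DataFrame(
    {"special_name": ["Banana", "Apple", "Cherry"]},
    index=pd.Index([np.nan, "OMG", None], name="site_id"),
)

mask = df.index.notnull()         # True where the label is NOT null
old_style = pd.notnull(df.index)  # same mask via the top-level function
print(mask)

df_valid = df[mask]               # keep only rows with a valid site_id
print(df_valid)
```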

Answer 4

Answer by Mrumble

I tested this and it works:

df.reset_index(inplace=True)

df.drop(df[df['index'].isnull()].index, inplace=True)

How I checked the above

I replicated the table from the original question with df=pd.DataFrame(data=['Banana', 'Apple'], index=[np.nan, 'OMG'],columns=['Special_name'])

and then ran the two lines of code above, which I explain in plain language below:

1st line: resets the index to integers and moves the old index values, including the NaN, into a column named after the original index ('index' in this example, because the index had no name). reset_index() does this automatically.
2nd line, read from the innermost brackets outward: df[df['index'].isnull()] uses isnull() to select the rows whose 'index' column is NaN; .index then takes the labels of exactly those rows, and the outermost df.drop(..., inplace=True) removes them.
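
Put together, the replication frame and the two lines run end to end as in the sketch below. The final set_index('index') is an addition that is not part of the answer; it only turns the helper column back into the index if you want the original shape.

```python
import numpy as np
import pandas as pd

# The replication frame from above (unnamed index).
df = pd.DataFrame(data=["Banana", "Apple"], index=[np.nan, "OMG"],
                  columns=["Special_name"])

# Step 1: move the index into a column; an unnamed index becomes 'index'.
df.reset_index(inplace=True)

# Step 2: collect the row labels where that column is NaN, then drop them.
nan_rows = df[df["index"].isnull()].index
df.drop(nan_rows, inplace=True)

# Optional extra step (not in the answer): restore 'index' as the index.
df = df.set_index("index")
print(df)
```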

NB: I also tested the above on a column containing multiple NaN values, and it works.

Tested with Python 3.5.1 and pandas 0.17.1 (32-bit Anaconda distribution).

Answer 5

Answer by Joakim

None of the answers worked fully for me. Here's what did work:

In [26]: print df
          site_id      special_name
0         OMG          Apple
1         NaN          Banana
2         RLY          Orange

In [27]: df.dropna(inplace=True)   # returns None; df is now:
          site_id      special_name
0         OMG          Apple
2         RLY          Orange

In [28]: df.reset_index(inplace=True)   # df is now:
          index     site_id      special_name
0         0         OMG          Apple
1         2         RLY          Orange

In [29]: df.drop('index', axis='columns', inplace=True)   # df is now:
          site_id      special_name
0         OMG          Apple
1         RLY          Orange
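
Note that in this answer site_id is an ordinary column, not the index. Under that same assumption, the last two steps can be collapsed: reset_index(drop=True) discards the old index instead of inserting it as an 'index' column. The sketch below also passes subset= to dropna(), so rows that only have NaN in other columns would be kept; that is a choice, not something the answer does.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"site_id": ["OMG", np.nan, "RLY"],
                   "special_name": ["Apple", "Banana", "Orange"]})

# Drop rows whose site_id is missing (other columns are not checked).
df = df.dropna(subset=["site_id"])

# drop=True throws the old index away instead of adding an 'index' column,
# replacing the reset_index + drop('index', axis='columns') pair.
df = df.reset_index(drop=True)
print(df)
```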

Answer 6

Answer by Pietro Battiston

As of pandas 0.19, Index objects do have a .notnull() method, so the answer by timdiels can be simplified to:

df[df.index.notnull()]

which I think is (currently) the simplest you can get.
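
In recent pandas versions the same mask can also be spelled with notna() (notnull() is an alias of it) or as the inverted ~isna(). A quick illustrative check on a made-up frame with more than one NaN label:

```python
import numpy as np
import pandas as pd

# Illustrative frame with two NaN labels in the index.
df = pd.DataFrame({"special_name": ["Banana", "Apple", "Kiwi"]},
                  index=[np.nan, "OMG", np.nan])

kept_notnull = df[df.index.notnull()]   # the spelling used above
kept_notna = df[df.index.notna()]       # alias in newer pandas
kept_not_isna = df[~df.index.isna()]    # same mask, inverted isna()
print(kept_notnull)
```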
