Trying to drop NaN indexed row in dataframe
Original source: http://stackoverflow.com/questions/19670904/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Alison S
I'm using python 2.7.3 and Pandas version 0.12.0.
I want to drop the row with the NaN index so that I only have valid site_id values.
print df.head()
        special_name
site_id
NaN           Banana
OMG            Apple
df.drop(df.index[0])
TypeError: 'NoneType' object is not iterable
If I try dropping a range, like this:
df.drop(df.index[0:1])
I get this error:
AttributeError: 'DataFrame' object has no attribute 'special_name'
Accepted answer by TomAugspurger
I've found that the easiest way is to reset the index, drop the NaNs, and then reset the index again.
In [26]: dfA.reset_index()
Out[26]:
  index special_name
0   NaN        Apple
1   OMG       Banana
In [30]: df = dfA.reset_index().dropna().set_index('index')
In [31]: df
Out[31]:
      special_name
index
OMG         Banana
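A minimal, self-contained sketch of the reset/drop/set round trip, assuming a recent pandas and a hypothetical frame dfA built to match the output above. Plain dropna() drops rows with a NaN in any column, so this sketch passes subset=['index'] (an addition that is not in the original answer) to restrict the check to the former index.

```python
import numpy as np
import pandas as pd

# Hypothetical frame matching the example: an unnamed index containing NaN.
dfA = pd.DataFrame({"special_name": ["Apple", "Banana"]}, index=[np.nan, "OMG"])

# reset_index() moves the unnamed index into a column called 'index',
# dropna(subset=...) removes the rows where that column is NaN,
# and set_index() turns the column back into the index.
df = dfA.reset_index().dropna(subset=["index"]).set_index("index")
print(df)
```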
Answer by Robert Muil
Edit: the following probably only applies to MultiIndexes, and is in any case obsoleted by the new df.index.isnull() function (see other answers). I'll leave this answer just for historical interest.
For people who come to this now, one can do this directly without reindexing by relying on the fact that NaNs in the index will be represented with the label -1. So:
df = dfA[dfA.index.labels!=-1]
Even better, in Pandas > 0.16.1, one can use drop() to do this in place without copying:
dfA.drop(labels=[-1], level='index', inplace=True)
NB: It's a bit misleading that the index level is called 'index': it would usually be something more use-specific, like 'date' or 'experimental_run'.
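As a rough sketch of the MultiIndex case, assuming a recent pandas in which the integer positions are exposed as MultiIndex.codes rather than the older labels attribute used in this answer: the frame, the second level name 'run', and the variable names below are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical two-level index; the first level is named 'index' as in the answer.
mi = pd.MultiIndex.from_arrays([[np.nan, "OMG"], [1, 2]], names=["index", "run"])
dfA = pd.DataFrame({"special_name": ["Apple", "Banana"]}, index=mi)

# Missing values in a MultiIndex level are stored with the code -1.
codes = dfA.index.codes[0]
df = dfA[codes != -1]

# An alternative that does not rely on the -1 convention:
alt = dfA[dfA.index.get_level_values("index").notna()]
print(df)
```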
Answer by timdiels
With pandas version >= 0.20.0 you can:
df[df.index.notnull()]
With older versions:
df[pandas.notnull(df.index)]
To break it down:
notnull generates a boolean mask, e.g. [False, False, True], where True denotes that the value at the corresponding position is not null (neither numpy.nan nor None). We then select the rows whose index corresponds to a True value in the mask by using df[boolean_mask].
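To see the mask concretely, here is a small sketch, assuming a hypothetical frame shaped like the one in the question. In the mask, True marks index labels that are not null, so those are the rows that are kept.

```python
import numpy as np
import pandas as pd

# Hypothetical frame shaped like the one in the question.
df = pd.DataFrame({"special_name": ["Banana", "Apple"]}, index=[np.nan, "OMG"])
df.index.name = "site_id"

# True where the index label is present, False where it is NaN/None.
mask = pd.notnull(df.index)

# Boolean indexing keeps only the rows whose mask entry is True.
clean = df[mask]
print(mask)
print(clean)
```

On newer pandas versions, df.index.notnull() should produce the same mask.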
Answer by Mrumble
I tested this and it works:
df.reset_index(inplace=True)
df.drop(df[df['index'].isnull()].index, inplace=True)
How I checked the above
I replicated the table in the original question using:
df = pd.DataFrame(data=['Banana', 'Apple'], index=[np.nan, 'OMG'], columns=['Special_name'])
I then ran the two lines of code above, which I try to explain in plain language below:
- The 1st line resets the index to integers, and the NaN is now in a column named after the original name of the index ('index' in the example above, as no name was specified). pandas does this automatically with the reset_index() command.
- The 2nd line, reading from the innermost brackets: df[df['index'].isnull()] filters the rows for which the column named 'index' shows NaN values, using the isnull() command. .index is then used to pass an unambiguous index object pointing to all the 'index'=NaN rows to df.drop( in the outermost part of the expression.
NB: I tested that the above command also works with multiple NaN values in the column.
Using Python 3.5.1 and Pandas 0.17.1 via the 32-bit Anaconda package.
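The two lines can be tried end to end with a sketch like the following, assuming a recent pandas. The third row ('Cherry') is a hypothetical addition, included only to exercise the case of multiple NaN index values.

```python
import numpy as np
import pandas as pd

# Frame from the answer, plus one hypothetical extra row with a NaN label.
df = pd.DataFrame(
    data=["Banana", "Apple", "Cherry"],
    index=[np.nan, "OMG", np.nan],
    columns=["Special_name"],
)

# Step 1: move the old index into a column named 'index'.
df.reset_index(inplace=True)

# Step 2: find the integer row labels where 'index' is NaN and drop them.
nan_rows = df[df["index"].isnull()].index
df.drop(nan_rows, inplace=True)
print(df)
```

Note that the old labels end up in an ordinary column named 'index'; calling df.set_index('index') afterwards would restore them as the index.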
Answer by Joakim
None of the answers worked 100% for me. Here's what worked:
In [26]: print df
Out[26]:
  site_id special_name
0     OMG        Apple
1     NaN       Banana
2     RLY       Orange
In [27]: df.dropna(inplace=True)
Out[27]:
  site_id special_name
0     OMG        Apple
2     RLY       Orange
In [28]: df.reset_index(inplace=True)
Out[28]:
   index site_id special_name
0      0     OMG        Apple
1      2     RLY       Orange
In [29]: df.drop('index', axis='columns', inplace=True)
Out[29]:
  site_id special_name
0     OMG        Apple
1     RLY       Orange
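The same result can probably be reached more compactly. This is a sketch assuming a recent pandas and that site_id is an ordinary column, as in the session above: reset_index(drop=True) renumbers the rows without creating the extra 'index' column, and subset=['site_id'] (an addition not in this answer) limits the NaN check to that column.

```python
import numpy as np
import pandas as pd

# Frame matching the session above, with site_id as a regular column.
df = pd.DataFrame(
    {
        "site_id": ["OMG", np.nan, "RLY"],
        "special_name": ["Apple", "Banana", "Orange"],
    }
)

# Drop rows whose site_id is NaN, then renumber the rows from 0
# without keeping the old integer labels as a column.
df = df.dropna(subset=["site_id"]).reset_index(drop=True)
print(df)
```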
Answer by Pietro Battiston
As of pandas 0.19, Indexes do have a .notnull() method, so the answer by timdiels can be simplified to:
df[df.index.notnull()]
which I think is (currently) the simplest you can get.

