Trying to drop NaN indexed row in dataframe
Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/19670904/
Asked by Alison S
I'm using python 2.7.3 and Pandas version 0.12.0.
I want to drop the row with the NaN index so that I only have valid site_id values.
print df.head()
        special_name
site_id
NaN           Banana
OMG            Apple
df.drop(df.index[0])
TypeError: 'NoneType' object is not iterable
If I try dropping a range, like this:
df.drop(df.index[0:1])
I get this error:
AttributeError: 'DataFrame' object has no attribute 'special_name'
Accepted answer by TomAugspurger
I've found that the easiest way is to reset the index, drop the NaNs, and then reset the index again.
In [26]: dfA.reset_index()
Out[26]:
  index special_name
0   NaN        Apple
1   OMG       Banana
In [30]: df = dfA.reset_index().dropna().set_index('index')
In [31]: df
Out[31]:
      special_name
index
OMG         Banana
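The reset/drop/set-index round trip above can be reproduced as a standalone script. This is a minimal sketch, assuming an unnamed index (so reset_index() names the new column 'index'); the variable names flat and cleaned are illustrative, and passing subset= to dropna() is a small variation that keeps rows whose NaN is in some other column.

```python
import numpy as np
import pandas as pd

# Rebuild a frame like the one in the answer: one row has NaN as its index label.
dfA = pd.DataFrame({"special_name": ["Apple", "Banana"]}, index=[np.nan, "OMG"])

# Move the index into an ordinary column; an unnamed index becomes 'index'.
flat = dfA.reset_index()

# Drop only rows whose former index value is missing, then restore the index.
cleaned = flat.dropna(subset=["index"]).set_index("index")

print(cleaned)
```

Without subset=, dropna() would also discard rows that have a valid index but a missing value elsewhere, which may or may not be what you want.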
Answer by Robert Muil
Edit: the following probably only applies to MultiIndexes, and is in any case obsoleted by the new df.index.isnull() function (see other answers). I'll leave this answer just for historical interest.
For people who come to this now, one can do this directly without reindexing by relying on the fact that NaNs in the index will be represented with the label -1. So:
df = dfA[dfA.index.labels!=-1]
Even better, in Pandas>0.16.1, one can use drop() to do this inplace without copying:
dfA.drop(labels=[-1], level='index', inplace=True)
NB: It's a bit misleading that the index level is called 'index': it would usually be something more use-specific like 'date' or 'experimental_run'.
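For readers on recent pandas, here is a hedged sketch of the same idea. It assumes a version in which the MultiIndex attribute is called codes (the labels attribute used above was renamed in later releases); the level names site_id and run and the sample data are hypothetical. It also shows the level-based notna() check as a cross-check.

```python
import numpy as np
import pandas as pd

# Hypothetical two-level index; the first level contains one missing value.
index = pd.MultiIndex.from_arrays(
    [[np.nan, "OMG", "RLY"], [1, 2, 3]],
    names=["site_id", "run"],
)
dfA = pd.DataFrame({"special_name": ["Apple", "Banana", "Orange"]}, index=index)

# Missing entries in a MultiIndex level are stored with the integer code -1.
site_codes = dfA.index.codes[0]
by_codes = dfA[site_codes != -1]

# Equivalent filter that does not rely on the internal code value.
by_notna = dfA[dfA.index.get_level_values("site_id").notna()]

print(by_codes)
```

The second form is arguably easier to read, since it does not depend on knowing how missing values are encoded internally.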
Answer by timdiels
With pandas version >= 0.20.0 you can:
df[df.index.notnull()]
With older versions:
df[pandas.notnull(df.index)]
To break it down:
notnull generates a boolean mask, e.g. [False, False, True], where True denotes that the value at the corresponding position is not null (i.e. it is neither numpy.nan nor None). We then select the rows whose index corresponds to a true value in the mask by using df[boolean_mask].
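To make the mask concrete, here is a small sketch with a three-row frame whose index holds numpy.nan, None, and one valid label. The data and the name mask are made up for illustration.

```python
import numpy as np
import pandas as pd

# Two missing index labels (NaN and None) followed by one valid label.
df = pd.DataFrame(
    {"special_name": ["Banana", "Apple", "Cherry"]},
    index=[np.nan, None, "OMG"],
)

# True marks positions whose index label is present (not null).
mask = df.index.notnull()

# Boolean indexing keeps only the rows where the mask is True.
kept = df[mask]

print(mask)
print(kept)
```

On older versions, pd.notnull(df.index) yields the same mask.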
Answer by Mrumble
I tested this and it works:
df.reset_index(inplace=True)
df.drop(df[df['index'].isnull()].index, inplace=True)
How I checked the above
I replicated the table from the original question using:
df=pd.DataFrame(data=['Banana', 'Apple'], index=[np.nan, 'OMG'],columns=['Special_name'])
I then ran the two lines of code above, which I try to explain in plain language below:
- 1st line: resets the index to integers, so the NaN is now in a column named after the original name of the index ('index' in the example above, as no name was specified). pandas does this automatically with the reset_index() command.
- 2nd line, reading from the innermost brackets outward: df[df['index'].isnull()] filters the rows for which the column named 'index' holds NaN values, using isnull(). .index is then used to pass an unambiguous index object pointing to all 'index'=NaN rows to df.drop( in the outermost part of the expression.
nb: tested the above command to work on multiple NaN values in a column
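The two steps can be run end to end as follows. This is a sketch that extends the replicated table with a third, hypothetical row ('Cherry') so that two index labels are NaN; the intermediate name nan_rows is introduced only for readability.

```python
import numpy as np
import pandas as pd

# Table from the question, plus one extra row with a second NaN label.
df = pd.DataFrame(
    data=["Banana", "Apple", "Cherry"],
    index=[np.nan, "OMG", np.nan],
    columns=["Special_name"],
)

# Step 1: the old index becomes a column called 'index'; rows are now 0, 1, 2.
df.reset_index(inplace=True)

# Step 2: collect the integer positions whose 'index' value is NaN and drop them.
nan_rows = df[df["index"].isnull()].index
df.drop(nan_rows, inplace=True)

print(df)
```

Note that the old labels stay in the 'index' column afterwards; call set_index('index') if you want them back as the index.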
Using Python 3.5.1 and pandas 0.17.1 via the 32-bit Anaconda package.
Answer by Joakim
None of the answers worked 100% for me. Here's what worked:
In [26]: print df
Out[26]:
site_id special_name
0 OMG Apple
1 NaN Banana
2 RLY Orange
In [27]: df.dropna(inplace=True)
Out[27]:
site_id special_name
0 OMG Apple
2 RLY Orange
In [28]: df.reset_index(inplace=True)
Out[28]:
index site_id special_name
0 0 OMG Apple
1 2 RLY Orange
In [29]: df.drop('index', axis='columns', inplace=True)
Out[29]:
site_id special_name
0 OMG Apple
1 RLY Orange
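The session above handles the case where site_id is an ordinary column and not the index. A more compact sketch of the same result, assuming that layout, uses reset_index(drop=True) to renumber the rows without creating the temporary 'index' column; the name cleaned is illustrative.

```python
import numpy as np
import pandas as pd

# Same data as the session above, with site_id as a regular column.
df = pd.DataFrame(
    {
        "site_id": ["OMG", np.nan, "RLY"],
        "special_name": ["Apple", "Banana", "Orange"],
    }
)

# Drop rows with a missing site_id, then renumber rows 0..n-1
# and discard the old integer index instead of keeping it as a column.
cleaned = df.dropna(subset=["site_id"]).reset_index(drop=True)

print(cleaned)
```

This returns a new frame and leaves df untouched, which avoids the inplace=True calls.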
Answer by Pietro Battiston
As of pandas 0.19, Indexes do have a .notnull() method, so the answer by timdiels can be simplified to:
df[df.index.notnull()]
which I think is (currently) the simplest you can get.