Python Pandas: IndexingError: Unalignable boolean Series provided as indexer

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/45352909/

Time: 2020-08-19 16:57:13  Source: igfitidea

python pandas

Asked by elPastor

I'm trying to run what I think is simple code to eliminate any columns with all NaNs, but can't get this to work (axis = 1 works just fine when eliminating rows):

import pandas as pd
import numpy as np

df = pd.DataFrame({'a':[1,2,np.nan,np.nan], 'b':[4,np.nan,6,np.nan], 'c':[np.nan, 8,9,np.nan], 'd':[np.nan,np.nan,np.nan,np.nan]})

df = df[df.notnull().any(axis = 0)]

print(df)

Full error:

    raise IndexingError('Unalignable boolean Series provided as '
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match

Expected output:

     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN

Answered by jezrael

You need loc, because you are filtering by columns:

print (df.notnull().any(axis = 0))
a     True
b     True
c     True
d    False
dtype: bool

df = df.loc[:, df.notnull().any(axis = 0)]
print (df)

     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN

Or filter the column names first and then select with []:

print (df.columns[df.notnull().any(axis = 0)])
Index(['a', 'b', 'c'], dtype='object')

df = df[df.columns[df.notnull().any(axis = 0)]]
print (df)

     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN

Or use dropna with the parameter how='all' to remove all columns filled only by NaNs:

print (df.dropna(axis=1, how='all'))
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN
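
As a quick sanity check, the three approaches in this answer can be compared on the question's DataFrame. This is only a sketch: the variable names (by_loc, by_names, by_dropna) are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'a': [1, 2, np.nan, np.nan],
                   'b': [4, np.nan, 6, np.nan],
                   'c': [np.nan, 8, 9, np.nan],
                   'd': [np.nan, np.nan, np.nan, np.nan]})

# Boolean mask with one entry per column: True if the column has any non-NaN value.
mask = df.notnull().any(axis=0)

by_loc = df.loc[:, mask]                # boolean mask applied along the column axis
by_names = df[df.columns[mask]]         # mask turned into column names, then selected with []
by_dropna = df.dropna(axis=1, how='all')  # drop columns that are entirely NaN

# DataFrame.equals treats NaNs in the same position as equal.
print(by_loc.equals(by_names) and by_loc.equals(by_dropna))
print(list(by_loc.columns))
```

All three keep columns a, b and c and drop d, so which one to use is mostly a matter of readability.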

Answered by EdChum

You can use dropna with axis=1 and thresh=1:

In[19]:
df.dropna(axis=1, thresh=1)

Out[19]: 
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN

This will drop any column which doesn't have at least 1 non-NaN value, which means any column that is all NaN will get dropped.
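
To make the meaning of thresh more concrete, here is a small sketch on the same data. The variable names (non_nan_counts, kept_thresh_1, kept_thresh_3) are illustrative only, and thresh=3 is just an arbitrary stricter value chosen for contrast.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'a': [1, 2, np.nan, np.nan],
                   'b': [4, np.nan, 6, np.nan],
                   'c': [np.nan, 8, 9, np.nan],
                   'd': [np.nan, np.nan, np.nan, np.nan]})

# Number of non-NaN values in each column; thresh is compared against this count.
non_nan_counts = df.notnull().sum()
print(non_nan_counts)

# thresh=1: keep columns with at least one non-NaN value (drops only 'd').
kept_thresh_1 = df.dropna(axis=1, thresh=1)
print(list(kept_thresh_1.columns))

# thresh=3: keep columns with at least three non-NaN values.
# No column here has that many, so every column is dropped.
kept_thresh_3 = df.dropna(axis=1, thresh=3)
print(list(kept_thresh_3.columns))
```

With thresh=1 the result matches dropna(axis=1, how='all'); larger values of thresh make the filter stricter.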

What you tried failed because the boolean mask:

In[20]:
df.notnull().any(axis = 0)

Out[20]: 
a     True
b     True
c     True
d    False
dtype: bool

cannot be aligned on the index (the row labels), which is what df[...] uses by default for a boolean Series, because this mask is labelled by the columns.
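
The mismatch can be inspected directly. The sketch below is an illustration, not part of the original answer: the variable names are made up, the exception is caught generically and identified by its class name because the import location of IndexingError has differed between pandas versions, and warnings are silenced because some versions also warn that the key will be reindexed.

```python
import warnings

import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'a': [1, 2, np.nan, np.nan],
                   'b': [4, np.nan, 6, np.nan],
                   'c': [np.nan, 8, 9, np.nan],
                   'd': [np.nan, np.nan, np.nan, np.nan]})

mask = df.notnull().any(axis=0)

# The mask is labelled by column names ('a'..'d'), not by row labels (0..3).
mask_matches_columns = mask.index.equals(df.columns)
mask_matches_rows = mask.index.equals(df.index)
print(mask_matches_columns, mask_matches_rows)

# df[mask] tries to align the mask with the row index and fails.
error_name = None
try:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # some pandas versions warn before raising
        df[mask]
except Exception as exc:
    error_name = type(exc).__name__
print(error_name)

# Telling pandas explicitly that the mask applies to the column axis works.
by_columns = df.loc[:, mask]
print(list(by_columns.columns))
```

Passing the mask in the second position of loc is what tells pandas to align it against the columns instead of the rows.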
