Python Pandas: IndexingError: Unalignable boolean Series provided as indexer
Original question: http://stackoverflow.com/questions/45352909/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Pandas: IndexingError: Unalignable boolean Series provided as indexer
Asked by elPastor
I'm trying to run what I think is simple code to eliminate any columns that contain only NaNs, but I can't get it to work (axis = 1 works just fine when eliminating rows):
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [1, 2, np.nan, np.nan], 'b': [4, np.nan, 6, np.nan], 'c': [np.nan, 8, 9, np.nan], 'd': [np.nan, np.nan, np.nan, np.nan]})
df = df[df.notnull().any(axis=0)]  # this line raises the IndexingError
print(df)
Full error:
raise IndexingError('Unalignable boolean Series provided as '
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match
Expected output:
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN
Answered by jezrael
You need loc, because you are filtering by columns:
print(df.notnull().any(axis=0))
a     True
b     True
c     True
d    False
dtype: bool
df = df.loc[:, df.notnull().any(axis=0)]
print(df)
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN
Or filter the column labels first and then select them with []:
print(df.columns[df.notnull().any(axis=0)])
Index(['a', 'b', 'c'], dtype='object')
df = df[df.columns[df.notnull().any(axis=0)]]
print(df)
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN
Or use dropna with the parameter how='all' to remove only the columns that consist entirely of NaNs:
print(df.dropna(axis=1, how='all'))
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN
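As a quick check that the three approaches above agree, here is a minimal sketch that rebuilds the DataFrame from the question and compares the results. The variable names (keep, via_loc, via_labels, via_dropna) are illustrative only and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'a': [1, 2, np.nan, np.nan],
    'b': [4, np.nan, 6, np.nan],
    'c': [np.nan, 8, 9, np.nan],
    'd': [np.nan, np.nan, np.nan, np.nan],
})

# Boolean Series indexed by column label: True where a column has any value.
keep = df.notnull().any(axis=0)

via_loc = df.loc[:, keep]                   # boolean mask applied to the column axis
via_labels = df[df.columns[keep]]           # mask -> column labels -> plain [] selection
via_dropna = df.dropna(axis=1, how='all')   # let dropna do the filtering

# All three should hold the same columns and values.
print(via_loc.equals(via_labels), via_loc.equals(via_dropna))
print(list(via_loc.columns))
```

Unlike the answer's snippets, this sketch does not reassign df, so each approach starts from the original four columns.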
Answered by EdChum
You can use dropna with axis=1 and thresh=1:
In[19]:
df.dropna(axis=1, thresh=1)
Out[19]:
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN
This drops every column that does not have at least 1 non-NaN value, which means any column consisting entirely of NaN gets dropped.
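To make the effect of thresh more concrete, here is a small sketch on a slightly different DataFrame. The extra column 'e' and the variable names are assumptions added for illustration; they are not part of the original question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, np.nan, np.nan],
    'b': [4, np.nan, 6, np.nan],
    'd': [np.nan, np.nan, np.nan, np.nan],
    'e': [np.nan, np.nan, np.nan, 5],   # hypothetical column with a single value
})

# Number of non-NaN values per column; thresh is compared against these counts.
non_nan_counts = df.count()

at_least_one = df.dropna(axis=1, thresh=1)   # keeps columns with >= 1 non-NaN value
at_least_two = df.dropna(axis=1, thresh=2)   # keeps columns with >= 2 non-NaN values

print(non_nan_counts.to_dict())
print(list(at_least_one.columns))
print(list(at_least_two.columns))
```

With thresh=1 only the all-NaN column 'd' is dropped, which matches how='all'; raising thresh to 2 also removes 'e'.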
What you tried failed because the boolean mask:
In[20]:
df.notnull().any(axis=0)
Out[20]:
a     True
b     True
c     True
d    False
dtype: bool
cannot be aligned with the row index, which is what [] uses by default for a boolean Series. This mask is indexed by the column labels (a, b, c, d), while the row index is 0 to 3.
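The mismatch can be inspected directly. The sketch below compares the index of each mask with the DataFrame's row index and column labels, then shows the row-wise form working and the column-wise form failing. The names col_mask, row_mask, rows_kept and error_name are illustrative, and the code catches a generic Exception because the import path of IndexingError has varied between pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, np.nan, np.nan],
    'b': [4, np.nan, 6, np.nan],
    'c': [np.nan, 8, 9, np.nan],
    'd': [np.nan, np.nan, np.nan, np.nan],
})

col_mask = df.notnull().any(axis=0)   # one entry per column: indexed by 'a'..'d'
row_mask = df.notnull().any(axis=1)   # one entry per row: indexed by 0..3

# The column mask lines up with df.columns, not with df.index.
print(col_mask.index.equals(df.columns), col_mask.index.equals(df.index))
# The row mask lines up with df.index, so df[row_mask] works.
print(row_mask.index.equals(df.index))

rows_kept = df[row_mask]   # drops the all-NaN row 3

# df[col_mask] tries to align the mask with the row index and fails.
try:
    df[col_mask]
    error_name = None
except Exception as exc:   # reported as IndexingError in the question
    error_name = type(exc).__name__
print(error_name)
```

This is why the axis = 1 variant from the question filters rows without complaint, while the axis = 0 mask has to go through loc[:, mask] or be converted to column labels first.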