Disclaimer: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/46374860/

Date: 2020-09-14 04:31:01  Source: igfitidea

Python pandas.core.indexing.IndexingError: Unalignable boolean Series key provided

Tags: python, pandas

Question by alwaysaskingquestions

So I read in a data table with 29 columns and added one index column (so 30 in total).

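import os
import pandas as pd
Data = pd.read_excel(os.path.join(BaseDir, 'test.xlsx'))  # BaseDir: folder containing test.xlsx
Data.reset_index(inplace=True)  # adds the old row index as a new column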
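A small in-memory sketch of what these two calls do to the column count, assuming a made-up three-column frame in place of test.xlsx (the column names are hypothetical). With its defaults, reset_index inserts the old index as a new first column named 'index', which is where the extra column comes from.

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet: 3 columns instead of 29.
Data = pd.DataFrame({"Ref1": [1, 2], "Amount": [3.0, 4.0], "ref_code": ["a", "b"]})
print(len(Data.columns))  # 3

# reset_index() moves the old (unnamed) RangeIndex into a new column called 'index'.
Data.reset_index(inplace=True)
print(list(Data.columns))  # ['index', 'Ref1', 'Amount', 'ref_code']
```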

Then I wanted to subset the data to include only the columns whose names contain "ref" or "Ref". I got the code below from another Stack Overflow post:

col_keep = Data.ix[:, pd.Series(Data.columns.values).str.contains('ref', case=False)]

However, I keep getting this error:

    print(len(Data.columns.values))
    30
    print(pd.Series(Data.columns.values).str.contains('ref', case=False))
    0     False
    1     False
    2     False
    3     False
    4     False
    5     False
    6     False
    7     False
    8     False
    9     False
    10    False
    11    False
    12    False
    13    False
    14    False
    15    False
    16    False
    17    False
    18    False
    19    False
    20    False
    21    False
    22    False
    23    False
    24     True
    25     True
    26     True
    27     True
    28    False
    29    False
    dtype: bool

Traceback (most recent call last):
  File "C:/Users/lala.py", line 26, in <module>
    col_keep = FedexData.ix[:, pd.Series(FedexData.columns.values).str.contains('ref', case=False)]
  File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 84, in __getitem__
    return self._getitem_tuple(key)
  File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 816, in _getitem_tuple
    retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
  File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 1014, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 1041, in _getitem_iterable
    key = check_bool_indexer(labels, key)
  File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexing.py", line 1817, in check_bool_indexer
    raise IndexingError('Unalignable boolean Series key provided')
pandas.core.indexing.IndexingError: Unalignable boolean Series key provided

So the boolean values are correct, but why is it not working? Why does the error keep popping up?

Any help/hint is appreciated! Thank you so so much.

Answer by unutbu

I can reproduce a similar error message this way:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(4, size=(10,4)), columns=list('ABCD'))
df.ix[:, pd.Series([True,False,True,False])]

raises (using Pandas version 0.21.0.dev+25.g50e95e0)

pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match)

The problem occurs because Pandas tries to align the index of the Series with the column index of the DataFrame before masking with the Series' boolean values. Since df has column labels 'A', 'B', 'C', 'D' and the Series has index labels 0, 1, 2, 3, Pandas complains that the labels are unalignable. The same thing happens in the question: pd.Series(Data.columns.values) gets a fresh integer index 0 to 29, none of which matches Data's column names.
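To see the alignment rule in isolation, here is a sketch of the same reproduction written with .loc (since .ix no longer exists in pandas 1.0 and later), next to a mask whose index is the column labels themselves. It assumes pandas 1.5 or later, where the exception is exposed as pandas.errors.IndexingError; the names bad_mask, good_mask and caught are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(4, size=(10, 4)), columns=list("ABCD"))

# Index 0..3 shares no labels with the columns A..D, so alignment fails.
bad_mask = pd.Series([True, False, True, False])
caught = None
try:
    df.loc[:, bad_mask]
except pd.errors.IndexingError as exc:
    caught = exc
    print("IndexingError:", exc)

# Same booleans, but indexed by the column labels, so alignment succeeds.
good_mask = pd.Series([True, False, True, False], index=df.columns)
print(list(df.loc[:, good_mask].columns))  # ['A', 'C']
```

In other words, a boolean Series is matched to the axis by label, not by position; only a mask whose labels cover the column labels can be aligned.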

You probably don't want any index alignment, so pass a NumPy boolean array instead of a Pandas Series:

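import pandas as pd
# .values strips the Series index, leaving a plain NumPy boolean array (no alignment)
mask = pd.Series(Data.columns.values).str.contains('ref', case=False).values
col_keep = Data.loc[:, mask]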
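A self-contained version of this fix, assuming a made-up four-column frame in place of the Excel data (the column names are hypothetical). Because mask is a bare array, .loc applies it by position, so it only needs to have one entry per column.

```python
import pandas as pd

# Hypothetical frame standing in for the Excel data (made-up column names).
Data = pd.DataFrame([[1, 2, 3, 4]], columns=["index", "Amount", "Ref_ID", "cust_ref"])

# The Series built from the column names gets a fresh 0..n-1 index;
# .values drops it, leaving a positional boolean array.
mask = pd.Series(Data.columns.values).str.contains('ref', case=False).values
col_keep = Data.loc[:, mask]
print(list(col_keep.columns))  # ['Ref_ID', 'cust_ref']
```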

The Series.values attribute returns a NumPy array. And since DataFrame.ix is deprecated (it was removed entirely in pandas 1.0), use Data.loc instead of Data.ix here, since we want boolean indexing.
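The answer's fix is not the only way to avoid alignment. Below is a sketch of a few alternatives that are not part of the original answer, again on a hypothetical four-column frame. Index.str.contains, DataFrame.filter(regex=...) and Series.to_numpy() are standard pandas calls (to_numpy() needs pandas 0.24 or later); the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical column names; mixed case mirrors "ref" vs "Ref" in the question.
Data = pd.DataFrame([[1, 2, 3, 4]], columns=["index", "Amount", "Ref_ID", "cust_ref"])

# Option 1: on an Index, .str.contains returns a NumPy bool array directly,
# so there is no Series index for .loc to align.
by_index = Data.loc[:, Data.columns.str.contains("ref", case=False)]

# Option 2: DataFrame.filter keeps columns whose labels match a regex;
# the inline (?i) flag makes the match case-insensitive.
by_filter = Data.filter(regex="(?i)ref")

# Option 3: to_numpy() is the newer spelling of .values on a Series.
mask = pd.Series(Data.columns).str.contains("ref", case=False).to_numpy()
by_numpy = Data.loc[:, mask]

print(list(by_index.columns), list(by_filter.columns), list(by_numpy.columns))
```

All three select the same two columns here; filter is convenient when the condition is purely a pattern on the labels, while an explicit mask is easier to combine with other conditions.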
