pandas: IndexingError using Boolean Indexing
Original question: http://stackoverflow.com/questions/27112755/
License: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
Asked by mgilbert
I am trying to index a DataFrame using a boolean Series, similar to here:
In [1]: import pandas as pd
In [2]: idx = pd.Index(["USD.CAD", "AUD.NZD", "EUR.USD", "GBP.USD"],
...: name="Currency Pair")
In [3]: pairs = pd.DataFrame({"mean":[3.6,5.1,3.6,2.7], "count":[1,5,8,2]}, index=idx)
In [4]: mask = pairs.reset_index().loc[:,"Currency Pair"].str.contains("USD")
In [5]: pairs.reset_index()[mask]
Out[5]:
Currency Pair count mean
0 USD.CAD 1 3.6
2 EUR.USD 8 3.6
3 GBP.USD 2 2.7
The above works as expected. However, when I try the same thing with the original DataFrame, without resetting the index, I get the following error:
In [6]: pairs[mask]
C:\Anaconda\lib\site-packages\pandas\core\frame.py:1808: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
"DataFrame index.", UserWarning)
---------------------------------------------------------------------------
IndexingError Traceback (most recent call last)
<ipython-input-6-9eca5ffbdaf7> in <module>()
----> 1 pairs[mask]
C:\Anaconda\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
1772 if isinstance(key, (Series, np.ndarray, Index, list)):
1773 # either boolean or fancy integer index
-> 1774 return self._getitem_array(key)
1775 elif isinstance(key, DataFrame):
1776 return self._getitem_frame(key)
C:\Anaconda\lib\site-packages\pandas\core\frame.pyc in _getitem_array(self, key)
1812 # _check_bool_indexer will throw exception if Series key cannot
1813 # be reindexed to match DataFrame rows
-> 1814 key = _check_bool_indexer(self.index, key)
1815 indexer = key.nonzero()[0]
1816 return self.take(indexer, axis=0, convert=False)
C:\Anaconda\lib\site-packages\pandas\core\indexing.pyc in _check_bool_indexer(ax, key)
1637 mask = com.isnull(result.values)
1638 if mask.any():
-> 1639 raise IndexingError('Unalignable boolean Series key provided')
1640
1641 result = result.astype(bool).values
IndexingError: Unalignable boolean Series key provided
I am confused by this error, since my impression was that it is raised when the length of the boolean index differs from that of the DataFrame. That is not the case here, as can be seen below:
In [7]: len(mask)
Out[7]: 4
In [8]: len(pairs)
Out[8]: 4
In [9]: len(pairs.reset_index())
Out[9]: 4
Answered by mgilbert
I figured I would write down the solution @EdChum indicated in the comments. The issue, as he pointed out, is that mask.index does not agree with pairs.index. After replacing the index of mask with the index of pairs, we get the expected behaviour.
In [10]: mask.index = pairs.index.copy()
In [11]: pairs[mask]
Out[11]:
count mean
Currency Pair
USD.CAD 1 3.6
EUR.USD 8 3.6
GBP.USD 2 2.7
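The mismatch described in this answer can be inspected directly. The following is a minimal sketch, assuming a reasonably recent pandas (Series.to_numpy() needs 0.24 or later); the variable names labels_match, reindexed, and selected are made up for illustration. It shows that reindexing the mask onto the labels of pairs leaves only missing values, which appears to be the condition the traceback's _check_bool_indexer tests for, and that passing a plain boolean array avoids label alignment without modifying the mask's index.

```python
import pandas as pd

idx = pd.Index(["USD.CAD", "AUD.NZD", "EUR.USD", "GBP.USD"], name="Currency Pair")
pairs = pd.DataFrame({"mean": [3.6, 5.1, 3.6, 2.7], "count": [1, 5, 8, 2]}, index=idx)

# Same mask as in the question: it carries the default integer index
# created by reset_index(), not the currency-pair labels.
mask = pairs.reset_index().loc[:, "Currency Pair"].str.contains("USD")

# The lengths agree, but the index labels do not.
labels_match = mask.index.equals(pairs.index)

# Reindexing the mask onto the labels of pairs finds no matching label,
# so every entry becomes missing.
reindexed = mask.reindex(pairs.index)

# A plain NumPy boolean array has no labels and is applied by position.
selected = pairs[mask.to_numpy()]
print(selected)
```

Selecting with mask.to_numpy() relies on the mask rows being in the same order as the rows of pairs, which holds here because the mask was derived from pairs.reset_index().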
Answered by Rob
You could generate the mask directly from the index:
In [22]: mask = pairs.index.str.contains("USD")
In [23]: pairs[mask]
Out[23]:
count mean
Currency Pair
USD.CAD 1 3.6
EUR.USD 8 3.6
GBP.USD 2 2.7
There is no need to reindex anything.
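As a variation on this answer, here is a hedged sketch of two ways to build the mask from the index. The names positional_mask, label_mask, by_position, and by_label are hypothetical, and regex=False is an addition not present in the original answer; it makes "USD" match as a literal substring. The second variant uses Index.to_series(), which by default yields a Series labelled by the index itself, so the resulting boolean Series aligns with pairs.

```python
import pandas as pd

idx = pd.Index(["USD.CAD", "AUD.NZD", "EUR.USD", "GBP.USD"], name="Currency Pair")
pairs = pd.DataFrame({"mean": [3.6, 5.1, 3.6, 2.7], "count": [1, 5, 8, 2]}, index=idx)

# Mask computed on the index itself: it is applied by position,
# so there are no labels that could fail to align.
positional_mask = pairs.index.str.contains("USD", regex=False)
by_position = pairs.loc[positional_mask]

# Index.to_series() returns a Series whose index is the same set of labels,
# so this boolean Series aligns with pairs label by label.
label_mask = pairs.index.to_series().str.contains("USD", regex=False)
by_label = pairs[label_mask]

print(by_label)
```

Both selections should return the same three rows; the label-based mask has the advantage of staying correct even if pairs is later reordered.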

