Original question: http://stackoverflow.com/questions/30781037/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
"Too many indexers" with DataFrame.loc
Asked by LondonRob
I've read the docs about slicers a million times, but have never got my head round them, so I'm still trying to figure out how to use loc to slice a DataFrame with a MultiIndex.
I'll start with the DataFrame from this SO answer:
                           value
first second third fourth
A0    B0     C1    D0          2
                   D1          3
             C2    D0          6
                   D1          7
      B1     C1    D0         10
                   D1         11
             C2    D0         14
                   D1         15
A1    B0     C1    D0         18
                   D1         19
             C2    D0         22
                   D1         23
      B1     C1    D0         26
                   D1         27
             C2    D0         30
                   D1         31
A2    B0     C1    D0         34
                   D1         35
             C2    D0         38
                   D1         39
      B1     C1    D0         42
                   D1         43
             C2    D0         46
                   D1         47
A3    B0     C1    D0         50
                   D1         51
             C2    D0         54
                   D1         55
      B1     C1    D0         58
                   D1         59
             C2    D0         62
                   D1         63
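For anyone who wants to follow along, here is a minimal sketch that rebuilds a frame with the same labels and values as the printout above. The construction is an assumption (the linked answer may well build its frame differently), and the formula for the values is only chosen so that the numbers match the printed ones.

```python
import pandas as pd

# Assumed construction: a full product of the four printed levels.
index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C1", "C2"], ["D0", "D1"]],
    names=["first", "second", "third", "fourth"],
)

# Values follow the printed pattern 2, 3, 6, 7, 10, 11, ...
values = [2 + 4 * (i // 2) + i % 2 for i in range(len(index))]

df = pd.DataFrame({"value": values}, index=index)
print(df.head(8))
```

Because the index comes from a sorted product, it is already lexsorted, which MultiIndex slicing with loc generally needs.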
To select just the A0 and C1 values, I can do:
In [26]: df.loc['A0', :, 'C1', :]
Out[26]:
                           value
first second third fourth
A0    B0     C1    D0          2
                   D1          3
      B1     C1    D0         10
                   D1         11
This also works when selecting from three levels, and even with tuples:
In [28]: df.loc['A0', :, ('C1', 'C2'), 'D1']
Out[28]:
                           value
first second third fourth
A0    B0     C1    D1          3
             C2    D1          5
      B1     C1    D1         11
             C2    D1         13
So far, intuitive and brilliant.
So why can't I select all values from the first index level?
In [30]: df.loc[:, :, 'C1', :]
---------------------------------------------------------------------------
IndexingError                             Traceback (most recent call last)
<ipython-input-30-57b56108d941> in <module>()
----> 1 df.loc[:, :, 'C1', :]
/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in __getitem__(self, key)
   1176     def __getitem__(self, key):
   1177         if type(key) is tuple:
-> 1178             return self._getitem_tuple(key)
   1179         else:
   1180             return self._getitem_axis(key, axis=0)
/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _getitem_tuple(self, tup)
    694
    695         # no multi-index, so validate all of the indexers
--> 696         self._has_valid_tuple(tup)
    697
    698         # ugly hack for GH #836
/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _has_valid_tuple(self, key)
    125         for i, k in enumerate(key):
    126             if i >= self.obj.ndim:
--> 127                 raise IndexingError('Too many indexers')
    128             if not self._has_valid_type(k, i):
    129                 raise ValueError("Location based indexing can only have [%s] "
IndexingError: Too many indexers
Surely this is not intended behaviour?
Note: I know this is possible with df.xs('C1', level='third'), but the current .loc behaviour seems inconsistent.
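As a quick sketch of the xs route mentioned in the note: by default xs drops the level it selects on, so passing drop_level=False keeps the result in the same four-level shape as the loc outputs. The frame below is an assumed reconstruction of the one printed in the question, not the original code.

```python
import pandas as pd

# Assumed reconstruction of the question's frame (labels and values
# chosen to match the printout).
index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C1", "C2"], ["D0", "D1"]],
    names=["first", "second", "third", "fourth"],
)
df = pd.DataFrame(
    {"value": [2 + 4 * (i // 2) + i % 2 for i in range(len(index))]},
    index=index,
)

# Cross-section on the 'third' level; the selected level is dropped by default.
dropped = df.xs("C1", level="third")

# Keep all four index levels in the result.
kept = df.xs("C1", level="third", drop_level=False)

# A boolean mask on the level values selects the same rows.
mask = df.index.get_level_values("third") == "C1"
by_mask = df[mask]

print(kept)
```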
Accepted answer by djakubosky
The reason this doesn't work is tied to the need to specify the axis of indexing (mentioned in http://pandas.pydata.org/pandas-docs/stable/advanced.html). An alternative solution to your problem is simply to do this:
df.loc(axis=0)[:, :, 'C1', :]
Pandas sometimes gets confused when the indexes are similar or contain similar values. If you had a column named 'C1', for example, you would also need to specify the axis under this style of slicing/selecting.
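Here is a small sketch of that axis-specific call, run against an assumed reconstruction of the question's frame. With axis=0 fixed, the whole tuple is read as a key into the row MultiIndex, so there is nothing left for pandas to guess about rows versus columns.

```python
import pandas as pd

# Assumed reconstruction of the question's frame (labels and values
# chosen to match the printout).
index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C1", "C2"], ["D0", "D1"]],
    names=["first", "second", "third", "fourth"],
)
df = pd.DataFrame(
    {"value": [2 + 4 * (i // 2) + i % 2 for i in range(len(index))]},
    index=index,
)

# Every element of the tuple applies to one level of the row index.
selected = df.loc(axis=0)[:, :, "C1", :]

# Sanity check: only 'C1' remains on the 'third' level.
third = selected.index.get_level_values("third")
print(selected.shape, third.unique().tolist())
```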
Answer by joris
To be safe (in the sense that this will work in all cases), you need to index both the row index and the columns. You can use pd.IndexSlice to do this easily:
In [26]: idx = pd.IndexSlice
In [27]: df.loc[idx[:, :, 'C1', :], :]
Out[27]:
                           value
first second third fourth
A0    B0     C1    D0          2
                   D1          3
      B1     C1    D0         10
                   D1         11
A1    B0     C1    D0         18
                   D1         19
      B1     C1    D0         26
                   D1         27
A2    B0     C1    D0         34
                   D1         35
      B1     C1    D0         42
                   D1         43
A3    B0     C1    D0         50
                   D1         51
      B1     C1    D0         58
                   D1         59
Here idx[:, :, 'C1', :] is an easier way to write the tuple (slice(None), slice(None), 'C1', slice(None)). Instead of pd.IndexSlice, you can also use np.s_, which is a bit shorter.
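To make that equivalence concrete, the sketch below checks that pd.IndexSlice and np.s_ both produce the same plain tuple of slices, and that passing the hand-written tuple to loc selects the same rows. The frame is again an assumed reconstruction of the one in the question.

```python
import numpy as np
import pandas as pd

idx = pd.IndexSlice

# The hand-written form of the key.
key = (slice(None), slice(None), "C1", slice(None))

# Both helpers simply hand back whatever was put between the brackets.
same_as_idx = idx[:, :, "C1", :] == key
same_as_np = np.s_[:, :, "C1", :] == key
print(same_as_idx, same_as_np)

# Assumed reconstruction of the question's frame.
index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C1", "C2"], ["D0", "D1"]],
    names=["first", "second", "third", "fourth"],
)
df = pd.DataFrame(
    {"value": [2 + 4 * (i // 2) + i % 2 for i in range(len(index))]},
    index=index,
)

# Rows selected with the helper and with the explicit tuple.
via_idx = df.loc[idx[:, :, "C1", :], :]
via_tuple = df.loc[key, :]
```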
Why the other ones work, I am not fully sure. But see the note in the documentation here: http://pandas.pydata.org/pandas-docs/stable/advanced.html#using-slicers (the first red warning box), where it is stated that:
You should specify all axes in the .loc specifier, meaning the indexer for the index and for the columns. There are some ambiguous cases where the passed indexer could be mis-interpreted as indexing both axes, rather than into say the MultiIndex for the rows.
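Following that advice, the first query from the question can be written with both axes spelled out. This is only a sketch on an assumed reconstruction of the question's frame: the row key goes inside pd.IndexSlice and the column indexer comes after it.

```python
import pandas as pd

idx = pd.IndexSlice

# Assumed reconstruction of the question's frame.
index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C1", "C2"], ["D0", "D1"]],
    names=["first", "second", "third", "fourth"],
)
df = pd.DataFrame(
    {"value": [2 + 4 * (i // 2) + i % 2 for i in range(len(index))]},
    index=index,
)

# Rows: A0 on 'first' and C1 on 'third'. Columns: everything.
all_columns = df.loc[idx["A0", :, "C1", :], :]

# Same rows, but naming the column explicitly returns a Series.
one_column = df.loc[idx["A0", :, "C1", :], "value"]

print(all_columns)
```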