"Too many indexers" with DataFrame.loc in Python

Question

Asked by LondonRob

I've read the docs about slicers a million times, but have never got my head round them, so I'm still trying to figure out how to use loc to slice a DataFrame with a MultiIndex.

I'll start with the DataFrame from this SO answer:

                           value
first second third fourth       
A0    B0     C1    D0          2
                   D1          3
             C2    D0          6
                   D1          7
      B1     C1    D0         10
                   D1         11
             C2    D0         14
                   D1         15
A1    B0     C1    D0         18
                   D1         19
             C2    D0         22
                   D1         23
      B1     C1    D0         26
                   D1         27
             C2    D0         30
                   D1         31
A2    B0     C1    D0         34
                   D1         35
             C2    D0         38
                   D1         39
      B1     C1    D0         42
                   D1         43
             C2    D0         46
                   D1         47
A3    B0     C1    D0         50
                   D1         51
             C2    D0         54
                   D1         55
      B1     C1    D0         58
                   D1         59
             C2    D0         62
                   D1         63
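
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds a frame matching the printout above. The construction is an assumption (the linked answer may build it differently); only the labels, level names, and values are taken from the table.

```python
import numpy as np
import pandas as pd

# Four index levels, with the labels and names shown in the table above.
index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C1", "C2"], ["D0", "D1"]],
    names=["first", "second", "third", "fourth"],
)

# Assumption: the values follow the pattern 2, 3, 6, 7, 10, 11, ...,
# i.e. the last two entries of each consecutive group of four integers.
values = np.arange(64).reshape(16, 4)[:, 2:].ravel()

df = pd.DataFrame({"value": values}, index=index)
print(df.head(8))
```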

To select just the A0 and C1 values, I can do:

In [26]: df.loc['A0', :, 'C1', :]
Out[26]: 
                           value
first second third fourth       
A0    B0     C1    D0          2
                   D1          3
      B1     C1    D0         10
                   D1         11

This also works when selecting from three levels, and even with tuples:

In [28]: df.loc['A0', :, ('C1', 'C2'), 'D1']
Out[28]: 
                           value
first second third fourth       
A0    B0     C1    D1          3
             C2    D1          7
      B1     C1    D1         11
             C2    D1         15

So far, intuitive and brilliant.

So why can't I select all values from the first index level?

In [30]: df.loc[:, :, 'C1', :]
---------------------------------------------------------------------------
IndexingError                             Traceback (most recent call last)
<ipython-input-30-57b56108d941> in <module>()
----> 1 df.loc[:, :, 'C1', :]

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in __getitem__(self, key)
   1176     def __getitem__(self, key):
   1177         if type(key) is tuple:
-> 1178             return self._getitem_tuple(key)
   1179         else:
   1180             return self._getitem_axis(key, axis=0)

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _getitem_tuple(self, tup)
    694 
    695         # no multi-index, so validate all of the indexers
--> 696         self._has_valid_tuple(tup)
    697 
    698         # ugly hack for GH #836

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _has_valid_tuple(self, key)
    125         for i, k in enumerate(key):
    126             if i >= self.obj.ndim:
--> 127                 raise IndexingError('Too many indexers')
    128             if not self._has_valid_type(k, i):
    129                 raise ValueError("Location based indexing can only have [%s] "

IndexingError: Too many indexers

Surely this is not intended behaviour?

Note: I know this is possible with df.xs('C1', level='third'), but the current .loc behaviour seems inconsistent.
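
For reference, a small sketch of the xs route mentioned in the note. The frame is rebuilt under the assumption that the values follow the 2, 3, 6, 7, ... pattern of the table. By default xs drops the level it selects on, and drop_level=False keeps it.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the example frame from the question.
index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C1", "C2"], ["D0", "D1"]],
    names=["first", "second", "third", "fourth"],
)
df = pd.DataFrame(
    {"value": np.arange(64).reshape(16, 4)[:, 2:].ravel()}, index=index
)

# Cross-section on the "third" level; that level is dropped from the result.
dropped = df.xs("C1", level="third")

# Same rows, but all four index levels are kept.
kept = df.xs("C1", level="third", drop_level=False)

print(dropped.index.names)
print(kept.index.names)
```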

Answer 1

Accepted answer by djakubosky

The reason this doesn't work is tied to the need to specify the axis being indexed (mentioned in http://pandas.pydata.org/pandas-docs/stable/advanced.html). An alternative solution to your problem is simply to do this:

df.loc(axis=0)[:, :, 'C1', :]

Pandas sometimes gets confused when the indexes are similar or contain similar values. If you had a column named 'C1', for example, you would also need to specify the axis with this style of slicing/selecting.
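
A self-contained sketch of the axis=0 form, assuming the example frame can be rebuilt as below (the construction is a guess that matches the printed table). With the axis pinned to the rows, every element of the key is read as a level of the row MultiIndex.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the example frame from the question.
index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C1", "C2"], ["D0", "D1"]],
    names=["first", "second", "third", "fourth"],
)
df = pd.DataFrame(
    {"value": np.arange(64).reshape(16, 4)[:, 2:].ravel()}, index=index
)

# Pin the indexer to the row axis, then slice level by level.
by_axis = df.loc(axis=0)[:, :, "C1", :]

# For comparison: the xs call from the question, keeping all levels.
expected = df.xs("C1", level="third", drop_level=False)

print(by_axis)
```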

Answer 2

Answer by joris

To be safe (in the sense that this will work in all cases), you need to index both the row index and the columns, which you can do easily with pd.IndexSlice:

In [26]: idx = pd.IndexSlice

In [27]: df.loc[idx[:, :, 'C1', :],:]
Out[27]:
                           value
first second third fourth
A0    B0     C1    D0          2
                   D1          3
      B1     C1    D0         10
                   D1         11
A1    B0     C1    D0         18
                   D1         19
      B1     C1    D0         26
                   D1         27
A2    B0     C1    D0         34
                   D1         35
      B1     C1    D0         42
                   D1         43
A3    B0     C1    D0         50
                   D1         51
      B1     C1    D0         58
                   D1         59

Here idx[:, :, 'C1', :] is an easier way to write [slice(None), slice(None), 'C1', slice(None)]. Instead of pd.IndexSlice, you can also use np.s_, which is a bit shorter.
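
To make that equivalence concrete, here is a short sketch. The frame construction is an assumption that matches the table in the question. Both helpers hand back the key unchanged, which for several comma-separated items is a tuple of slice objects and labels.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the example frame from the question.
index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C1", "C2"], ["D0", "D1"]],
    names=["first", "second", "third", "fourth"],
)
df = pd.DataFrame(
    {"value": np.arange(64).reshape(16, 4)[:, 2:].ravel()}, index=index
)

idx = pd.IndexSlice
key = idx[:, :, "C1", :]

# The same key spelled out by hand.
explicit = (slice(None), slice(None), "C1", slice(None))

print(key == explicit)
print(np.s_[:, :, "C1", :] == explicit)

# Row key first, then an explicit ":" for the columns.
rows = df.loc[key, :]
print(rows)
```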

I am not fully sure why the other ones work. But see the note in the documentation here: http://pandas.pydata.org/pandas-docs/stable/advanced.html#using-slicers (the first red warning box), where it is stated that:

You should specify all axes in the .loc specifier, meaning the indexer for the index and for the columns. There are some ambiguous cases where the passed indexer could be mis-interpreted as indexing both axes, rather than into say the MultiIndex for the rows.
