pandas: Most efficient way of excluding indexed rows in a pandas DataFrame

Question

Asked by dkapitan

I'm relatively new to Python and pandas and am struggling with (hierarchical) indexes. I've got the basics covered, but I get lost with more advanced slicing and cross-sectioning.

For example, with the following dataframe

import pandas as pd
import numpy as np
data = pd.DataFrame(np.arange(9).reshape((3, 3)),
    index=pd.Index(['Ohio', 'Colorado', 'New York'], name='state'), columns=pd.Index(['one', 'two', 'three'], name='number'))

I want to select everything except the row with index 'Colorado'. For a small dataset I could do:

data.ix[['Ohio','New York']]

But if the number of unique index values is large, that's impractical. Naively, I would expect a syntax like

data.ix[['state' != 'Colorado']]

However, this only returns the first record, 'Ohio', and doesn't return 'New York'. The following works, but is cumbersome:

filter = list(set(data.index.get_level_values(0).unique()) - set(['Colorado']))
data.loc[filter]

Surely there's a more Pythonic, less verbose way of doing this?

Answer 1

Answered by DSM

This is a Python issue, not a pandas one: 'state' != 'Colorado' is True, so what pandas gets is data.ix[[True]].
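
The evaluation order described here can be checked with a minimal sketch built on the data frame from the question. The names `key` and `mask` are only illustrative and do not appear in the original post.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=pd.Index(['Ohio', 'Colorado', 'New York'], name='state'),
    columns=pd.Index(['one', 'two', 'three'], name='number'))

# Python compares the two string literals before any indexer sees them,
# so the key is just a one-element list holding True.
key = ['state' != 'Colorado']
print(key)

# Comparing the index itself is elementwise: one boolean per row.
mask = data.index != 'Colorado'
print(mask)
```

The literal comparison never refers to the data frame at all, which is why it cannot select rows by label.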

You could do

>>> data.loc[data.index != "Colorado"]
number    one  two  three
state                    
Ohio        0    1      2
New York    6    7      8

[2 rows x 3 columns]

or use DataFrame.query:

>>> data.query("state != 'New York'")
number    one  two  three
state                    
Ohio        0    1      2
Colorado    3    4      5

[2 rows x 3 columns]

if you don't like the duplication of data. (Quoting the expression passed to the .query() method is one of the only ways around the fact that otherwise Python would evaluate the comparison before pandas ever saw it.)
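
As a hedged extension of the same idea, the sketch below excludes a list of labels instead of a single one. It assumes that `query` resolves the index name `state` and accepts `@` references to local variables; the names `excluded`, `by_query` and `by_mask` are hypothetical and not part of the answer.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=pd.Index(['Ohio', 'Colorado', 'New York'], name='state'),
    columns=pd.Index(['one', 'two', 'three'], name='number'))

excluded = ['Colorado']  # labels to drop (illustrative)

# The quoted expression is parsed by pandas; @excluded refers to the
# local Python variable of that name.
by_query = data.query("state not in @excluded")

# The same selection written as a boolean mask over the index.
by_mask = data.loc[~data.index.isin(excluded)]

print(by_query)
```

Both forms keep the rows in their original order, so they can be compared directly.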

Answer 2

Answered by Alexander McFarlane

This is a robust solution that will also work with MultiIndex objects.

Single Index

excluded = ['Ohio']
indices = data.index.get_level_values('state').difference(excluded)
indx = pd.IndexSlice[indices.values]

The output

In [77]: data.loc[indx]
Out[77]:
number    one  two  three
state
Colorado    3    4      5
New York    6    7      8
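
One detail worth checking with this approach is row order. The sketch below, which assumes the question's data frame and excludes 'Colorado' instead of 'Ohio', suggests that `Index.difference` returns the remaining labels sorted, so the result may not follow the frame's original order.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=pd.Index(['Ohio', 'Colorado', 'New York'], name='state'),
    columns=pd.Index(['one', 'two', 'three'], name='number'))

excluded = ['Colorado']

# difference() drops the excluded labels and, by default, sorts what is left.
indices = data.index.get_level_values('state').difference(excluded)
indx = pd.IndexSlice[indices.values]

# Rows come back in the order of the labels passed to .loc.
result = data.loc[indx]
print(result)
```

If the original order matters, a boolean mask such as `data.index != 'Colorado'` preserves it.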

MultiIndex Extension

Here I extend to a MultiIndex example...

data = pd.DataFrame(
    np.arange(18).reshape(6, 3),
    index=pd.MultiIndex(
        levels=[['AU', 'UK'], ['Derby', 'Kensington', 'Newcastle', 'Sydney']],
        codes=[[0, 0, 0, 1, 1, 1], [0, 2, 3, 0, 1, 2]],  # this argument was named `labels` before pandas 0.24
        names=['country', 'town']),
    columns=pd.Index(['one', 'two', 'three'], name='number'))

Assume we want to exclude 'Newcastle' from both countries in this new MultiIndex:

excluded = ['Newcastle']
indices = data.index.get_level_values('town').difference(excluded)
indx = pd.IndexSlice[:, indices.values]

Which gives the expected result

In [115]: data.loc[indx, :]
Out[115]:
number              one  two  three
country town
AU      Derby         0    1      2
        Sydney        6    7      8
UK      Derby         9   10     11
        Kensington   12   13     14
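
For comparison, here is a hedged sketch that runs the `IndexSlice` selection next to a boolean mask over the 'town' level. The mask variant is an alternative, not part of this answer, and the frame is rebuilt with `MultiIndex.from_tuples` purely for readability; the names `by_slice` and `by_mask` are illustrative.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('AU', 'Derby'), ('AU', 'Newcastle'), ('AU', 'Sydney'),
     ('UK', 'Derby'), ('UK', 'Kensington'), ('UK', 'Newcastle')],
    names=['country', 'town'])
data = pd.DataFrame(
    np.arange(18).reshape(6, 3),
    index=index,
    columns=pd.Index(['one', 'two', 'three'], name='number'))

excluded = ['Newcastle']

# Approach from the answer: keep every town that is not excluded.
indices = data.index.get_level_values('town').difference(excluded)
by_slice = data.loc[pd.IndexSlice[:, indices.values], :]

# Alternative: one boolean per row, True where the town is kept.
mask = ~data.index.get_level_values('town').isin(excluded)
by_mask = data.loc[mask]

print(by_mask)
```

The mask variant does not go through label-based slicing of the MultiIndex, so it may be the simpler choice when only one level is involved.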

Common Pitfalls

Make sure that all levels of your index are sorted; if they are not, call data.sort_index(inplace=True) first.
Make sure you include the null slice for the columns: data.loc[indx, :]
Sometimes indx = pd.IndexSlice[:, indices] is enough, but I found that I often needed to use indx = pd.IndexSlice[:, indices.values]
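
The first two pitfalls can be sketched as follows. The small unsorted frame and the names `unsorted`, `was_sorted` and `sorted_data` are made up for illustration; `sort_index()` is used without `inplace=True` here so the original frame stays untouched.

```python
import numpy as np
import pandas as pd

# A deliberately unsorted MultiIndex (illustrative data).
index = pd.MultiIndex.from_tuples(
    [('UK', 'Newcastle'), ('AU', 'Sydney'), ('UK', 'Derby'), ('AU', 'Derby')],
    names=['country', 'town'])
unsorted = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=index,
    columns=pd.Index(['one', 'two', 'three'], name='number'))

# Check whether the index is already in sorted order.
was_sorted = unsorted.index.is_monotonic_increasing

# Sort all index levels before slicing with IndexSlice.
sorted_data = unsorted.sort_index()

indices = sorted_data.index.get_level_values('town').difference(['Newcastle'])
indx = pd.IndexSlice[:, indices.values]

# Include the null slice for the columns.
result = sorted_data.loc[indx, :]
print(result)
```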
