pandas: how to get around multidimensional indexing in pandas

Question

Asked by jeffalstott

In Pandas, what is a good way to select sets of arbitrary rows in a multiindex?

df = pd.DataFrame(columns=['A', 'B', 'C'])
df['A'] = ['a', 'a', 'b', 'b']
df['B'] = [1,2,3,4]
df['C'] = [1,2,3,4]

the_indices_we_want = df.ix[[0,3],['A','B']]
df = df.set_index(['A', 'B']) #Create a multiindex

df.ix[the_indices_we_want] #ValueError: Cannot index with multidimensional key

df.ix[[tuple(x) for x in the_indices_we_want.values]]

This last line works, but it feels clunky: the keys can't even be lists, they have to be tuples. It also involves generating a new object to do the indexing with. I'm in a situation where I'm trying to do a lookup on a multiindex dataframe, using indices from another dataframe:

data_we_want = dataframe_with_the_data.ix[dataframe_with_the_indices[['Index1','Index2']]]

Right now it looks like I need to write it like this:

data_we_want = dataframe_with_the_data.ix[[tuple(x) for x in dataframe_with_the_indices[['Index1','Index2']].values]]

That is workable, but if there are many rows (e.g. hundreds of millions of desired indices), then generating this list of tuples becomes quite a burden. Any solutions?

Edit: The solution by @joris works, but not if the indices are all numbers. Example where the indices are all integers:

df = pd.DataFrame(columns=['A', 'B', 'C'])
df['A'] = ['a', 'a', 'b', 'b']
df['B'] = [1,2,3,4]
df['C'] = [1,2,3,4]

the_indices_we_want = df.ix[[0,3],['B','C']]
df = df.set_index(['B', 'C'])

df.ix[pd.Index(the_indices_we_want)] #ValueError: Cannot index with multidimensional key

df.ix[pd.Index(the_indices_we_want.astype('object'))] #Works, though feels clunky.
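
For readers on current pandas, where `.ix` no longer exists (it was removed in pandas 1.0), the all-integer case might be handled along these lines. This is only a sketch, assuming pandas 0.24 or later so that `pd.MultiIndex.from_frame` is available; the names `indexed`, `keys` and `result` are introduced here for illustration and are not from the original post.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'b', 'b'],
                   'B': [1, 2, 3, 4],
                   'C': [1, 2, 3, 4]})

# Rows 0 and 3 of the two integer columns, taken before the index is set
the_indices_we_want = df.loc[[0, 3], ['B', 'C']]
indexed = df.set_index(['B', 'C'])

# Build one MultiIndex entry per row of the key frame; no cast to object needed
keys = pd.MultiIndex.from_frame(the_indices_we_want)
result = indexed.loc[keys]
print(result)
```

Because the keys are built level by level from the columns, the dtype of the key columns should not matter here.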

Answer 1

Accepted answer by joris

You indeed cannot index with a DataFrame directly, but if you convert it to an Index object, it does the correct thing (a row in the DataFrame will be regarded as one multi-index entry):

In [43]: pd.Index(the_indices_we_want)
Out[43]: Index([(u'a', 1), (u'b', 4)], dtype='object')

In [44]: df.ix[pd.Index(the_indices_we_want)]
Out[44]:
     C
A B
a 1  1
b 4  4

In [45]: df.ix[[tuple(x) for x in the_indices_we_want.values]]
Out[45]:
     C
A B
a 1  1
b 4  4

This is somewhat cleaner. And in some quick tests it seems to be a bit faster (but not by much, only 2 times).
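
If building any key object at all is the bottleneck, a different technique is to treat the lookup as a join instead of an indexing operation. The sketch below is not part of the original answer and has not been benchmarked against it; it assumes a current pandas version, and the names `indexed` and `looked_up` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'b', 'b'],
                   'B': [1, 2, 3, 4],
                   'C': [1, 2, 3, 4]})

# Key frame holding the (A, B) pairs to look up
the_indices_we_want = df.loc[[0, 3], ['A', 'B']]
indexed = df.set_index(['A', 'B'])

# Match columns A and B of the key frame against the index levels of `indexed`.
# how='inner' drops key rows that have no match in `indexed`.
looked_up = the_indices_we_want.join(indexed, on=['A', 'B'], how='inner')
print(looked_up)
```

The result keeps the key columns `A` and `B` as ordinary columns next to `C`; calling `set_index(['A', 'B'])` on it would restore the multi-index layout if that is needed.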

这有点清洁。通过一些快速测试，它似乎快了一点（但不多，只有 2 倍）

Answer 2

Answer by bjonen

In newer versions of pandas you can simply use .iloc for row indexing.

df = pd.DataFrame(columns=['A', 'B', 'C'])
df['A'] = ['a', 'a', 'b', 'b']
df['B'] = [1,2,3,4]
df['C'] = [1,2,3,4]
df.iloc[[0, 3]][['A', 'B']]
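
The chained selection above can also be written as a single indexing call. This is a small sketch under the assumption of a current pandas version; the variable names `chained`, `by_label` and `by_position` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'b', 'b'],
                   'B': [1, 2, 3, 4],
                   'C': [1, 2, 3, 4]})

# Chained form from the answer: rows by position, then columns by label
chained = df.iloc[[0, 3]][['A', 'B']]

# Single call, label-based: translate row positions to labels first
by_label = df.loc[df.index[[0, 3]], ['A', 'B']]

# Single call, position-based: translate column labels to positions first
by_position = df.iloc[[0, 3], df.columns.get_indexer(['A', 'B'])]

print(by_label)
```

All three select the same two rows and two columns; the single-call forms avoid creating an intermediate frame.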
