Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must comply with the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/40186361/

Pandas DataFrame with tuple of strings as index

python, pandas, indexing

Asked by lanery

I'm seeing some weird pandas behavior here. I have a DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame(columns=['Col 1', 'Col 2', 'Col 3'],
                  index=[('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')])

In [14]: df
Out[14]:
       Col 1 Col 2 Col 3
(1, a)   NaN   NaN   NaN
(2, a)   NaN   NaN   NaN
(1, b)   NaN   NaN   NaN
(2, b)   NaN   NaN   NaN

I can set the value of an arbitrary element

In [15]: df['Col 2'].loc[('1', 'b')] = 6

In [16]: df
Out[16]:
       Col 1 Col 2 Col 3
(1, a)   NaN   NaN   NaN
(2, a)   NaN   NaN   NaN
(1, b)   NaN     6   NaN
(2, b)   NaN   NaN   NaN

But when I go to reference the element that I just set using the same syntax, I get

In [17]: df['Col 2'].loc[('1', 'b')]
KeyError: 'the label [1] is not in the [index]'

Can someone tell me what I'm doing wrong or why this behavior occurs? Am I simply not allowed to set the index as a multi-element tuple?

Edit

Apparently, wrapping the tuple index in a list works.

In [38]: df['Col 2'].loc[[('1', 'b')]]
Out[38]:
(1, b)    6
Name: Col 2, dtype: object

That said, I'm still getting some weird behavior in my actual use case, so it would be nice to know whether this usage is not recommended.

Accepted answer by Boud

The tuple in your selection brackets is seen as a sequence containing the elements you want to retrieve. It is as if you had passed ['1', 'b'] as the argument. Hence the KeyError message: pandas tries to find the key '1' and obviously doesn't find it.

That's why it works when you add additional brackets, as now the argument becomes a sequence of one element - your tuple.
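
As a rough illustration of the point above, the following sketch builds a Series on a flat index of tuples and compares a bare tuple key with a list-wrapped one. The integer values and the explicit `pd.Index(..., tupleize_cols=False)` construction are assumptions made for the example, not part of the original question. The exception raised for the bare tuple may differ between pandas versions, so it is caught broadly and only reported.

```python
import pandas as pd

keys = [('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')]

# tupleize_cols=False keeps each tuple as one label (a flat Index, not a MultiIndex).
flat_index = pd.Index(keys, tupleize_cols=False)
s = pd.Series([10, 20, 30, 40], index=flat_index, name='Col 2')

# A bare tuple may be read as several keys rather than as one label.
# The exact outcome depends on the pandas version, so just record it.
try:
    bare_result = s.loc[('1', 'b')]
    bare_error = None
except Exception as exc:
    bare_result = None
    bare_error = type(exc).__name__
print("bare tuple ->", bare_result, bare_error)

# Wrapping the tuple in a list turns it into a sequence holding one label.
wrapped = s.loc[[('1', 'b')]]
print(wrapped)
```

Note that the list-wrapped lookup returns a one-element Series rather than a scalar, which matches the output shown in the question's edit.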

You should avoid relying on the ambiguity between list and tuple arguments in selection. The behavior can also differ depending on whether the index is a simple index or a MultiIndex.
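
One way to see the difference between the two kinds of index is to build both from the same list of tuples and compare them. This is only a sketch: the names `flat` and `multi` are made up for the example, and the flat index is built explicitly with `tupleize_cols=False`.

```python
import pandas as pd

keys = [('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')]

flat = pd.Index(keys, tupleize_cols=False)  # one level; each label is a whole tuple
multi = pd.MultiIndex.from_tuples(keys)     # two levels; each tuple spans both levels

print(flat.nlevels, multi.nlevels)

# Both indexes contain the full tuple.
print(('1', 'b') in flat, ('1', 'b') in multi)

# Only the MultiIndex understands '1' as a partial key on its first level.
print('1' in flat, '1' in multi)
```

In the flat index a tuple is an opaque label, while in the MultiIndex its elements are meaningful on their own, which is why the same key can be interpreted differently.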

In any case, since you asked for a recommendation: try not to build simple indexes made of tuples. pandas works better, and is more powerful to use, if you build a MultiIndex instead:

df = pd.DataFrame(columns=['Col 1', 'Col 2', 'Col 3'],
                  index=pd.MultiIndex.from_tuples([('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')]))

df['Col 2'].loc[('1', 'b')] = 6

df['Col 2'].loc[('1', 'b')]
Out[13]: 6

df
Out[14]: 
    Col 1 Col 2 Col 3
1 a   NaN   NaN   NaN
2 a   NaN   NaN   NaN
1 b   NaN     6   NaN
2 b   NaN   NaN   NaN
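
The snippet above assigns through `df['Col 2'].loc[...]`, which is chained indexing. Recent pandas versions discourage this, and with Copy-on-Write enabled the assignment may not write back to `df`. A possible variant, sketched below, uses a single `.loc` call with both the row and the column label. The level names `num` and `letter` are illustrative assumptions and do not appear in the original.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')],
    names=['num', 'letter'],  # hypothetical level names, added for the example
)
df = pd.DataFrame(columns=['Col 1', 'Col 2', 'Col 3'], index=index)

# One .loc call with (row label, column label) avoids chained indexing.
df.loc[('1', 'b'), 'Col 2'] = 6
value = df.loc[('1', 'b'), 'Col 2']
print(value)

# A MultiIndex also allows selecting every row that matches a single level.
rows_b = df.xs('b', level='letter')
print(rows_b)
```

Selecting by one level, as `xs` does here, is the kind of operation a flat index of tuples cannot offer directly.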