Pandas DataFrame with tuples of strings as index

Question

Asked by lanery

I'm seeing some weird pandas behavior here. I have a dataframe that looks like this:

import pandas as pd

df = pd.DataFrame(columns=['Col 1', 'Col 2', 'Col 3'],
                  index=[('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')])

In [14]: df
Out[14]:
       Col 1 Col 2 Col 3
(1, a)   NaN   NaN   NaN
(2, a)   NaN   NaN   NaN
(1, b)   NaN   NaN   NaN
(2, b)   NaN   NaN   NaN

I can set the value of an arbitrary element

In [15]: df['Col 2'].loc[('1', 'b')] = 6

In [16]: df
Out[16]:
       Col 1 Col 2 Col 3
(1, a)   NaN   NaN   NaN
(2, a)   NaN   NaN   NaN
(1, b)   NaN     6   NaN
(2, b)   NaN   NaN   NaN

But when I go to reference the element that I just set using the same syntax, I get

In [17]: df['Col 2'].loc[('1', 'b')]
KeyError: 'the label [1] is not in the [index]'

Can someone tell me what I'm doing wrong or why this behavior occurs? Am I simply not allowed to set the index as a multi-element tuple?

Edit

Apparently, wrapping the tuple index in a list works.

In [38]: df['Col 2'].loc[[('1', 'b')]]
Out[38]:
(1, b)    6
Name: Col 2, dtype: object

I'm still getting some weird behavior in my actual use case, though, so it'd be nice to know whether this usage is not recommended.

Answer 1

Accepted answer by Boud

The tuple in the selection brackets is treated as a sequence containing the elements you want to retrieve. It's as if you had passed ['1', 'b'] as the argument. Hence the KeyError message: pandas tries to find the key '1' and obviously doesn't find it.

That's why it works when you add additional brackets, as now the argument becomes a sequence of one element - your tuple.
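
As a rough illustration of the list-wrapping point, here is a minimal sketch. The values 10 to 40 are made up for the example, and `tupleize_cols=False` is used only to make sure the tuples stay plain labels on a flat index, as in the question.

```python
import pandas as pd

# Flat (single-level) index whose labels happen to be tuples, as in the question.
labels = [('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')]
index = pd.Index(labels, tupleize_cols=False)  # keep the tuples as plain labels

# Made-up values, only so there is something to select.
col2 = pd.Series([10, 20, 30, 40], index=index, name='Col 2')

# A list with one element: that element, the whole tuple, is the label to look up.
selected = col2.loc[[('1', 'b')]]
print(selected)

# The result is a one-row Series, so pull the scalar out by position.
value = selected.iloc[0]
print(value)
```

Note that the list form returns a Series rather than a scalar, which is why the extra `.iloc[0]` step is needed here.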

You should avoid relying on the ambiguities around list and tuple arguments in selection. The behavior can also differ depending on whether the index is a simple index or a multiindex.
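
To see which kind of index a frame actually has, a quick check like the following may help. It is only a sketch: the frames `flat` and `multi` and their values are invented for the illustration.

```python
import pandas as pd

labels = [('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')]

# Simple index: four labels, each of which is a tuple.
flat = pd.DataFrame({'Col 2': [10, 20, 30, 40]},
                    index=pd.Index(labels, tupleize_cols=False))

# Multiindex: two levels built from the same tuples.
multi = pd.DataFrame({'Col 2': [10, 20, 30, 40]},
                     index=pd.MultiIndex.from_tuples(labels))

# nlevels is 1 for a simple index and the number of levels for a multiindex.
print(isinstance(flat.index, pd.MultiIndex), flat.index.nlevels)
print(isinstance(multi.index, pd.MultiIndex), multi.index.nlevels)
```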

In any case, since you ask about recommendations, the one I would give is to try not to build simple indexes made of tuples: pandas will work better, and will be more powerful to use, if you build an actual multiindex instead:

df = pd.DataFrame(columns=['Col 1', 'Col 2', 'Col 3'],
                  index=pd.MultiIndex.from_tuples([('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')]))

df['Col 2'].loc[('1', 'b')] = 6

df['Col 2'].loc[('1', 'b')]
Out[13]: 6

df
Out[14]: 
    Col 1 Col 2 Col 3
1 a   NaN   NaN   NaN
2 a   NaN   NaN   NaN
1 b   NaN     6   NaN
2 b   NaN   NaN   NaN
