pandas: Why does pandas groupby().transform() require a unique index?

Question

Asked by patricksurry

I want to use groupby().transform() to do a custom (cumulative) transform of each block of records in a (sorted) dataset. Unless I ensure I have a unique key, it doesn't work. Why?

Here's a toy example:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 1],
                   [1, 2],
                   [2, 3],
                   [3, 4],
                   [3, 5]],
                  columns='a b'.split())
# cumulative sum of 'b' within each group of 'a'
df['partials'] = df.groupby('a')['b'].transform(np.cumsum)
df

gives the expected:

     a   b   partials
0    1   1   1
1    1   2   3
2    2   3   3
3    3   4   4
4    3   5   9

but if 'a' is set as the index (so the index has repeated values), it all goes wrong:

df = df.set_index('a')
df['partials'] = df.groupby(level=0)['b'].transform(np.cumsum)
df

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-146-d0c35a4ba053> in <module>()
      3 
      4 df = df.set_index('a')
----> 5 df.groupby(level=0)['b'].transform(np.cumsum)

/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/groupby.pyc in transform(self, func, *args, **kwargs)
   1542             res = wrapper(group)
   1543             # result[group.index] = res
-> 1544             indexer = self.obj.index.get_indexer(group.index)
   1545             np.put(result, indexer, res)
   1546 

/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/index.pyc in get_indexer(self, target, method, limit)
    847 
    848         if not self.is_unique:
--> 849             raise Exception('Reindexing only valid with uniquely valued Index '
    850                             'objects')
    851 

Exception: Reindexing only valid with uniquely valued Index objects

The same error occurs if you select column 'b' before grouping, i.e.:

df['b'].groupby(level=0).transform(np.cumsum)

but you can make it work if you transform the entire DataFrame:

df.groupby(level=0).transform(np.cumsum)

or even a one-column DataFrame (rather than a Series):

df.groupby(level=0)[['b']].transform(np.cumsum)

I feel like there's still some deep part of GroupBy-fu that I'm missing. Can someone set me straight?

Answer 1

Accepted answer by Andy Hayden

This was a bug that has since been fixed in pandas (certainly by 0.15.2; IIRC it was fixed in 0.14), so you should no longer see this exception.
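
For context on where the exception came from: the traceback in the question shows the old transform code calling self.obj.index.get_indexer(group.index) to find the positions to write each group's result into. The following is a minimal sketch, assuming a reasonably recent pandas, of why that lookup cannot work on an index with repeated labels. The names idx, raised, positions and missing are illustrative only, and the exact exception class differs between pandas versions, so the sketch catches Exception broadly.

```python
import pandas as pd

# Same non-unique index as the toy example after set_index('a').
idx = pd.Index([1, 1, 2, 3, 3], name="a")

# get_indexer must return exactly one position per requested label,
# so it refuses to run on an index that contains duplicates.
try:
    idx.get_indexer([1])
    raised = False
except Exception as exc:  # exact exception class varies by pandas version
    raised = True
    print(type(exc).__name__, exc)

# get_indexer_non_unique returns every matching position instead,
# plus the positions (within the target) of labels that were not found.
positions, missing = idx.get_indexer_non_unique([1])
print(positions, missing)
```

Label 1 occurs at two positions, so there is no single answer for get_indexer to give. That is the "Reindexing only valid with uniquely valued Index objects" condition in the traceback.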

As a workaround, in earlier pandas you can use apply:

In [10]: g = df.groupby(level=0)['b']

In [11]: g.apply(np.cumsum)
Out[11]:
a
1    1
1    3
2    3
3    4
3    9
dtype: int64

You can then assign this to a column in df:

In [12]: df['partial'] = g.apply(np.cumsum)
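
On a current pandas, where the bug is fixed, the workaround should not be needed. Here is a small sketch, assuming a recent pandas release, that rebuilds the toy frame with 'a' as a non-unique index and computes the per-group cumulative sum two ways: with transform and with the built-in cumsum method on the grouped column. The variable names via_transform and via_method are illustrative, not from the original post.

```python
import pandas as pd

# Toy data from the question, with 'a' as a (non-unique) index.
df = pd.DataFrame({"a": [1, 1, 2, 3, 3],
                   "b": [1, 2, 3, 4, 5]}).set_index("a")

g = df.groupby(level=0)["b"]

# transform with the name of a built-in groupby operation
via_transform = g.transform("cumsum")

# the equivalent dedicated method on the grouped column
via_method = g.cumsum()

# Both results carry the same index as df, so the assignment
# lines up row by row even though the labels repeat.
df["partials"] = via_method
print(df)
```

Both results keep the original row order and index, which is what makes the column assignment safe here.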
