Original question: http://stackoverflow.com/questions/16311793/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Why does pandas groupby().transform() require a unique index?
Asked by patricksurry
I want to use groupby().transform() to do a custom (cumulative) transform of each block of records in a (sorted) dataset. Unless I ensure I have a unique key, it doesn't work. Why?
Here's a toy example:
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 1],
                   [1, 2],
                   [2, 3],
                   [3, 4],
                   [3, 5]],
                  columns='a b'.split())
# cumulative sum of 'b' within each group of 'a'
df['partials'] = df.groupby('a')['b'].transform(np.cumsum)
df
This gives the expected result:
   a  b  partials
0  1  1         1
1  1  2         3
2  2  3         3
3  3  4         4
4  3  5         9
but if 'a' is the index (a non-unique key), it all goes wrong:
df = df.set_index('a')
df['partials'] = df.groupby(level=0)['b'].transform(np.cumsum)
df
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-146-d0c35a4ba053> in <module>()
3
4 df = df.set_index('a')
----> 5 df.groupby(level=0)['b'].transform(np.cumsum)
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/groupby.pyc in transform(self, func, *args, **kwargs)
1542 res = wrapper(group)
1543 # result[group.index] = res
-> 1544 indexer = self.obj.index.get_indexer(group.index)
1545 np.put(result, indexer, res)
1546
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/index.pyc in get_indexer(self, target, method, limit)
847
848 if not self.is_unique:
--> 849 raise Exception('Reindexing only valid with uniquely valued Index '
850 'objects')
851
Exception: Reindexing only valid with uniquely valued Index objects
The same error occurs if you select column 'b' before grouping, i.e.:
df['b'].groupby(level=0).transform(np.cumsum)
but you can make it work if you transform the entire dataframe, like:
df.groupby(level=0).transform(np.cumsum)
or even a one-column dataframe (rather than series):
df.groupby(level=0)[['b']].transform(np.cumsum)
I feel like there's still some deep part of GroupBy-fu that I'm missing. Can someone set me straight?
Accepted answer by Andy Hayden
This was a bug that has since been fixed in pandas (certainly by 0.15.2; IIRC it was fixed in 0.14), so you should no longer see this exception.
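For reference, here is a minimal sketch of the same toy example on a recent pandas release, where transform on a non-unique index is expected to work. It passes the string 'cumsum' instead of np.cumsum and assigns the result positionally with to_numpy(). Both are choices made for this sketch, not part of the original question.

```python
import pandas as pd

# Same toy data as in the question, with the non-unique column 'a' as the index.
df = pd.DataFrame({'a': [1, 1, 2, 3, 3],
                   'b': [1, 2, 3, 4, 5]}).set_index('a')

# 'cumsum' names the built-in groupby cumulative sum (an assumption of this
# sketch; the question passed np.cumsum).
partials = df.groupby(level=0)['b'].transform('cumsum')

# Assign by position so no index alignment on the duplicate labels is needed.
df['partials'] = partials.to_numpy()
print(df)
```

The result comes back in the original row order, so it lines up with df row for row.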
As a workaround, in earlier pandas you can use apply:
In [10]: g = df.groupby(level=0)['b']
In [11]: g.apply(np.cumsum)
Out[11]:
a
1 1
1 3
2 3
3 4
3 9
dtype: int64
You can then assign this to a column in df:
In [12]: df['partial'] = g.apply(np.cumsum)
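As for the "why": the traceback in the question shows the old transform code calling Index.get_indexer on the frame's index, and that method refuses to work on an index with duplicate labels. The sketch below reproduces only that one call on a recent pandas. The index values are copied from the toy example, and the broad `except Exception` is a simplification, since the exact exception class has changed between versions.

```python
import pandas as pd

# The non-unique index from the toy example.
idx = pd.Index([1, 1, 2, 3, 3], name='a')

try:
    # One label can map to several positions, so a one-to-one indexer
    # cannot be built for a non-unique index.
    idx.get_indexer([1, 1])
    raised = False
except Exception as exc:
    raised = True
    print(type(exc).__name__, exc)

# The non-unique variant returns every matching position instead.
indexer, missing = idx.get_indexer_non_unique([3])
print(indexer)
```

This is only an illustration of the failing call, not a reconstruction of how the fix was made in pandas itself.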

