Pandas groupby with dict
Original question: http://stackoverflow.com/questions/25736127/
This page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and distribute it under the same license.
Asked by Christopher Short
Is it possible to use a dict to group on elements of a column?
For example:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame({'A' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
   ...:                    'B' : np.random.randn(8)})
In [4]: df
Out[4]:
       A         B
0    one  0.751612
1    one  0.333008
2    two  0.395667
3  three  1.636125
4    two  0.916435
5    two  1.076679
6    one -0.992324
7  three -0.593476
In [5]: d = {'one':'Start', 'two':'Start', 'three':'End'}
In [6]: grouped = df[['A','B']].groupby(d)
This (and other variations) returns an empty groupby object, and my variations using .apply all fail too.
I'd like to match the values of column A to the keys of the dictionary and put the rows into the groups defined by the dictionary's values. The output would look something like this:
Start:
     A         B
0  one  0.751612
1  one  0.333008
2  two  0.395667
4  two  0.916435
5  two  1.076679
6  one -0.992324
End:
       A         B
3  three  1.636125
7  three -0.593476
Accepted answer by Marius
From the docs, the dict has to map from labels to group names, so this will work if you put 'A' into the index:
grouped2 = df.set_index('A').groupby(d)
for group_name, data in grouped2:
    print(group_name)
    print('---------')
    print(data)
# Output:
End
---------
              B
A
three -1.234795
three  0.239209
Start
---------
            B
A
one -1.924156
one  0.506046
two -1.681980
two  0.605248
two -0.861364
one  0.800431
Column names and row indices are both labels, whereas before you put 'A' into the index, the elements of 'A' are values.
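To make the label/value distinction concrete, here is a minimal sketch. It assumes fixed numbers in column B instead of the random values from the question, so the sums are reproducible; the variable names are only illustrative.

```python
import pandas as pd

# Same shape as the question's frame, but with fixed values in 'B'
# (an assumption made here so the results are deterministic).
df = pd.DataFrame({
    'A': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'B': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})
d = {'one': 'Start', 'two': 'Start', 'three': 'End'}

# The dict is looked up against the index labels (0..7 here).
# None of them is a key of d, so no row lands in any group.
by_default_index = df.groupby(d)
n_groups_default = by_default_index.ngroups

# Once 'A' is the index, its entries are labels and the lookup succeeds.
by_label = df.set_index('A').groupby(d)
sums = by_label['B'].sum()
print(n_groups_default)
print(sums)
```

With the default integer index the grouping should come out empty, while the version indexed by 'A' yields the two groups 'End' and 'Start'.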
If you have other info in the index that makes a set_index() tricky, you can just create a grouping column with map():
df['group'] = df['A'].map(d)
grouped3 = df.groupby('group')
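A variation on the same idea, sketched with assumed fixed values in column B: groupby also accepts a Series aligned with the frame's index, so the mapped keys can be passed directly without adding a 'group' column. This keeps column A and the original row index, which is the layout the question asked for.

```python
import pandas as pd

# Fixed values in 'B' are an assumption made for reproducibility.
df = pd.DataFrame({
    'A': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'B': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})
d = {'one': 'Start', 'two': 'Start', 'three': 'End'}

# Map the values of 'A' to group names; the result shares df's index.
keys = df['A'].map(d)

# Group by that Series directly; df itself is left unmodified.
groups = {name: frame for name, frame in df.groupby(keys)}

for name, frame in groups.items():
    print(name + ':')
    print(frame)
```

Any value of 'A' that is missing from the dict maps to NaN, and groupby drops those rows by default, so it is worth checking that the dict covers every value.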
Answer by David Robinson
You can group with a dictionary, but its keys are matched against the index labels, so you need to set column A as the index first.
grouped = df.set_index("A").groupby(d)
list(grouped)
# [('End',               B
# A
# three -1.550727
# three  1.048730
#
# [2 rows x 1 columns]), ('Start',             B
# A
# one -1.552152
# one -2.018647
# two -0.968068
# two  0.449016
# two -0.374453
# one  0.116770
#
# [6 rows x 1 columns])]

