Pandas groupby with dict

Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/25736127/

Time: 2020-09-13 22:26:25  Source: igfitidea

Tags: python, pandas

Asked by Christopher Short

Is it possible to use a dict to group on elements of a column?

For example:

In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: df = pd.DataFrame({'A' : ['one', 'one', 'two', 'three','two', 'two', 'one', 'three'],
   ...:          'B' : np.random.randn(8)})
In [4]: df
Out[4]: 
       A         B
0    one  0.751612
1    one  0.333008
2    two  0.395667
3  three  1.636125
4    two  0.916435
5    two  1.076679
6    one -0.992324
7  three -0.593476

In [5]: d = {'one':'Start', 'two':'Start', 'three':'End'}
In [6]: grouped = df[['A','B']].groupby(d)

This (and other variations) returns an empty groupby object. My variations on using .apply all fail too.

I'd like to match the values of column A to the keys of the dictionary and put rows into the groups defined by the dictionary's values. The output would look something like this:

Start:
           A         B
    0    one  0.751612
    1    one  0.333008
    2    two  0.395667
    4    two  0.916435
    5    two  1.076679
    6    one -0.992324
End:
           A         B
    3  three  1.636125
    7  three -0.593476

Accepted answer by Marius

From the docs, the dict has to map from labels to group names, so this will work if you put 'A' into the index:

# Move 'A' into the index so the dict keys match the row labels
grouped2 = df.set_index('A').groupby(d)
for group_name, data in grouped2:
    print(group_name)
    print('---------')
    print(data)

# Output:
End
---------
              B
A              
three -1.234795
three  0.239209

Start
---------
            B
A            
one -1.924156
one  0.506046
two -1.681980
two  0.605248
two -0.861364
one  0.800431

Column names and row indices are both labels, whereas before you put 'A' into the index, the elements of 'A' are values.
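To make the labels-versus-values distinction concrete, here is a minimal sketch that checks what the dict keys are matched against in each case. It assumes the same column layout as the question but replaces the random 'B' values with fixed numbers (np.arange) so the result is reproducible; the variable names keys_default, keys_indexed and group_sizes are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "B": np.arange(8.0),  # assumption: fixed values instead of random ones
})
d = {"one": "Start", "two": "Start", "three": "End"}

# With the default index, the row labels are the integers 0..7.
# None of them is a key of d, so every row maps to a missing group key.
keys_default = df.index.map(d.get)
print(keys_default.isna().all())

# After set_index("A"), the row labels are 'one', 'two', 'three',
# which are exactly the keys of d.
keys_indexed = df.set_index("A").index.map(d.get)
print(list(keys_indexed))

# Grouping by the dict now finds two groups.
group_sizes = df.set_index("A").groupby(d).size()
print(group_sizes.to_dict())
```

This is why the attempt in the question produced no groups: the dict was looked up against the integer row labels, not against the strings stored in column 'A'.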

If you have other info in the index that makes doing a set_index() tricky, you can just create a grouping column with map():

df['group'] = df['A'].map(d)
grouped3 = df.groupby('group')
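A possible variation on the map() approach, sketched here under the same assumptions as the question's data (with fixed 'B' values via np.arange instead of random ones): the mapped Series can be passed to groupby directly, so no extra column is added to df. The names grouped and b_sums are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "B": np.arange(8.0),  # assumption: fixed values instead of random ones
})
d = {"one": "Start", "two": "Start", "three": "End"}

# Pass the mapped Series as the grouping key; df itself is not modified.
grouped = df.groupby(df["A"].map(d))

# Each group keeps column 'A' and the original integer index.
for group_name, data in grouped:
    print(group_name)
    print("---------")
    print(data)

# Aggregations work as with any other groupby object.
b_sums = grouped["B"].sum()
print(b_sums)
```

Because column 'A' and the original index are kept, each group has the same shape as the output requested in the question.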

Answer by David Robinson

You can group with a dictionary, but the dictionary keys are matched against the index, so you need to set column "A" as the index first.

grouped = df.set_index("A").groupby(d)

list(grouped)
# [('End',               B
# A              
# three -1.550727
# three  1.048730
# 
# [2 rows x 1 columns]), ('Start',             B
# A            
# one -1.552152
# one -2.018647
# two -0.968068
# two  0.449016
# two -0.374453
# one  0.116770
# 
# [6 rows x 1 columns])]