Select multiple groups from a pandas groupby object

Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/31535442/

Time: 2020-09-13 23:39:14  Source: igfitidea

python, pandas

Asked by lib

I am experimenting with the groupby features of pandas, in particular:

gb = df.groupby('model')
gb.hist()

Since gb has 50 groups, the result is quite cluttered. I would like to explore the result for only the first 5 groups.

I found how to select a single group with groups or get_group (see "How to access pandas groupby dataframe by key"), but not how to select multiple groups directly. The best I could do is:

groups = dict(list(gb))
# dict.values() is not sliceable in Python 3, so convert it to a list first
subgroup = pd.concat(list(groups.values())[:5])
subgroup.groupby('model').hist()

Is there a more direct way?

Accepted answer by dermen

You can do something like this:

new_gb = pandas.concat([gb.get_group(group) for i, group in enumerate(gb.groups) if i < 5]).groupby('model')
new_gb.hist()
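
A possible alternative to concatenating the groups, not taken from the original answer, is to number the groups with GroupBy.ngroup() and filter the rows before regrouping. The sketch below assumes hypothetical data with 8 models and assumes that "first 5" means the first five groups in iteration order (sorted keys by default).

```python
import numpy as np
import pandas as pd

# Hypothetical data: 8 models with 5 rows each.
df = pd.DataFrame({
    'model': np.repeat(np.arange(8), 5),
    'value': np.arange(40, dtype=float),
})
gb = df.groupby('model')

# ngroup() labels each row with the 0-based number of its group,
# in the order the groups are iterated (sorted keys by default).
first_five = df[gb.ngroup() < 5]

# Regroup the filtered rows; only five groups remain.
new_gb = first_five.groupby('model')
print(new_gb.ngroups)
```

Calling new_gb.hist() on the result would then plot only those five groups.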

However, I would approach it differently. You can use a collections.Counter object to get the groups quickly:

import collections

import numpy as np
import pandas

# pandas.np was removed in pandas 2.0, so NumPy is imported directly
df = pandas.DataFrame.from_dict({'model': np.random.randint(0, 3, 10), 'param1': np.random.random(10), 'param2': np.random.random(10)})
#   model    param1    param2
#0      2  0.252379  0.985290
#1      1  0.059338  0.225166
#2      0  0.187259  0.808899
#3      2  0.773946  0.696001
#4      1  0.680231  0.271874
#5      2  0.054969  0.328743
#6      0  0.734828  0.273234
#7      0  0.776684  0.661741
#8      2  0.098836  0.013047
#9      1  0.228801  0.827378
model_groups = collections.Counter(df.model)
print(model_groups) #Counter({2: 4, 0: 3, 1: 3})

Now you can iterate over the Counter object like a dictionary and query the groups you want:

new_df = pandas.concat( [df.query('model==%d'%key) for key,val in model_groups.items() if val < 4 ] ) # for example, but you can select the models however you like  
#   model    param1    param2
#2      0  0.187259  0.808899
#6      0  0.734828  0.273234
#7      0  0.776684  0.661741
#1      1  0.059338  0.225166
#4      1  0.680231  0.271874
#9      1  0.228801  0.827378

Now you can use the built-in pandas.DataFrame.groupby function:

gb = new_df.groupby('model')
gb.hist() 

Since model_groups contains all of the groups, you can pick from it as you wish.
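
As one illustration of picking from model_groups, the sketch below keeps only the two most frequent models using Counter.most_common. The sample data and the choice of two groups are assumptions made for this example, not part of the original answer.

```python
import collections

import pandas as pd

# Hypothetical sample data: model 2 has 4 rows, model 1 has 3, model 0 has 1.
df = pd.DataFrame({
    'model': [2, 1, 0, 2, 1, 2, 1, 2],
    'param1': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
})

model_groups = collections.Counter(df['model'])

# most_common(2) returns the two (key, count) pairs with the highest counts.
top_models = [key for key, count in model_groups.most_common(2)]

# Keep only the rows belonging to those models, then group as usual.
top_gb = df[df['model'].isin(top_models)].groupby('model')
print(top_gb.size())
```

Any other rule (a minimum count, an explicit list of keys) can be swapped in where top_models is built.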

Note

If your model column contains string values (names or something) instead of integers, it will all work the same - just change the query argument from 'model==%d'%key to 'model=="%s"'%key.
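
A way to sidestep the choice of format string altogether is to let DataFrame.query read the key from a Python variable with the @ prefix, which works for integers and strings alike. This is a minimal sketch with hypothetical data and a hypothetical list of wanted keys, not the original author's code.

```python
import pandas as pd

# Hypothetical data with string model names.
df = pd.DataFrame({
    'model': ['a', 'b', 'a', 'c', 'b'],
    'param1': [1.0, 2.0, 3.0, 4.0, 5.0],
})

wanted = ['a', 'c']  # assumed selection of groups

# '@key' refers to the Python variable key, so no manual quoting is needed.
parts = []
for key in wanted:
    parts.append(df.query('model == @key'))

new_df = pd.concat(parts)
print(new_df)
```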

Answer by EdChum

It'd be easier to just filter your df first and then perform the groupby:

In [155]:

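import numpy as np
import pandas as pd

df = pd.DataFrame({'model':np.random.randint(1,10,100), 'value':np.random.randn(100)})
# Series.sort() was removed from pandas; sort_values() returns a sorted copy
first_five = df['model'].sort_values().unique()[:5]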
gp = df[df['model'].isin(first_five)].groupby('model')
gp.first()
Out[155]:
          value
model          
1     -0.505677
2      1.217027
3     -0.641583
4      0.778104
5     -1.037858

Answer by firelynx

I don't know of a way to use the .get_group() method with more than one group.

You can, however, iterate through the groups.

It is still a bit ugly to do this, but here is one solution with iteration:

limit = 5
i = 0
# gd is assumed to be an existing groupby object, e.g. df.groupby('model')
for key, group in gd:
    print(key, group)
    i += 1
    if i >= limit:
        break

You could also loop with .get_group(), which, IMHO, is a little prettier but still quite ugly.

# dict keys are not sliceable in Python 3, so convert them to a list first
for key in list(gd.groups.keys())[:2]:
    print(gd.get_group(key))
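
The manual counter in the first loop can also be replaced with itertools.islice, which stops iteration after a fixed number of groups. This is a sketch under assumed sample data; the names df, gb, and first_keys are illustrative and not from the original answer.

```python
from itertools import islice

import pandas as pd

# Hypothetical data with four models.
df = pd.DataFrame({
    'model': ['a', 'a', 'b', 'c', 'c', 'd'],
    'value': [1, 2, 3, 4, 5, 6],
})
gb = df.groupby('model')

limit = 2
first_keys = []
# islice consumes only the first `limit` (key, group) pairs from the groupby.
for key, group in islice(gb, limit):
    first_keys.append(key)
    print(key, len(group))
```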

Answer by Shahdab Khatri

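from IPython.display import display  # available in Jupyter/IPython environments


def get_groups(group_object):
    # Print a header and display the DataFrame for every group
    for i in group_object.groups.keys():
        print(f"____{i}____")
        display(group_object.get_group(i))


# Show all groups by calling this function on a groupby object you created
get_groups(any_group_which_you_made)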