Python: get all keys from a GroupBy object in Pandas

Notice: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/42513049/

Time: 2020-08-19 21:48:54  Source: igfitidea

Get all keys from GroupBy object in Pandas

Tags: python, pandas

Asked by Nate

I'm looking for a way to get a list of all the keys in a GroupBy object, but I can't seem to find one via the docs nor through Google.

There is definitely a way to access the groups through their keys, like so:

df_gb = df.groupby(['EmployeeNumber'])
df_gb.get_group(key)

...so I figure there's a way to access a list (or the like) of the keys in a GroupBy object. I'm looking for something like this:

df_gb.keys
Out: [1234, 2356, 6894, 9492]

I figure I could just loop through the GroupBy object and get the keys that way, but I think there's got to be a better way.

Answered by EdChum

You can access this via the .groups attribute on the groupby object. It returns a dict, and the keys of that dict are the group keys:

In [40]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'group':[0,1,1,1,2,2,3,3,3], 'val':np.arange(9)})
gp = df.groupby('group')
gp.groups.keys()

Out[40]:
dict_keys([0, 1, 2, 3])

Here is the output from groups:

In [41]:
gp.groups

Out[41]:
{0: Int64Index([0], dtype='int64'),
 1: Int64Index([1, 2, 3], dtype='int64'),
 2: Int64Index([4, 5], dtype='int64'),
 3: Int64Index([6, 7, 8], dtype='int64')}
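
As a minimal sketch of turning the mapping above into a plain list: since .groups behaves like a dict, calling list() on it should yield the group keys, and indexing it by a key gives the row labels of that group. The variable names keys and rows_for_1 are illustrative only and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'group': [0, 1, 1, 1, 2, 2, 3, 3, 3], 'val': np.arange(9)})
gp = df.groupby('group')

# .groups is dict-like, so list() over it yields the group keys
keys = list(gp.groups)

# each value holds the row labels that belong to that key
rows_for_1 = list(gp.groups[1])

print(keys)
print(rows_for_1)
```

The exact repr of the index objects depends on the pandas version; the Int64Index shown in the output above comes from an older release.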

Update

It looks like, because groups is a dict, the group order isn't maintained when you call keys():

In [65]:
df = pd.DataFrame({'group':list('bgaaabxeb'), 'val':np.arange(9)})
gp = df.groupby('group')
gp.groups.keys()

Out[65]:
dict_keys(['b', 'e', 'g', 'a', 'x'])

If you inspect groups itself, you can see the order is maintained:

In [79]:
gp.groups

Out[79]:
{'a': Int64Index([2, 3, 4], dtype='int64'),
 'b': Int64Index([0, 5, 8], dtype='int64'),
 'e': Int64Index([7], dtype='int64'),
 'g': Int64Index([1], dtype='int64'),
 'x': Int64Index([6], dtype='int64')}

Since the key order is maintained there, a hack around this is to access the .name attribute of each group:

In [78]:
gp.apply(lambda x: x.name)

Out[78]:
group
a    a
b    b
e    e
g    g
x    x
dtype: object

This isn't great, as it isn't vectorised. However, if you already have an aggregated object, you can just get the index values:

In [81]:
agg = gp.sum()
agg

Out[81]:
       val
group     
a        9
b       13
e        7
g        1
x        6

In [83]:    
agg.index.get_level_values(0)

Out[83]:
Index(['a', 'b', 'e', 'g', 'x'], dtype='object', name='group')
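
If no aggregated object is at hand, a rough sketch of two other ways to reach the same keys follows. Neither is from the original answer: gp.size() is used here only as a cheap aggregation whose index holds the keys, and df['group'].unique() bypasses the groupby entirely and returns keys in order of first appearance rather than sorted order. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'group': list('bgaaabxeb'), 'val': np.arange(9)})
gp = df.groupby('group')

# index of any aggregation carries the group keys (sorted, since sort=True is the default)
keys_from_size = gp.size().index.tolist()     # ['a', 'b', 'e', 'g', 'x']

# unique values of the grouping column, in order of first appearance
keys_from_column = df['group'].unique().tolist()  # ['b', 'g', 'a', 'x', 'e']

# number of groups, without listing them
n_groups = gp.ngroups                         # 5
```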

Answered by user11827562

Use the option sort=False to preserve the order in which the group keys first appear:

gp = df.groupby('group', sort=False)
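
To illustrate the effect of sort=False, here is a small sketch that reuses the example frame from the earlier answer and collects the keys by iterating over the groupby object. The names keys_sorted and keys_unsorted are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'group': list('bgaaabxeb'), 'val': np.arange(9)})

# default sort=True: keys come out in sorted order
keys_sorted = [key for key, _ in df.groupby('group')]
# ['a', 'b', 'e', 'g', 'x']

# sort=False: keys come out in order of first appearance in the column
keys_unsorted = [key for key, _ in df.groupby('group', sort=False)]
# ['b', 'g', 'a', 'x', 'e']
```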

Answered by Dr_Zaszu?

A problem with EdChum's answer is that getting the keys by calling gp.groups.keys() first constructs the full group dictionary. On large dataframes this is a very slow operation, which effectively doubles the memory consumption. Iterating is way faster:

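import numpy as np
import pandas as pd

df = pd.DataFrame({'group':list('bgaaabxeb'), 'val':np.arange(9)})
gp = df.groupby('group')
# iterating yields (key, sub-frame) pairs; keep only the keys
keys = [key for key, _ in gp]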

Executing this list comprehension took 16 s on my groupby object, while I had to interrupt gp.groups.keys() after 3 minutes.
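
For anyone who wants to check this on their own data, a hedged timing sketch follows. The synthetic frame, its sizes, and the timed helper are all assumptions made for illustration, not the data behind the numbers quoted above, and which approach wins may well depend on the pandas version and on the number of groups.

```python
import time

import numpy as np
import pandas as pd

# hypothetical test data: 50,000 rows spread over up to 5,000 integer keys
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'group': rng.integers(0, 5_000, size=50_000),
    'val': np.arange(50_000),
})


def timed(fn):
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# a fresh groupby object is built for each measurement so nothing is cached
keys_iter, t_iter = timed(lambda: [key for key, _ in df.groupby('group')])
keys_dict, t_dict = timed(lambda: list(df.groupby('group').groups.keys()))

print(f"iteration: {t_iter:.3f} s, groups.keys(): {t_dict:.3f} s")
```

Both approaches should return the same set of keys; only the time and memory needed to get there differ.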