Python: Get all keys from a GroupBy object in Pandas

Question

Asked by Nate

I'm looking for a way to get a list of all the keys in a GroupBy object, but I can't find one in the docs or through Google.

There is definitely a way to access the groups through their keys, like so:

df_gb = df.groupby(['EmployeeNumber'])
df_gb.get_group(key)

...so I figure there's a way to access a list (or the like) of the keys in a GroupBy object. I'm looking for something like this:

df_gb.keys
Out: [1234, 2356, 6894, 9492]

I figure I could just loop through the GroupBy object and collect the keys that way, but I think there has to be a better way.

Answer 1

Answered by EdChum

You can access this via the .groups attribute on the groupby object. It returns a dict, and the keys of that dict are the group keys:

In [40]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'group':[0,1,1,1,2,2,3,3,3], 'val':np.arange(9)})
gp = df.groupby('group')
gp.groups.keys()

Out[40]:
dict_keys([0, 1, 2, 3])

Here is the output from groups:

In [41]:
gp.groups

Out[41]:
{0: Int64Index([0], dtype='int64'),
 1: Int64Index([1, 2, 3], dtype='int64'),
 2: Int64Index([4, 5], dtype='int64'),
 3: Int64Index([6, 7, 8], dtype='int64')}
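
The question asks for a plain list rather than a dict view. A minimal sketch of one way to get it, assuming the same example frame as above: iterating a dict yields its keys, so the result of .groups can be passed straight to sorted() (or list()). The variable name keys is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'group': [0, 1, 1, 1, 2, 2, 3, 3, 3], 'val': np.arange(9)})
gp = df.groupby('group')

# Iterating a dict yields its keys, so this materialises them as a list.
# sorted() makes the order explicit instead of relying on dict ordering.
keys = sorted(gp.groups)
print(keys)
```

Each key in this list can then be passed to gp.get_group(key), as in the question.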

Update

It looks like, because groups is a dict, the group order isn't maintained when you call keys():

In [65]:
df = pd.DataFrame({'group':list('bgaaabxeb'), 'val':np.arange(9)})
gp = df.groupby('group')
gp.groups.keys()

Out[65]:
dict_keys(['b', 'e', 'g', 'a', 'x'])

If you display groups itself, you can see that the order is maintained:

In [79]:
gp.groups

Out[79]:
{'a': Int64Index([2, 3, 4], dtype='int64'),
 'b': Int64Index([0, 5, 8], dtype='int64'),
 'e': Int64Index([7], dtype='int64'),
 'g': Int64Index([1], dtype='int64'),
 'x': Int64Index([6], dtype='int64')}

So the key order is maintained there. A hack to get the keys in that order is to access the .name attribute of each group:

In [78]:
gp.apply(lambda x: x.name)

Out[78]:
group
a    a
b    b
e    e
g    g
x    x
dtype: object

This isn't great, as it isn't vectorised. However, if you already have an aggregated object, you can just get the index values:

In [81]:
agg = gp.sum()
agg

Out[81]:
       val
group     
a        9
b       13
e        7
g        1
x        6

In [83]:    
agg.index.get_level_values(0)

Out[83]:
Index(['a', 'b', 'e', 'g', 'x'], dtype='object', name='group')
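
If no aggregated object exists yet, a cheap aggregation can stand in for it. The sketch below is an assumption on my part rather than something from the answer: it uses size(), which only counts rows per group, and reads the keys from the resulting index. It also shows ngroups, which reports how many keys to expect.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'group': list('bgaaabxeb'), 'val': np.arange(9)})
gp = df.groupby('group')

# size() returns one row count per group; its index holds the group keys
# in the same sorted order as the aggregated frame above.
keys = gp.size().index.tolist()
print(keys)

# ngroups gives the number of distinct keys without listing them.
print(gp.ngroups)
```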

Answer 2

Answered by user11827562

Use the option sort=False to preserve the order in which the group keys first appear in the data:

gp = df.groupby('group', sort=False)
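
A short sketch of the effect, assuming the example frame from EdChum's update: with sort=False, iterating the groupby object yields the keys in order of first appearance rather than in sorted order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'group': list('bgaaabxeb'), 'val': np.arange(9)})

# sort=False keeps groups in the order their keys first occur in the column.
gp = df.groupby('group', sort=False)

# Iterating a groupby yields (key, sub-frame) pairs; keep only the keys.
keys = [key for key, _ in gp]
print(keys)  # ['b', 'g', 'a', 'x', 'e']
```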

Answer 3

Answered by Dr_Zaszu?

A problem with EdChum's answer is that calling gp.groups.keys() first constructs the full group dictionary. On large dataframes this is a very slow operation, and it effectively doubles the memory consumption. Iterating is much faster:

df = pd.DataFrame({'group':list('bgaaabxeb'), 'val':np.arange(9)})
gp = df.groupby('group')
keys = [key for key, _ in gp]

Executing this list comprehension took 16 s on my groupby object, while I had to interrupt gp.groups.keys() after 3 minutes.
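
The timings above are from one machine and one dataset, and the relative cost may differ across pandas versions. A rough sketch for measuring both approaches yourself follows; the data is synthetic and the sizes n_rows and n_groups are hypothetical values chosen only for illustration. A fresh groupby object is built for each measurement so that neither run benefits from work cached by the other.

```python
import time

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows, n_groups = 100_000, 1_000  # hypothetical sizes, adjust to your data
df = pd.DataFrame({
    'group': rng.integers(0, n_groups, size=n_rows),
    'val': np.arange(n_rows),
})

# Approach 1: iterate over the groupby object and keep only the keys.
start = time.perf_counter()
keys_iter = [key for key, _ in df.groupby('group')]
t_iter = time.perf_counter() - start

# Approach 2: build the full groups dict, then take its keys.
start = time.perf_counter()
keys_dict = list(df.groupby('group').groups.keys())
t_dict = time.perf_counter() - start

print(f"iteration: {t_iter:.4f} s, groups.keys(): {t_dict:.4f} s")
```

Both approaches should return the same set of keys; only the time and memory needed to produce them differ.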
