Python How to get number of groups in a groupby object in pandas?

Question

Asked by wolfsatthedoor

This would be useful so I know how many unique groups I have to perform calculations on. Thank you.

Suppose the groupby object is called dfgroup.

Answer 1

Accepted answer by BrenBarn

As documented, you can get the number of groups with len(dfgroup).
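A minimal sketch of this, assuming a small made-up DataFrame (the column names `key` and `value` and the data are illustrative, not from the question):

```python
import pandas as pd

# Illustrative data; the column names `key` and `value` are hypothetical
df = pd.DataFrame({"key": ["x", "x", "y", "z"], "value": [1, 2, 3, 4]})
dfgroup = df.groupby("key")

# len() on a GroupBy object gives the number of groups
n_groups = len(dfgroup)
print(n_groups)  # 3
```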

Answer 2

Answer by cs95

[pandas >= 0.23] Simple, Fast, and Pandaic: `ngroups`

Newer versions of the groupby API provide this (undocumented) attribute, which stores the number of groups in a GroupBy object.

# setup
import pandas as pd

df = pd.DataFrame({'A': list('aabbcccd')})
dfg = df.groupby('A')

# call `.ngroups` on the GroupBy object
dfg.ngroups
# 4

Note that this is different from GroupBy.groups, which returns the actual groups themselves.
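To make the distinction concrete, here is a small sketch that reuses the setup above. The variable names `count` and `mapping` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': list('aabbcccd')})
dfg = df.groupby('A')

count = dfg.ngroups    # a plain integer: how many groups there are
mapping = dfg.groups   # dict-like: group label -> index labels of its rows

print(count)                   # 4
print(sorted(mapping))         # ['a', 'b', 'c', 'd']
print(mapping['c'].tolist())   # [4, 5, 6]
```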

Why should I prefer this over `len`?

As noted in BrenBarn's answer, you could use len(dfg) to get the number of groups. But you shouldn't. Looking at the implementation of GroupBy.__len__ (which is what len() calls internally), we see that __len__ makes a call to GroupBy.groups, which returns a dictionary of grouped indices:

dfg.groups
{'a': Int64Index([0, 1], dtype='int64'),
 'b': Int64Index([2, 3], dtype='int64'),
 'c': Int64Index([4, 5, 6], dtype='int64'),
 'd': Int64Index([7], dtype='int64')}

Depending on the number of groups in your operation, generating the dictionary only to find its length is a wasteful step. ngroups, on the other hand, is a stored property that can be accessed in constant time.

ngroups has since been documented under GroupBy object attributes. The issue with len, however, is that for a GroupBy object with many groups it can take much longer.
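If you want to check this on your own setup, a rough timing sketch follows. The data size and key range are arbitrary assumptions, and the gap between the two timings depends on your machine and pandas version (how `__len__` is implemented may differ between releases), so no expected numbers are shown.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 200,000 rows spread over many distinct keys
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.integers(0, 50_000, size=200_000)})
dfg = df.groupby("A")

# Both approaches report the same number of groups
assert len(dfg) == dfg.ngroups

# Build a fresh GroupBy object on every call so cached results are not reused
t_len = timeit.timeit(lambda: len(df.groupby("A")), number=5)
t_ngroups = timeit.timeit(lambda: df.groupby("A").ngroups, number=5)

print(f"len(dfg):    {t_len:.4f} s for 5 runs")
print(f"dfg.ngroups: {t_ngroups:.4f} s for 5 runs")
```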

But what if I actually want the size of each group?

You're in luck. We have a function for that: GroupBy.size. But please note that size counts NaNs as well. If you don't want NaNs counted, use GroupBy.count instead.
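A short sketch of the difference, using made-up data in which column `B` contains missing values (the frame and column names are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Illustrative data: column B has NaNs in groups 'a' and 'c'
df = pd.DataFrame({
    "A": ["a", "a", "b", "b", "c"],
    "B": [1.0, np.nan, 3.0, 4.0, np.nan],
})
dfg = df.groupby("A")

sizes = dfg.size()         # rows per group, NaNs in B included
counts = dfg["B"].count()  # non-NaN values of B per group

print(sizes.tolist())   # [2, 2, 1]
print(counts.tolist())  # [1, 2, 0]
```

Here size returns one number per group, while count is evaluated per column, which is why a column is selected before calling it.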

Answer 3

Answer by Shaina Raza

You can use a format specifier:

df.to_csv('filename_%d.csv'%x, index=False)

With x equal to 1, the file is saved as filename_1.csv.
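A self-contained sketch of that call, assuming `x` is an integer counter and writing into a temporary directory so nothing in the working directory is touched (the DataFrame contents are made up):

```python
import os
import tempfile

import pandas as pd

# Illustrative data and a hypothetical counter value
df = pd.DataFrame({"A": list("aabb"), "B": [1, 2, 3, 4]})
x = 1

# Write into a temporary directory instead of the current one
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, 'filename_%d.csv' % x)
df.to_csv(path, index=False)

print(os.path.basename(path))  # filename_1.csv
```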
