Python How to get number of groups in a groupby object in pandas?

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/27787930/

Time: 2020-08-19 02:16:20  Source: igfitidea

Tags: python, pandas, dataframe, group-by, pandas-groupby

Asked by wolfsatthedoor

This would be useful so I know how many unique groups I have to perform calculations on. Thank you.

Suppose the groupby object is called dfgroup.

Accepted answer by BrenBarn

As documented, you can get the number of groups with len(dfgroup).
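
A minimal sketch of this, assuming a small made-up DataFrame (the column names key and val are illustrative only and do not come from the question):

```python
import pandas as pd

# Hypothetical example data; 'key' and 'val' are illustrative names.
df = pd.DataFrame({'key': ['x', 'x', 'y', 'z'], 'val': [1, 2, 3, 4]})
dfgroup = df.groupby('key')

# len() on a GroupBy object returns the number of groups.
n_groups = len(dfgroup)
print(n_groups)  # 3
```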

Answer by cs95

[pandas >= 0.23] Simple, Fast, and Pandaic: ngroups

Newer versions of the groupby API provide this (undocumented) attribute which stores the number of groups in a GroupBy object.

# setup
import pandas as pd

df = pd.DataFrame({'A': list('aabbcccd')})
dfg = df.groupby('A')

# call `.ngroups` on the GroupBy object
dfg.ngroups
# 4

Note that this is different from GroupBy.groups, which returns the actual groups themselves.

Why should I prefer this over len?

As noted in BrenBarn's answer, you could use len(dfg) to get the number of groups. But you shouldn't. Looking at the implementation of GroupBy.__len__ (which is what len() calls internally), we see that __len__ makes a call to GroupBy.groups, which returns a dictionary of grouped indices:

dfg.groups
{'a': Int64Index([0, 1], dtype='int64'),
 'b': Int64Index([2, 3], dtype='int64'),
 'c': Int64Index([4, 5, 6], dtype='int64'),
 'd': Int64Index([7], dtype='int64')}

Depending on the number of groups in your operation, generating the dictionary only to find its length is a wasteful step. ngroups, on the other hand, is a stored property that can be accessed in constant time.

This has been documented under GroupBy object attributes. The issue with len, however, is that for a GroupBy object with a lot of groups, it can take a lot longer than ngroups.
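
The comparison above can be sketched roughly as follows. The data is made up (random integer keys), the names df_big and dfg_big are illustrative, and the timings are only indicative: they depend on the machine and the pandas version, and newer releases may have narrowed or removed the gap.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 100,000 rows spread over many distinct keys.
rng = np.random.default_rng(0)
df_big = pd.DataFrame({'A': rng.integers(0, 50_000, size=100_000)})
dfg_big = df_big.groupby('A')

# Both approaches report the same number of groups.
same_answer = len(dfg_big) == dfg_big.ngroups == df_big['A'].nunique()
print(same_answer)

# Rough timings; absolute numbers vary by machine and pandas version.
t_len = timeit.timeit(lambda: len(dfg_big), number=5)
t_ngroups = timeit.timeit(lambda: dfg_big.ngroups, number=5)
print(f"len: {t_len:.4f}s, ngroups: {t_ngroups:.4f}s")
```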

But what if I actually want the size of each group?

You're in luck. There is a function for that: GroupBy.size. Note, however, that size counts NaNs as well. If you don't want NaNs counted, use GroupBy.count instead.
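
To illustrate the difference between size and count, here is a small sketch on made-up data (the frame df_nan and its columns are assumptions for this example only):

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in column 'B'.
df_nan = pd.DataFrame({
    'A': ['a', 'a', 'b', 'b', 'b'],
    'B': [1.0, np.nan, 2.0, np.nan, np.nan],
})
g = df_nan.groupby('A')

sizes = g.size()         # rows per group, NaNs in 'B' included
counts = g['B'].count()  # non-NaN values of 'B' per group

print(sizes.tolist())   # [2, 3]
print(counts.tolist())  # [1, 1]
```

The groups appear in sorted key order here ('a', then 'b'), so the lists line up with those keys.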

Answer by Shaina Raza

You can use a format specifier:

df.to_csv('filename_%d.csv'%x, index=False)

and the file will be saved as filename_1.csv (for x equal to 1).
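
A self-contained sketch of the same idea, assuming x is an integer counter (its value here is hypothetical) and writing to a temporary directory so nothing is left behind:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'A': list('aabbcccd')})
x = 1  # hypothetical counter value

# %d substitutes the integer x into the file name.
filename = 'filename_%d.csv' % x
print(filename)  # filename_1.csv

# Write into a temporary directory, which is removed afterwards.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, filename)
    df.to_csv(path, index=False)
    written = os.path.exists(path)
print(written)  # True
```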