Python How to get number of groups in a groupby object in pandas?
Note: this page reproduces a popular Stack Overflow question. It is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/27787930/
How to get number of groups in a groupby object in pandas?
Asked by wolfsatthedoor
This would be useful so I know how many unique groups I have to perform calculations on. Thank you.
Suppose the groupby object is called dfgroup.
Accepted answer by BrenBarn
As documented, you can get the number of groups with len(dfgroup).
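A minimal sketch of the len approach. The DataFrame contents and the value column are made up for illustration; only the name dfgroup comes from the question.

```python
import pandas as pd

# Hypothetical data; dfgroup mirrors the name used in the question.
df = pd.DataFrame({"A": list("aabbcccd"), "value": range(8)})
dfgroup = df.groupby("A")

# len() on a GroupBy object returns the number of distinct groups.
n_groups = len(dfgroup)
print(n_groups)  # 4
```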
Answer by cs95
[pandas >= 0.23] Simple, Fast, and Pandaic: ngroups
Newer versions of the groupby API provide this (undocumented) attribute which stores the number of groups in a GroupBy object.
# setup
import pandas as pd

df = pd.DataFrame({'A': list('aabbcccd')})
dfg = df.groupby('A')
# call `.ngroups` on the GroupBy object
dfg.ngroups
# 4
Note that this is different from GroupBy.groups, which returns the actual groups themselves.
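To make the distinction concrete, here is a small sketch on the same toy data as the setup: ngroups is a plain integer, while groups is a dict-like mapping from each group key to the row labels in that group.

```python
import pandas as pd

df = pd.DataFrame({"A": list("aabbcccd")})
dfg = df.groupby("A")

n = dfg.ngroups        # an integer: how many groups there are
groups = dfg.groups    # dict-like: group key -> index of row labels

print(n)                        # 4
print(sorted(groups))           # ['a', 'b', 'c', 'd']
print(groups["c"].tolist())     # [4, 5, 6]
```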
Why should I prefer this over len?
As noted in BrenBarn's answer, you could use len(dfg) to get the number of groups. But you shouldn't. Looking at the implementation of GroupBy.__len__ (which is what len() calls internally), we see that __len__ makes a call to GroupBy.groups, which returns a dictionary of grouped indices:
dfg.groups
{'a': Int64Index([0, 1], dtype='int64'),
'b': Int64Index([2, 3], dtype='int64'),
'c': Int64Index([4, 5, 6], dtype='int64'),
'd': Int64Index([7], dtype='int64')}
Depending on the number of groups in your operation, generating the dictionary only to find its length is a wasteful step. ngroups, on the other hand, is a stored property that can be accessed in constant time.
The len approach has been documented under "GroupBy object attributes". The issue with len, however, is that for a GroupBy object with a lot of groups, it can take a lot longer than ngroups.
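If you want to check this on your own installation, a rough timing sketch follows. The data size and the key column are arbitrary assumptions, and the numbers depend heavily on the pandas version: results may be cached after the first call, and newer releases may implement __len__ differently from the version this answer describes. Treat the output as indicative only.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 200,000 rows spread over up to 50,000 integer keys.
rng = np.random.default_rng(0)
df = pd.DataFrame({"key": rng.integers(0, 50_000, size=200_000)})
dfg = df.groupby("key")

# Both approaches must agree on the number of groups.
same_answer = dfg.ngroups == len(dfg)

# Time repeated access; no particular ratio is guaranteed.
t_ngroups = timeit.timeit(lambda: dfg.ngroups, number=20)
t_len = timeit.timeit(lambda: len(dfg), number=20)
print(f"same answer: {same_answer}")
print(f"ngroups: {t_ngroups:.4f}s  len: {t_len:.4f}s")
```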
But what if I actually want the size of each group?
You're in luck. We have a function for that: it's called GroupBy.size. But please note that size counts NaNs as well. If you don't want NaNs counted, use GroupBy.count instead.
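A short sketch of the difference, using a made-up column B that contains NaNs: size counts every row in each group, while count only counts the non-NaN values of the selected column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column B has missing values in both groups.
df = pd.DataFrame({
    "A": ["a", "a", "b", "b", "b"],
    "B": [1.0, np.nan, 2.0, np.nan, np.nan],
})
dfg = df.groupby("A")

sizes = dfg.size()          # rows per group, NaNs in B included
counts = dfg["B"].count()   # non-NaN values of B per group

print(sizes.to_dict())      # {'a': 2, 'b': 3}
print(counts.to_dict())     # {'a': 1, 'b': 1}
```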
Answer by Shaina Raza
You can use a format specifier in the file name:
df.to_csv('filename_%d.csv'%x, index=False)
and the file will be saved as, for example, filename_1.csv when x is 1.
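As a self-contained illustration of that format specifier, the sketch below writes two files into a temporary directory. The DataFrame, the values of x, and the use of a temporary directory are assumptions made only so the example can run anywhere.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data to write out.
df = pd.DataFrame({"A": list("aabbcccd")})

with tempfile.TemporaryDirectory() as tmpdir:
    for x in (1, 2):
        # %d is replaced by the integer x in the file name.
        path = os.path.join(tmpdir, "filename_%d.csv" % x)
        df.to_csv(path, index=False)
    written = sorted(os.listdir(tmpdir))

print(written)  # ['filename_1.csv', 'filename_2.csv']
```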

