Pandas groupby: how to compute counts in ranges

Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/25010215/

Date: 2020-09-13 22:18:06  Source: igfitidea

Tags: python, pandas

Asked by user2366975

Say I have a huge list of numbers between 0 and 100. I compute ranges from the max number, assuming there are 10 bins. So my ranges are, for example:

ranges = [0,10,20,30,40,50,60,70,80,90,100]

Now I count the occurrences in each range: 0-10, 10-20, and so on. I iterate over every number in the list and check which range it falls into. I assume this is not the best way in terms of runtime speed.
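
For reference, the loop described above might look roughly like the sketch below. The function name `count_in_ranges` and the edge convention (lower edge exclusive, upper edge inclusive) are assumptions made for illustration; the question does not say how boundary values are handled.

```python
ranges = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

def count_in_ranges(numbers, edges):
    """Count values per bin by checking every bin for every number."""
    counts = [0] * (len(edges) - 1)
    for x in numbers:
        for i in range(len(edges) - 1):
            # Assumed convention: lower edge exclusive, upper edge inclusive
            if edges[i] < x <= edges[i + 1]:
                counts[i] += 1
                break
    return counts

sample = [5, 10, 11, 55, 100]
print(count_in_ranges(sample, ranges))
```

Each number is compared against up to 10 bins in interpreted Python, which is the cost a vectorized approach avoids.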

Can I speed this up by using pandas, e.g. pandas.groupby, and if so, how?

Answered by EdChum

We can use pd.cut to bin the values into ranges, then groupby these ranges, and finally call count to count the values binned into each range:

In [82]:

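import numpy as np
import pandas as pd
# np.random.random_integers is deprecated; randint's upper bound is exclusive
df = pd.DataFrame({"a": np.random.randint(0, 101, size=100)})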
ranges = [0,10,20,30,40,50,60,70,80,90,100]
df.groupby(pd.cut(df.a, ranges)).count()
Out[82]:
            a
a            
(0, 10]    10
(10, 20]    6
(20, 30]   12
(30, 40]    9
(40, 50]   11
(50, 60]   12
(60, 70]    9
(70, 80]   13
(80, 90]    9
(90, 100]   9
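
One detail worth noting about the bins above: pd.cut uses right-closed intervals by default, so a value of exactly 0 falls outside (0, 10] and is not counted. The sketch below, using a small hand-picked sample (an assumption for illustration, not data from the answer), shows this along with two alternatives that are not part of the original answer: value_counts on the binned values, and np.histogram, which uses left-closed bins instead.

```python
import numpy as np
import pandas as pd

values = pd.Series([0, 5, 10, 11, 55, 100], name="a")
ranges = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# Default bins are right-closed, e.g. (0, 10], so the value 0 is binned as NaN
default_counts = pd.cut(values, ranges).value_counts(sort=False)

# include_lowest=True makes the first bin also contain its left edge
inclusive_counts = pd.cut(values, ranges, include_lowest=True).value_counts(sort=False)

# np.histogram uses left-closed bins [lo, hi), except the last, which is closed on both sides
hist_counts, _ = np.histogram(values, bins=ranges)

print(default_counts)
print(inclusive_counts)
print(hist_counts)
```

Which convention is appropriate depends on how boundary values such as 0, 10, or 100 should be assigned, so it may be worth checking that the totals match the number of input values.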