Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/35047604/

Time: 2020-09-14 00:34:32  Source: igfitidea

Pandas calculate number of values between each range

python pandas

Asked by AZhao

I want to find counts of my data between certain custom ranges.

Say I have some data:

import random
import pandas as pd

# Ten distinct random integers drawn from 0-99
my_randoms = random.sample(range(100), 10)
test = pd.DataFrame(my_randoms, columns=["x"])

How can I produce a data frame that shows the number of values between different ranges? For example, say I want to see how many values occur between 0-19, 20-39, 40-59, 60-79, 80-100. The output dataframe will have one column with those ranges, another with the counts.

I can think of some ugly approaches that involve use of .apply to get a new column list saying which value they are between (and then doing a groupby), but I suspect pandas has a cleaner way lurking about.

Accepted answer by AZhao

Per Jarad's link to that other question:

import numpy as np
test.groupby(pd.cut(test['x'], np.arange(0, 101, 20), right=False)).count()
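
A fuller sketch of the pd.cut approach, producing the two-column output the question asks for. It uses a fixed list of values rather than random data so the result is reproducible. The sample values and the names edges, labels, binned, and counts are assumptions made for illustration and do not come from the original answer. Passing right=False makes each bin include its left edge, so the bins read as 0-19, 20-39, and so on.

```python
import numpy as np
import pandas as pd

# Fixed sample data (an assumption, chosen so the counts are reproducible).
test = pd.DataFrame({"x": [3, 17, 20, 25, 39, 41, 60, 77, 80, 99]})

# Edges 0, 20, ..., 100 give five half-open bins [0, 20), ..., [80, 100).
edges = np.arange(0, 101, 20)
labels = ["0-19", "20-39", "40-59", "60-79", "80-99"]  # illustrative labels

# Assign each value to a labelled bin.
binned = pd.cut(test["x"], bins=edges, right=False, labels=labels)

# Count values per bin; sort_index() orders rows by bin rather than by count.
counts = (
    binned.value_counts()
    .sort_index()
    .rename_axis("range")
    .reset_index(name="count")
)
print(counts)
```

Because the bins are categorical, a bin with no values should still appear in the result with a count of 0.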

Answer by Gregory Kuhn

There's probably a better way. I'm new to pandas myself, but how about this for the moment:

test[test.x.isin(range(20))]  # rows whose x falls in 0-19
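
A minimal sketch of how an isin mask can be turned into a count for a single range. The fixed sample values and the names in_first_bin, count_first_bin, and subset are assumptions for illustration only.

```python
import pandas as pd

# Fixed sample data (an assumption, not the random data from the question).
test = pd.DataFrame({"x": [3, 17, 20, 25, 39, 41, 60, 77, 80, 99]})

# Boolean mask: True where x is one of the integers 0..19.
in_first_bin = test["x"].isin(range(20))

# Summing a boolean mask counts the True entries.
count_first_bin = int(in_first_bin.sum())

# The same mask selects the matching rows.
subset = test[in_first_bin]
print(count_first_bin)
print(subset)
```

This only matches integer values, since isin tests membership in the listed values rather than a numeric interval.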

Answer by thekingofkings

pandas and numpy allow boolean indexing. Is this an ugly approach?

ranges = [(0, 19), (20, 39), (40, 69) ...]
cnt = []
for low, high in ranges:
    # Boolean mask selecting rows with low <= x <= high
    tmp = test[(test['x'] >= low) & (test['x'] <= high)]
    cnt.append(len(tmp))
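
The boolean-indexing idea can be sketched as a runnable example. The full list of ranges below is an assumption taken from the ranges named in the question, since the answer elides its own list. The sample values and the names rows and result are likewise illustrative. Series.between is used here in place of the two explicit comparisons; it is inclusive on both ends by default.

```python
import pandas as pd

# Fixed sample data (an assumption, not the random data from the question).
test = pd.DataFrame({"x": [3, 17, 20, 25, 39, 41, 60, 77, 80, 99]})

# Assumed inclusive (low, high) ranges, taken from the question text.
ranges = [(0, 19), (20, 39), (40, 59), (60, 79), (80, 100)]

rows = []
for low, high in ranges:
    # Mask is True where low <= x <= high.
    mask = test["x"].between(low, high)
    rows.append({"range": f"{low}-{high}", "count": int(mask.sum())})

# One row per range, with its label and count.
result = pd.DataFrame(rows)
print(result)
```

This makes one pass over the data per range, so for many ranges a binning function is likely the better fit.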

Answer by thekingofkings

You can use the numpy.histogram function.

import numpy as np
series = [0, 20, 40, ...]
# Pass the data as the first argument; bins gives the bin edges
count, bin_edge = np.histogram(test['x'], bins=series)

According to the numpy.histogram documentation, if bins is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths.
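
A short sketch of numpy.histogram with explicit bin edges. The sample values and the complete list of edges are assumptions for illustration, since the answer elides its own list. Each bin is half-open, [left, right), except the last, which also includes its right edge.

```python
import numpy as np

# Fixed sample data (an assumption, chosen so the counts are reproducible).
values = [3, 17, 20, 25, 39, 41, 60, 77, 80, 99]

# Assumed bin edges: six edges define five bins.
edges = [0, 20, 40, 60, 80, 100]

# counts[i] is the number of values falling between edges[i] and edges[i + 1].
counts, bin_edges = np.histogram(values, bins=edges)
print(counts)
print(bin_edges)
```

The returned bin_edges array has one more element than counts.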
