pandas: count the number of values in each range

Question

Asked by AZhao

I want to find counts of my data between certain custom ranges.

Say I have some data:

import random
import pandas as pd

my_randoms = random.sample(range(100), 10)  # 10 distinct integers from 0-99
test = pd.DataFrame(my_randoms, columns=["x"])

How can I produce a data frame that shows the number of values between different ranges? For example, say I want to see how many values occur between 0-19, 20-39, 40-59, 60-79, 80-100. The output dataframe will have one column with those ranges, another with the counts.

I can think of some ugly approaches that use .apply to build a new column saying which range each value falls in (and then doing a groupby), but I suspect pandas has a cleaner way lurking about.

Answer 1

Accepted answer by AZhao

Per Jarad's link to that other question:

import numpy as np
test.groupby(pd.cut(test['x'], np.arange(0, 101, 20), right=False)).count()  # bins [0, 20), ..., [80, 100)
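
As a fuller sketch of the pd.cut idea, the following builds the two-column table the question asks for. By default pd.cut uses right-closed intervals, so this version passes right=False and ends the edges at 101 so that 100 lands in the last bin. The sample values, the edges list, and the label strings are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data standing in for the random values in the question
test = pd.DataFrame({"x": [3, 19, 20, 45, 59, 60, 61, 99, 100, 77]})

# Left-closed bins [0, 20), [20, 40), ..., [80, 101); labels are assumed names
edges = [0, 20, 40, 60, 80, 101]
labels = ["0-19", "20-39", "40-59", "60-79", "80-100"]

# Assign each value to its bin, then count per bin in bin order
binned = pd.cut(test["x"], bins=edges, right=False, labels=labels)
counts = binned.value_counts(sort=False)

# One column with the ranges, another with the counts
result = counts.rename_axis("range").reset_index(name="count")
print(result)
```

Empty bins still appear with a count of 0, because value_counts on a categorical column reports every category.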

Answer 2

Answered by Gregory Kuhn

There's probably a better way. I'm new to pandas myself, but how about this for the moment:

test[test.x.isin(range(20))]  # rows with x in 0-19; take len() of this for the count

Answer 3

Answered by thekingofkings

pandas and numpy allow boolean indexing. Is this an ugly approach?

ranges = [(0, 19), (20, 39), (40, 69)]  # ... add further (low, high) pairs as needed
cnt = []
for low, high in ranges:
    # boolean mask keeps the rows with low <= x <= high
    tmp = test[(test['x'] >= low) & (test['x'] <= high)]
    cnt.append(len(tmp))
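
A compact variant of the same boolean-mask idea, as a sketch only: Series.between builds the mask (both ends inclusive by default), and summing the mask counts the matching rows. The sample data and the full list of ranges are assumptions taken from the ranges named in the question.

```python
import pandas as pd

# Hypothetical sample data; the question uses 10 random integers instead
test = pd.DataFrame({"x": [5, 12, 19, 20, 33, 47, 58, 64, 80, 100]})

# (low, high) pairs, both ends inclusive, as listed in the question
ranges = [(0, 19), (20, 39), (40, 59), (60, 79), (80, 100)]

# between() returns a boolean mask; sum() counts the True entries
rows = [
    {"range": f"{low}-{high}", "count": int(test["x"].between(low, high).sum())}
    for low, high in ranges
]
summary = pd.DataFrame(rows)
print(summary)
```

This makes one pass over the data per range, which is fine for a handful of ranges but slower than a single binning call when there are many.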

Answer 4

Answered by thekingofkings

You can use the numpy.histogram function.

import numpy as np
series = [0, 20, 40]  # ... continue with the remaining bin edges
count, bin_edge = np.histogram(test['x'], bins=series)  # the data is the first argument

According to the numpy.histogram documentation, if bins is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths.
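
To make the edge behaviour concrete, here is a small sketch with made-up data and the edges 0, 20, ..., 100 (both are assumptions for illustration). With a sequence of edges, every bin is half-open [left, right) except the last, which also includes its right edge, so 100 is counted in the final bin.

```python
import numpy as np
import pandas as pd

# Hypothetical values chosen to sit on or near the bin edges
data = np.array([0, 19, 20, 39, 40, 79, 80, 99, 100])
edges = [0, 20, 40, 60, 80, 100]

# counts[i] is the number of values in the bin from edges[i] to edges[i + 1]
counts, bin_edges = np.histogram(data, bins=edges)

# Arrange the result as a table with the bin bounds and the counts
summary = pd.DataFrame(
    {"left": bin_edges[:-1], "right": bin_edges[1:], "count": counts}
)
print(summary)
```

Values outside the outermost edges are silently left out of the counts, so the edges should span the full range of the data.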
