Python: better binning in pandas

Question

Asked by monkut

I've got a data frame and want to filter or bin by a range of values and then get the counts of values in each bin.

Currently, I'm doing this:

x = 5
y = 17
z = 33
filter_values = [x, y, z]

# Bin a: filtercol <= x
filtered_a = df[df.filtercol <= x]
a_count = filtered_a.filtercol.count()

# Bin b: x < filtercol <= y (compare the column, not the whole frame)
filtered_b = df[df.filtercol > x]
filtered_b = filtered_b[filtered_b.filtercol <= y]
b_count = filtered_b.filtercol.count()

# Bin c: filtercol > y
filtered_c = df[df.filtercol > y]
c_count = filtered_c.filtercol.count()

But is there a more concise way to accomplish the same thing?

Answer 1

Accepted answer by unutbu

Perhaps you are looking for pandas.cut:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(50), columns=['filtercol'])
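filter_values = [0, 5, 17, 33]
# Assign each value to a half-open interval (left, right]
out = pd.cut(df.filtercol, bins=filter_values)
# Count rows per bin; the result is ordered by count, largest first
counts = out.value_counts()
# counts is a Series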
print(counts)

yields

(17, 33]    16
(5, 17]     12
(0, 5]       5

To reorder the result so the bin ranges appear in order, you could use

counts.sort_index()

which yields

(0, 5]       5
(5, 17]     12
(17, 33]    16

Thanks to nivniv and InLaw for this improvement.
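
Note that bins of [0, 5, 17, 33] leave out 0 and everything above 33, whereas the filters in the question are open-ended (<= x, and > y). A minimal sketch of one way to reproduce those three filters with pandas.cut, assuming infinite outer edges are acceptable; the names edges, labels and binned and the label strings are made up for illustration and are not part of the original answer:

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame(np.arange(50), columns=['filtercol'])

x, y = 5, 17

# Open-ended outer edges give the intervals (-inf, x], (x, y], (y, inf]
edges = [-np.inf, x, y, np.inf]
# Hypothetical labels, chosen only to make the output readable
labels = ['<= x', 'x to y', '> y']

binned = pd.cut(df['filtercol'], bins=edges, labels=labels)

# sort_index() puts the bins in their natural order instead of by count
counts = binned.value_counts().sort_index()
print(counts)
```

With the 0..49 sample frame the three bins hold 6, 12 and 32 rows, which matches what the manual filters a_count, b_count and c_count from the question would give on the same data.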
