Python Pandas: how to use pd.cut()

Question

Asked by Cheng

Here is the snippet:

import pandas as pd

test = pd.DataFrame({'days': [0,31,45]})
test['range'] = pd.cut(test.days, [0,30,60])

Output:

    days    range
0   0       NaN
1   31      (30, 60]
2   45      (30, 60]

I am surprised that 0 is not in (0, 30]. What should I do to categorize 0 as (0, 30]?

Answer 1

Answered by jezrael

test['range'] = pd.cut(test.days, [0,30,60], include_lowest=True)
print(test)
   days           range
0     0  (-0.001, 30.0]
1    31    (30.0, 60.0]
2    45    (30.0, 60.0]
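The NaN in the question comes from how the intervals are closed. A quick check with pd.Interval (a sketch; pd.Interval is the scalar interval type pandas uses for the categories pd.cut produces) shows that the default right-closed interval excludes its left edge, while an interval closed on both sides includes it:

```python
import pandas as pd

# pd.cut(..., [0, 30, 60]) builds right-closed intervals by default: (0, 30], (30, 60]
right_closed = pd.Interval(0, 30, closed='right')
print(0 in right_closed, 30 in right_closed)  # False True

# an interval closed on both sides contains both of its edges
both_closed = pd.Interval(0, 30, closed='both')
print(0 in both_closed)  # True
```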

See the difference:

test = pd.DataFrame({'days': [0,20,30,31,45,60]})

test['range1'] = pd.cut(test.days, [0,30,60], include_lowest=True)
#30 value is in [30, 60) group
test['range2'] = pd.cut(test.days, [0,30,60], right=False)
#30 value is in (0, 30] group
test['range3'] = pd.cut(test.days, [0,30,60])
print(test)
   days          range1    range2    range3
0     0  (-0.001, 30.0]   [0, 30)       NaN
1    20  (-0.001, 30.0]   [0, 30)   (0, 30]
2    30  (-0.001, 30.0]  [30, 60)   (0, 30]
3    31    (30.0, 60.0]  [30, 60)  (30, 60]
4    45    (30.0, 60.0]  [30, 60)  (30, 60]
5    60    (30.0, 60.0]       NaN  (30, 60]
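Because pd.cut returns an ordered categorical column, values that fall outside every bin are easy to find, and per-bin counts come out in interval order. A small sketch rebuilding the range3 column from above (default right=True):

```python
import pandas as pd

test = pd.DataFrame({'days': [0, 20, 30, 31, 45, 60]})
test['range3'] = pd.cut(test.days, [0, 30, 60])

# values outside every interval (here 0, since (0, 30] excludes 0) become NaN
outside = test.loc[test['range3'].isna(), 'days']
print(outside.tolist())  # [0]

# count rows per interval; observed=False keeps empty bins in the result
counts = test.groupby('range3', observed=False).size()
print(counts)
```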

Or use numpy.searchsorted, but the array of bin edges has to be sorted:

import numpy as np

arr = np.array([0,30,60])
test['range1'] = arr.searchsorted(test.days)
test['range2'] = arr.searchsorted(test.days, side='right') - 1
print(test[['days', 'range1', 'range2']])
   days  range1  range2
0     0       0       0
1    20       1       0
2    30       1       1
3    31       2       1
4    45       2       1
5    60       2       2
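The searchsorted indices can be turned back into labels with plain NumPy indexing. A sketch, assuming left-closed bins [0, 30) and [30, 60) like right=False above; the label strings '0-29' and '30-59' are hypothetical:

```python
import numpy as np
import pandas as pd

days = pd.Series([0, 20, 30, 31, 45, 60])
edges = np.array([0, 30, 60])            # must be sorted ascending for searchsorted
labels = np.array(['0-29', '30-59'])     # hypothetical labels, one per bin

# side='right' - 1 gives i such that edges[i] <= value < edges[i + 1]
idx = edges.searchsorted(days, side='right') - 1
# indices below 0 or at/after the last edge are outside every bin
in_range = (idx >= 0) & (idx < len(labels))
result = pd.Series(np.where(in_range, labels[np.clip(idx, 0, len(labels) - 1)], None))
print(result.tolist())
```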

Answer 2

Answered by piRSquared

From the pd.cut documentation: include the parameter right=False.

test = pd.DataFrame({'days': [0,31,45]})
test['range'] = pd.cut(test.days, [0,30,60], right=False)

test

   days     range
0     0   [0, 30)
1    31  [30, 60)
2    45  [30, 60)
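To confirm which side the intervals are closed on, inspect the categories of the result; they form a pandas IntervalIndex with a closed attribute:

```python
import pandas as pd

test = pd.DataFrame({'days': [0, 31, 45]})
test['range'] = pd.cut(test.days, [0, 30, 60], right=False)

cats = test['range'].cat.categories   # IntervalIndex of the bins
print(cats.closed)                     # left
print(cats.left.tolist(), cats.right.tolist())  # [0, 30] [30, 60]
```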

Answer 3

Answered by Mino De Raj

You can also pass labels to pd.cut(). The following example uses student grades in the range 0-10 and adds a new column called 'grade_cat' to categorize them.

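bins holds the interval edges: with the default right=True, (0, 4] is one interval, (4, 6] is the next, and (6, 10] is the last. The corresponding labels are "poor", "normal", and "excellent". Note that a grade of exactly 0 falls outside (0, 4] and becomes NaN unless include_lowest=True is passed.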

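import pandas as pd
student = pd.DataFrame({'grade': [2, 4, 5, 8, 10]})  # hypothetical sample data
bins = [0, 4, 6, 10]
labels = ["poor", "normal", "excellent"]
student['grade_cat'] = pd.cut(student['grade'], bins=bins, labels=labels)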
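If you only need the bin number rather than a label, pd.cut also accepts labels=False and returns integer bin indices. A sketch with the same bins and hypothetical grades:

```python
import pandas as pd

student = pd.DataFrame({'grade': [1, 4, 5, 6, 7, 10]})  # hypothetical sample grades
bins = [0, 4, 6, 10]

# labels=False: 0 for (0, 4], 1 for (4, 6], 2 for (6, 10]
student['grade_bin'] = pd.cut(student['grade'], bins=bins, labels=False)
print(student['grade_bin'].tolist())  # [0, 0, 1, 1, 2, 2]
```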

Answer 4

Answered by nashtgc

A sample of how .cut works:

import pandas as pd

s = pd.Series([168,180,174,190,170,185,179,181,175,169,182,177,180,171])
pd.cut(s, 3)
# To add labels to bins
pd.cut(s, 3, labels=["Small","Medium","Large"])
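To see which edges pd.cut picked for the 3 equal-width bins in this sample, pass retbins=True; it returns the edges alongside the binned values:

```python
import pandas as pd

s = pd.Series([168, 180, 174, 190, 170, 185, 179, 181, 175, 169, 182, 177, 180, 171])
binned, edges = pd.cut(s, 3, retbins=True)
print(edges)  # first edge sits slightly below min(s) = 168, last edge equals max(s) = 190

# how many values landed in each bin, in bin order
counts = binned.cat.codes.value_counts().sort_index()
print(counts.tolist())  # [6, 6, 2]
```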

With an integer as the second argument, pd.cut works directly on the range of the values: it splits [min, max] into that many equal-width bins.

Answer 5

Answered by ashunigion

@jezrael has explained almost all the use cases of pd.cut().

One use case that I would like to add is the following:

pd.cut(np.array([1,2,3,4,5,6]),3)

The number of bins is decided by the second parameter, so we get the following output:

[(0.995, 2.667], (0.995, 2.667], (2.667, 4.333], (2.667, 4.333], (4.333, 6.0], (4.333, 6.0]]
Categories (3, interval[float64]): [(0.995, 2.667] < (2.667, 4.333] < (4.333, 6.0]]

Similarly, if we set the number of bins (the second parameter) to 2, the output will be:

[(0.995, 3.5], (0.995, 3.5], (0.995, 3.5], (3.5, 6.0], (3.5, 6.0], (3.5, 6.0]]
Categories (2, interval[float64]): [(0.995, 3.5] < (3.5, 6.0]]
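The odd-looking 0.995 comes from how the equal-width edges are built. The sketch below reproduces them by hand, assuming the behaviour visible in the outputs above: np.linspace over [min, max], then only the left edge moved down by 0.1% of the range so the minimum falls inside the first right-closed interval. The helper equal_width_edges is hypothetical; it is checked against pd.cut's own edges from retbins=True:

```python
import numpy as np
import pandas as pd

def equal_width_edges(values, n_bins):
    """Hypothetical helper: hand-rolled equal-width edges for right-closed bins."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    edges = np.linspace(lo, hi, n_bins + 1)  # n_bins intervals of width (hi - lo) / n_bins
    edges[0] -= (hi - lo) * 0.001            # push the left edge out so min lands in (edge, ...]
    return edges

x = np.array([1, 2, 3, 4, 5, 6])
for n in (3, 2):
    mine = equal_width_edges(x, n)
    _, from_pandas = pd.cut(x, n, retbins=True)
    print(n, mine.round(3), np.allclose(mine, from_pandas))
```

For 3 bins the width is 5 / 3 ≈ 1.667, giving edges 1, 2.667, 4.333, 6, and the left edge becomes 1 - 0.005 = 0.995.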
