Pandas: how to use pd.cut()
Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/45751390/
Asked by Cheng
Here is the snippet:
import pandas as pd
test = pd.DataFrame({'days': [0,31,45]})
test['range'] = pd.cut(test.days, [0,30,60])
Output:
   days     range
0     0       NaN
1    31  (30, 60]
2    45  (30, 60]
I am surprised that 0 is not in (0, 30]. What should I do to categorize 0 as (0, 30]?
Answered by jezrael
test['range'] = pd.cut(test.days, [0,30,60], include_lowest=True)
print(test)
   days           range
0     0  (-0.001, 30.0]
1    31    (30.0, 60.0]
2    45    (30.0, 60.0]
See the difference:
test = pd.DataFrame({'days': [0,20,30,31,45,60]})
test['range1'] = pd.cut(test.days, [0,30,60], include_lowest=True)
# with right=False, the value 30 falls in the [30, 60) group
test['range2'] = pd.cut(test.days, [0,30,60], right=False)
# by default, the value 30 falls in the (0, 30] group
test['range3'] = pd.cut(test.days, [0,30,60])
print(test)
   days          range1    range2    range3
0     0  (-0.001, 30.0]   [0, 30)       NaN
1    20  (-0.001, 30.0]   [0, 30)   (0, 30]
2    30  (-0.001, 30.0]  [30, 60)   (0, 30]
3    31    (30.0, 60.0]  [30, 60)  (30, 60]
4    45    (30.0, 60.0]  [30, 60)  (30, 60]
5    60    (30.0, 60.0]       NaN  (30, 60]
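To make the table above easier to check, here is a minimal sketch that lists which values fall outside every bin under each setting. The variable names (days, edges, default, lowest, left_closed) are my own and do not come from the answer.

```python
import pandas as pd

days = pd.Series([0, 20, 30, 31, 45, 60])
edges = [0, 30, 60]

default = pd.cut(days, edges)                      # bins (0, 30], (30, 60]
lowest = pd.cut(days, edges, include_lowest=True)  # first bin also takes 0
left_closed = pd.cut(days, edges, right=False)     # bins [0, 30), [30, 60)

# Values that fall outside every bin come back as NaN.
print(days[default.isna()].tolist())      # [0]
print(days[lowest.isna()].tolist())       # []
print(days[left_closed.isna()].tolist())  # [60]
```

So include_lowest=True only widens the first bin, while right=False flips which side of every bin is closed and therefore drops the top edge instead.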
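Or use numpy.searchsorted, which returns the integer position of each value among the bin edges rather than an interval. Note that it is the array being searched (here the bin edges) that has to be sorted, not the values of days: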
import numpy as np
test = pd.DataFrame({'days': [0,20,30,31,45,60]})
arr = np.array([0,30,60])
test['range1'] = arr.searchsorted(test.days)
test['range2'] = arr.searchsorted(test.days, side='right') - 1
print(test)
   days  range1  range2
0     0       0       0
1    20       1       0
2    30       1       1
3    31       2       1
4    45       2       1
5    60       2       2
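A small sketch to show how the searchsorted result lines up with pd.cut, assuming you want left-closed bins. As far as I know, NumPy only requires the array being searched (the bin edges) to be sorted, so the days here are deliberately left unsorted. The names edges, bin_index and codes are mine.

```python
import numpy as np
import pandas as pd

edges = np.array([0, 30, 60])      # must be sorted in ascending order
days = pd.Series([45, 0, 31, 20])  # deliberately unsorted

# Index of the left-closed bin [edges[i], edges[i+1]) that holds each value.
bin_index = edges.searchsorted(days.to_numpy(), side='right') - 1
print(bin_index.tolist())  # [1, 0, 1, 0]

# pd.cut with labels=False returns the same integer bin positions.
codes = pd.cut(days, edges, right=False, labels=False)
print(codes.tolist())      # [1, 0, 1, 0]
```

The two only agree for values strictly below the last edge: pd.cut would return NaN for 60 here, whereas searchsorted would return 2.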
Answered by piRSquared
See the pd.cut documentation. Include the parameter right=False:
test = pd.DataFrame({'days': [0,31,45]})
test['range'] = pd.cut(test.days, [0,30,60], right=False)
test
   days     range
0     0   [0, 30)
1    31  [30, 60)
2    45  [30, 60)
Answered by Mino De Raj
You can pass labels to pd.cut() as well. The following example uses student grades in the range 0-10. We add a new column called 'grade_cat' to categorize the grades.
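bins represents the intervals: 0-4 is one interval, 5-6 is the next, and so on. The corresponding labels are "poor", "normal", etc. With the default right=True these bins are (0, 4], (4, 6] and (6, 10], so a grade of exactly 0 is left as NaN unless you also pass include_lowest=True.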
bins = [0, 4, 6, 10]
labels = ["poor","normal","excellent"]
student['grade_cat'] = pd.cut(student['grade'], bins=bins, labels=labels)
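The answer does not show the student DataFrame, so here is a runnable sketch with made-up grades. The data is hypothetical, and include_lowest=True is my addition so that a grade of 0 lands in "poor" instead of NaN.

```python
import pandas as pd

# Hypothetical data; the original answer does not show the `student` frame.
student = pd.DataFrame({'grade': [0, 3, 5, 6, 9, 10]})

bins = [0, 4, 6, 10]
labels = ["poor", "normal", "excellent"]

# include_lowest=True is an addition here so that a grade of 0 is binned.
student['grade_cat'] = pd.cut(student['grade'], bins=bins, labels=labels,
                              include_lowest=True)
print(student)
```

Note that labels must have exactly one entry per bin, i.e. one fewer than the number of edges in bins.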
Answered by nashtgc
A sample of how pd.cut works:
s = pd.Series([168,180,174,190,170,185,179,181,175,169,182,177,180,171])
# Split the value range into 3 equal-width bins
pd.cut(s, 3)
# To add labels to the bins
pd.cut(s, 3, labels=["Small","Medium","Large"])
This can be used directly on a range.
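If you want to see how many values ended up in each of the three bins, a quick sketch along these lines should do it. The names sizes and counts are mine, not part of the answer.

```python
import pandas as pd

s = pd.Series([168, 180, 174, 190, 170, 185, 179, 181, 175, 169, 182, 177, 180, 171])
sizes = pd.cut(s, 3, labels=["Small", "Medium", "Large"])

# sort=False keeps the bins in their natural order instead of ordering by count.
counts = sizes.value_counts(sort=False)
print(counts)
```

Equal-width bins do not give equal-sized groups: here "Large" holds only the two tallest values.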
Answered by ashunigion
@jezrael has explained almost all the use cases of pd.cut(). One use case that I would like to add is the following:
pd.cut(np.array([1,2,3,4,5,6]),3)
The number of bins is decided by the second parameter, so we get the following output:
[(0.995, 2.667], (0.995, 2.667], (2.667, 4.333], (2.667, 4.333], (4.333, 6.0], (4.333, 6.0]]
Categories (3, interval[float64]): [(0.995, 2.667] < (2.667, 4.333] < (4.333, 6.0]]
Similarly, if we set the number of bins (the second parameter) to 2, the output is:
[(0.995, 3.5], (0.995, 3.5], (0.995, 3.5], (3.5, 6.0], (3.5, 6.0], (3.5, 6.0]]
Categories (2, interval[float64]): [(0.995, 3.5] < (3.5, 6.0]]
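When pd.cut picks the edges for you like this, you may want to see them. A minimal sketch using the retbins=True option follows; the names values, binned and edges are my own.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 3, 4, 5, 6])

# retbins=True returns the computed bin edges alongside the binned values.
binned, edges = pd.cut(values, 3, retbins=True)
print(edges)

# The lowest edge is nudged slightly below the minimum so that 1 is included.
print(edges[0] < values.min())
```

That small nudge is why the first interval in the output above starts at 0.995 rather than 1.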