Qcut Pandas: ValueError: Bin edges must be unique
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/38309144/
Asked by Arij SEDIRI
I'm using qcut from pandas to discretize my data into equal-sized buckets. I want to have price buckets. This is my DataFrame:
        productId  sell_prix  categ  popularity
11997  16758760.0      28.75     50    524137.0
11998  16758760.0      28.75     50    166795.0
13154  16782105.0      24.60     50    126890.5
13761  16790082.0      65.00     50    245437.0
13762  16790082.0      65.00     50    245242.0
15355  16792720.0      29.00     50    360219.0
15356  16792720.0      29.00     50    360100.0
15357  16792720.0      29.00     50    360027.0
15358  16792720.0      29.00     50    462850.0
15367  16792728.0      29.00     50    193030.5
And this is my code:
df['PriceBucket'] = pd.qcut(df['sell_prix'], 3)
I get this error message:
ValueError: Bin edges must be unique: array([ 24.6, 29. , 29. , 65. ])
In reality, my DataFrame has 7413 rows, so this is only a sample. The strange thing is that the same code works on a DataFrame with 359824 rows containing practically the same data. Does the result depend on the length of the DataFrame?
Please help! Many thanks.
Answered by luca
Answered by Fortunato
The 'sell_prix' field in your smaller DataFrame doesn't have enough unique values to break into three equally-sized buckets. As a result, the upper edges of the first and second buckets are the same (both 29.0 in your error message), which is why you are getting an error.
Consider:
df = pd.DataFrame([[1,2,3],[1,4,5],[1,5,6],[1,3,4], [2,3,4]], columns = ['a','b','c'])
df
a b c
0 1 2 3
1 1 4 5
2 1 5 6
3 1 3 4
4 2 3 4
pd.qcut(df['a'], 3)
ValueError: Bin edges must be unique: array([ 1., 1., 1., 2.])
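To see where the repeated edges come from, the quantiles can be computed by hand. This is a minimal sketch, assuming that qcut with q=3 places its edges at the 0, 1/3, 2/3 and 1 quantiles of the column; the variable names are illustrative only.

```python
import pandas as pd

# Column 'a' from the example DataFrame above
a = pd.Series([1, 1, 1, 1, 2], name="a")

# Assumption: qcut(a, 3) uses the 0, 1/3, 2/3 and 1 quantiles as bin edges
edges = a.quantile([0, 1/3, 2/3, 1]).tolist()
print(edges)

# Repeated edges mean at least one bucket would have zero width
n_unique = len(set(edges))
print(n_unique, "unique edges out of", len(edges))
```

Because four of the five values are 1, both inner quantiles fall on 1, so only two distinct edges remain for three requested buckets.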
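Try using cut instead. Note that cut splits the range of values into equal-width intervals, so the buckets will not necessarily hold the same number of rows: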
pd.cut(df['a'], 3)
0 (0.999, 1.333]
1 (0.999, 1.333]
2 (0.999, 1.333]
3 (0.999, 1.333]
4 (1.667, 2]
Name: a, dtype: category
Categories (3, object): [(0.999, 1.333] < (1.333, 1.667] < (1.667, 2]]
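If quantile-based buckets are still preferred, two other approaches may be worth trying. Neither appears in the answer above, so treat this as a sketch: it assumes a pandas version whose qcut accepts the duplicates argument, and it reuses column 'a' from the example.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 1, 1, 2]})

# Option 1: drop repeated edges. Fewer than 3 buckets may come back.
dropped = pd.qcut(df["a"], 3, duplicates="drop")
print(dropped.cat.categories)

# Option 2: rank first so every value is distinct, then bucket the ranks.
# Ties are broken by row order, so equal values can land in different buckets.
ranked = pd.qcut(df["a"].rank(method="first"), 3, labels=False)
print(ranked.tolist())
```

The first option keeps bucket boundaries meaningful in terms of the original values but can return fewer buckets than requested. The second always yields the requested number of roughly equal-count buckets, at the cost of splitting identical values arbitrarily.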