Qcut Pandas: ValueError: Bin edges must be unique

Question

Asked by Arij SEDIRI

I'm using qcut from pandas to discretize my data into equal-sized buckets. I want price buckets. This is my DataFrame:

        productId   sell_prix   categ   popularity
11997   16758760.0  28.75        50      524137.0
11998   16758760.0  28.75        50      166795.0
13154   16782105.0  24.60        50      126890.5
13761   16790082.0  65.00        50      245437.0
13762   16790082.0  65.00        50      245242.0
15355   16792720.0  29.00        50      360219.0
15356   16792720.0  29.00        50      360100.0
15357   16792720.0  29.00        50      360027.0
15358   16792720.0  29.00        50      462850.0
15367   16792728.0  29.00        50      193030.5

And this is my code:

df['PriceBucket'] = pd.qcut(df['sell_prix'], 3)

I get this error message:

**ValueError: Bin edges must be unique: array([ 24.6,  29. ,  29. ,  65. ])**
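
To see where the duplicate edge comes from, here is a minimal sketch that rebuilds only the ten sample 'sell_prix' values shown above (not the full DataFrame) and computes the 0, 1/3, 2/3 and 1 quantiles, which are the edges qcut uses for q=3. It assumes a reasonably recent pandas; the exact wording of the error message differs between versions.

```python
import pandas as pd

# The ten sell_prix values from the sample DataFrame above
sell_prix = pd.Series([28.75, 28.75, 24.60, 65.00, 65.00,
                       29.00, 29.00, 29.00, 29.00, 29.00], name="sell_prix")

# For q=3, qcut places its bin edges at the 0, 1/3, 2/3 and 1 quantiles
edges = sell_prix.quantile([0, 1 / 3, 2 / 3, 1]).tolist()
print(edges)  # both middle edges are 29.0, so the middle bucket would be empty

# qcut refuses to build bins whose edges repeat
try:
    pd.qcut(sell_prix, 3)
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)
```

Five of the ten prices are 29.00, so that single value sits at both the 1/3 and the 2/3 quantile.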

In reality, my DataFrame has 7413 rows, so this is just a sample of it. The strange thing is that when I run the same code on a DataFrame with 359824 rows and practically the same data, it works! Does the result depend on the length of the DataFrame?

Please help! Many thanks.

Answer 1

Answered by luca

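Various solutions are discussed here, but briefly: rank the column with method='first' before calling qcut. Tied values then receive distinct, consecutive ranks, so the quantile edges can no longer coincide. In the examples below, df['a'] is a five-value column like the one in the next answer: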

> pd.qcut(df['a'].rank(method='first'), 3)
0        [1, 2.333]
1        [1, 2.333]
2    (2.333, 3.667]
3        (3.667, 5]
4        (3.667, 5]

Or

> pd.qcut(df['a'].rank(method='first'), 3, labels=False)
0    0
1    0
2    1
3    2
4    2
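
Applied to the question's data, the rank trick might look like the sketch below, which assumes only the ten sample rows from the question. Because rank(method='first') numbers tied values in the order they appear, every rank is distinct and qcut can always find three distinct edges.

```python
import pandas as pd

# The ten sell_prix values from the sample DataFrame in the question
sell_prix = pd.Series([28.75, 28.75, 24.60, 65.00, 65.00,
                       29.00, 29.00, 29.00, 29.00, 29.00], name="sell_prix")

# method="first" breaks ties by order of appearance, so every rank is distinct
ranks = sell_prix.rank(method="first")

# Bucket the ranks instead of the prices; the quantile edges are now unique
buckets = pd.qcut(ranks, 3, labels=False)

print(pd.DataFrame({"sell_prix": sell_prix, "rank": ranks, "bucket": buckets}))
```

The trade-off is that the five rows priced 29.00 no longer share a bucket: which of them lands where depends only on row order. That is the cost of forcing equal-sized buckets onto heavily tied data.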

Answer 2

Answered by Fortunato

The 'sell_prix' column in your smaller DataFrame doesn't have enough unique values to be split into three equally sized buckets. As a result, the upper edge of the first bucket and the upper edge of the second bucket are the same (29.0 appears twice in the error message), which is why you are getting an error.
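
This also bears on the question about DataFrame length: what matters is not the number of rows but whether a single value covers two or more of the quantile cut points. The sketch below uses two made-up lists (hypothetical data, not from the question) and a hypothetical helper, qcut_succeeds. A five-row list of distinct prices works, while a 10,000-row list in which 80% of the rows share one price fails.

```python
import pandas as pd


def qcut_succeeds(values, q=3):
    """Hypothetical helper: True if pd.qcut can split `values` into q buckets."""
    try:
        pd.qcut(pd.Series(values), q)
        return True
    except ValueError:  # raised when two quantile edges coincide
        return False


# Only 5 rows, but every value is distinct, so the quantile edges differ
short_distinct = [24.6, 28.75, 29.0, 41.0, 65.0]

# 10,000 rows, but 29.0 fills 80% of them and covers both inner cut points
long_tied = [24.6] * 1000 + [29.0] * 8000 + [65.0] * 1000

print(qcut_succeeds(short_distinct))  # True
print(qcut_succeeds(long_tied))       # False
```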

Consider

df = pd.DataFrame([[1,2,3],[1,4,5],[1,5,6],[1,3,4], [2,3,4]], columns = ['a','b','c'])
df
   a  b  c
0  1  2  3
1  1  4  5
2  1  5  6
3  1  3  4
4  2  3  4

pd.qcut(df['a'], 3)

ValueError: Bin edges must be unique: array([ 1.,  1.,  1.,  2.])

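Try using cut instead, which splits the range of values into equal-width bins rather than bins holding equal numbers of rows: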

pd.cut(df['a'], 3)

0    (0.999, 1.333]
1    (0.999, 1.333]
2    (0.999, 1.333]
3    (0.999, 1.333]
4        (1.667, 2]
Name: a, dtype: category
Categories (3, object): [(0.999, 1.333] < (1.333, 1.667] < (1.667, 2]]
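
To compare options on the question's ten sample prices, here is a sketch that counts rows per bucket for cut and for qcut with duplicates='drop'. The duplicates parameter, which tells qcut to discard repeated edges instead of raising, is not mentioned in either answer; this assumes a pandas version recent enough to support it.

```python
import numpy as np
import pandas as pd

# The ten sell_prix values from the sample DataFrame in the question
sell_prix = pd.Series([28.75, 28.75, 24.60, 65.00, 65.00,
                       29.00, 29.00, 29.00, 29.00, 29.00], name="sell_prix")

# cut: three equal-width intervals over the range [24.6, 65.0]
cut_codes = pd.cut(sell_prix, 3, labels=False)
cut_counts = np.bincount(cut_codes, minlength=3)

# qcut with duplicates="drop": the repeated 29.0 edge is removed,
# so only two buckets are produced instead of the three requested
qcut_codes = pd.qcut(sell_prix, 3, labels=False, duplicates="drop")
qcut_counts = np.bincount(qcut_codes)

print("cut counts: ", cut_counts)   # 8, 0 and 2 rows
print("qcut counts:", qcut_counts)  # 8 and 2 rows
```

Neither gives three balanced buckets for this sample: with so many tied prices, equal-sized buckets are only possible by splitting ties, as the rank-based approach does.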
