pandas seaborn: Selected KDE bandwidth is 0. Cannot estimate density

Question

Asked by SaadH

import pandas as pd
import seaborn as sns

ser_test = pd.Series([1,0,1,4,6,0,6,5,1,3,2,5,1])
sns.kdeplot(ser_test, cumulative=True)

The above code generates the following CDF graph:

But when the elements of the series are modified to:

ser_test = pd.Series([1,0,1,1,6,0,6,1,1,0,2,1,1])
sns.kdeplot(ser_test, cumulative=True)

I get the following error:

ValueError: could not convert string to float: 'scott'
RuntimeError: Selected KDE bandwidth is 0. Cannot estimate density.

What does this error mean, and how can I resolve it to generate a CDF (even if it is very skewed)?

Edit: I am using seaborn version 0.9.0.

The complete trace is below:

ValueError: could not convert string to float: 'scott'

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<ipython-input-93-7cee594b4526> in <module>
      1 ser_test = pd.Series([1,0,1,1,6,0,6,1,1,0,2,1,1])
----> 2 sns.kdeplot(ser_test, cumulative=True)

~/.local/lib/python3.5/site-packages/seaborn/distributions.py in kdeplot(data, data2, shade, vertical, kernel, bw, gridsize, cut, clip, legend, cumulative, shade_lowest, cbar, cbar_ax, cbar_kws, ax, **kwargs)
    689         ax = _univariate_kdeplot(data, shade, vertical, kernel, bw,
    690                                  gridsize, cut, clip, legend, ax,
--> 691                                  cumulative=cumulative, **kwargs)
    692 
    693     return ax

~/.local/lib/python3.5/site-packages/seaborn/distributions.py in _univariate_kdeplot(data, shade, vertical, kernel, bw, gridsize, cut, clip, legend, ax, cumulative, **kwargs)
    281         x, y = _statsmodels_univariate_kde(data, kernel, bw,
    282                                            gridsize, cut, clip,
--> 283                                            cumulative=cumulative)
    284     else:
    285         # Fall back to scipy if missing statsmodels

~/.local/lib/python3.5/site-packages/seaborn/distributions.py in _statsmodels_univariate_kde(data, kernel, bw, gridsize, cut, clip, cumulative)
    353     fft = kernel == "gau"
    354     kde = smnp.KDEUnivariate(data)
--> 355     kde.fit(kernel, bw, fft, gridsize=gridsize, cut=cut, clip=clip)
    356     if cumulative:
    357         grid, y = kde.support, kde.cdf

~/.local/lib/python3.5/site-packages/statsmodels/nonparametric/kde.py in fit(self, kernel, bw, fft, weights, gridsize, adjust, cut, clip)
    138             density, grid, bw = kdensityfft(endog, kernel=kernel, bw=bw,
    139                     adjust=adjust, weights=weights, gridsize=gridsize,
--> 140                     clip=clip, cut=cut)
    141         else:
    142             density, grid, bw = kdensity(endog, kernel=kernel, bw=bw,

~/.local/lib/python3.5/site-packages/statsmodels/nonparametric/kde.py in kdensityfft(X, kernel, bw, weights, gridsize, adjust, clip, cut, retgrid)
    451         bw = float(bw)
    452     except:
--> 453         bw = bandwidths.select_bandwidth(X, bw, kern) # will cross-val fit this pattern?
    454     bw *= adjust
    455 

~/.local/lib/python3.5/site-packages/statsmodels/nonparametric/bandwidths.py in select_bandwidth(x, bw, kernel)
    172         # eventually this can fall back on another selection criterion.
    173         err = "Selected KDE bandwidth is 0. Cannot estimate density."
--> 174         raise RuntimeError(err)
    175     else:
    176         return bandwidth

RuntimeError: Selected KDE bandwidth is 0. Cannot estimate density.

Answer 1

Answered by Josh Friedlander

What's going on here is that Seaborn (or rather, the library it relies on to calculate the KDE, either scipy or statsmodels) fails to work out the "bandwidth", a scaling parameter used in the calculation. You can pass it manually. I tried a few values and found that 1.5 gave a graph at the same scale as your previous one:

sns.kdeplot(ser_test, cumulative=True, bw=1.5)

See also here. It is worth installing statsmodels if you don't have it.
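
To see why the automatic bandwidth can come out as exactly 0 for the second series, here is a minimal sketch of the normal-reference (Scott) rule. It assumes the rule scales `min(std, IQR / 1.349)` by `1.059 * n**(-1/5)`, which is my understanding of what the statsmodels version in the traceback does; the helper name `scott_bandwidth` is hypothetical and not part of any library. In the second series more than half of the values equal 1, so the 25th and 75th percentiles coincide and the interquartile range is 0.

```python
import numpy as np

def scott_bandwidth(x):
    """Normal-reference bandwidth: 1.059 * A * n**(-1/5).

    A = min(sample std, IQR / 1.349). This is assumed to mirror the
    rule used by statsmodels; it is not copied from its source.
    """
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    iqr = q75 - q25
    spread = min(np.std(x, ddof=1), iqr / 1.349)
    return 1.059 * spread * len(x) ** (-0.2)

# The two series from the question
first = [1, 0, 1, 4, 6, 0, 6, 5, 1, 3, 2, 5, 1]
second = [1, 0, 1, 1, 6, 0, 6, 1, 1, 0, 2, 1, 1]

bw_first = scott_bandwidth(first)    # positive, so the KDE can be fitted
bw_second = scott_bandwidth(second)  # IQR is 0, so the bandwidth collapses to 0
print(bw_first, bw_second)
```

Passing `bw` explicitly, as above, sidesteps this automatic selection altogether.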

Answer 2

Answered by Jakub Maly

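Running pip uninstall statsmodels solved a similar problem with the same error. Without statsmodels, seaborn falls back to scipy to compute the KDE, as the comment in the traceback above indicates.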

Answer 3

Answered by user108569

If you don't want to wait for the seaborn git update to be released in a stable version, you can try one of the solutions on the issue page. Specifically, henrymartin1 suggests manually passing in a small bandwidth inside a try/except block (suggested by ahartikainen) that checks the text of this specific error, so other errors still get raised:

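try:
    sns.distplot(df)
except RuntimeError as err:
    # Retry with a small manual bandwidth only for this specific error
    if str(err).startswith("Selected KDE bandwidth is 0. Cannot estimate density."):
        sns.distplot(df, kde_kws={'bw': 0.1})
    else:
        # Re-raise anything else with its original traceback
        raise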

This worked for me.
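
If a hard-coded 0.1 feels arbitrary, one alternative is to derive the fallback bandwidth from the data. The sketch below is only an illustration under the same assumption about the normal-reference rule (`1.059 * A * n**(-1/5)` with `A = min(std, IQR / 1.349)`): it ignores a spread estimate that is 0 and uses the other one. The helper `fallback_bandwidth` and its `floor` parameter are hypothetical names, not part of seaborn or statsmodels.

```python
import numpy as np

def fallback_bandwidth(x, floor=0.1):
    """Normal-reference bandwidth that skips spread estimates equal to 0.

    Hypothetical helper: `floor` is returned only when every value in
    `x` is identical, so neither the std nor the IQR is usable.
    """
    x = np.asarray(x, dtype=float)
    std = np.std(x, ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    iqr_scaled = (q75 - q25) / 1.349
    # Keep only the strictly positive spread estimates
    candidates = [s for s in (std, iqr_scaled) if s > 0]
    if not candidates:
        return floor
    return 1.059 * min(candidates) * len(x) ** (-0.2)

# The series from the question that triggers the error
second = [1, 0, 1, 1, 6, 0, 6, 1, 1, 0, 2, 1, 1]
bw = fallback_bandwidth(second)
print(bw)

# With seaborn 0.9.0 the result could then be passed as, for example:
#   sns.kdeplot(ser_test, cumulative=True, bw=bw)
```

For the series in the question this lands in the same range as the hand-picked 1.5 from the first answer, because the standard deviation takes over once the IQR is 0.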
