pandas seaborn: Selected KDE bandwidth is 0. Cannot estimate density

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/60596102/

Time: 2020-09-14 06:25:45  Source: igfitidea

Tags: python, pandas, data-visualization, seaborn, kernel-density

Question by SaadH

import pandas as pd
import seaborn as sns

ser_test = pd.Series([1,0,1,4,6,0,6,5,1,3,2,5,1])
sns.kdeplot(ser_test, cumulative=True)

The above code generates the following CDF graph:

[Figure: CDF of series (ser_test)]

But when the elements of the series are modified to:

ser_test = pd.Series([1,0,1,1,6,0,6,1,1,0,2,1,1])
sns.kdeplot(ser_test, cumulative=True)

I get the following error:

ValueError: could not convert string to float: 'scott'

RuntimeError: Selected KDE bandwidth is 0. Cannot estimate density.

What does this error mean, and how can I resolve it to generate a CDF (even if it is very skewed)?

Edit: I am using seaborn version 0.9.0.

The complete trace is below:

ValueError: could not convert string to float: 'scott'

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<ipython-input-93-7cee594b4526> in <module>
      1 ser_test = pd.Series([1,0,1,1,6,0,6,1,1,0,2,1,1])
----> 2 sns.kdeplot(ser_test, cumulative=True)

~/.local/lib/python3.5/site-packages/seaborn/distributions.py in kdeplot(data, data2, shade, vertical, kernel, bw, gridsize, cut, clip, legend, cumulative, shade_lowest, cbar, cbar_ax, cbar_kws, ax, **kwargs)
    689         ax = _univariate_kdeplot(data, shade, vertical, kernel, bw,
    690                                  gridsize, cut, clip, legend, ax,
--> 691                                  cumulative=cumulative, **kwargs)
    692 
    693     return ax

~/.local/lib/python3.5/site-packages/seaborn/distributions.py in _univariate_kdeplot(data, shade, vertical, kernel, bw, gridsize, cut, clip, legend, ax, cumulative, **kwargs)
    281         x, y = _statsmodels_univariate_kde(data, kernel, bw,
    282                                            gridsize, cut, clip,
--> 283                                            cumulative=cumulative)
    284     else:
    285         # Fall back to scipy if missing statsmodels

~/.local/lib/python3.5/site-packages/seaborn/distributions.py in _statsmodels_univariate_kde(data, kernel, bw, gridsize, cut, clip, cumulative)
    353     fft = kernel == "gau"
    354     kde = smnp.KDEUnivariate(data)
--> 355     kde.fit(kernel, bw, fft, gridsize=gridsize, cut=cut, clip=clip)
    356     if cumulative:
    357         grid, y = kde.support, kde.cdf

~/.local/lib/python3.5/site-packages/statsmodels/nonparametric/kde.py in fit(self, kernel, bw, fft, weights, gridsize, adjust, cut, clip)
    138             density, grid, bw = kdensityfft(endog, kernel=kernel, bw=bw,
    139                     adjust=adjust, weights=weights, gridsize=gridsize,
--> 140                     clip=clip, cut=cut)
    141         else:
    142             density, grid, bw = kdensity(endog, kernel=kernel, bw=bw,

~/.local/lib/python3.5/site-packages/statsmodels/nonparametric/kde.py in kdensityfft(X, kernel, bw, weights, gridsize, adjust, clip, cut, retgrid)
    451         bw = float(bw)
    452     except:
--> 453         bw = bandwidths.select_bandwidth(X, bw, kern) # will cross-val fit this pattern?
    454     bw *= adjust
    455 

~/.local/lib/python3.5/site-packages/statsmodels/nonparametric/bandwidths.py in select_bandwidth(x, bw, kernel)
    172         # eventually this can fall back on another selection criterion.
    173         err = "Selected KDE bandwidth is 0. Cannot estimate density."
--> 174         raise RuntimeError(err)
    175     else:
    176         return bandwidth

RuntimeError: Selected KDE bandwidth is 0. Cannot estimate density.

Answer by Josh Friedlander

What's going on here is that seaborn (or rather, the library it relies on to calculate the KDE, scipy or statsmodels) fails to determine the "bandwidth", a smoothing parameter used in the calculation: the automatically selected value comes out as 0. You can pass it manually instead. I tried a few values and found that 1.5 gives a graph at the same scale as your previous one:

sns.kdeplot(ser_test, cumulative=True, bw=1.5)
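Why the automatic bandwidth comes out as 0 can be sketched in a few lines of NumPy. The traceback shows statsmodels first trying float(bw); that fails for the string 'scott' (the harmless ValueError), so it falls back to select_bandwidth, and the RuntimeError raised there is the real problem. The function below is a hypothetical reconstruction, assuming that statsmodels at the time computed Scott's rule as 1.059 * min(sample std, IQR / 1.349) * n^(-1/5); newer statsmodels releases may treat a zero IQR differently.

```python
import numpy as np


def scott_bw_old_statsmodels(x):
    # Assumed reconstruction of statsmodels' Scott's rule of that era:
    # sigma = min(sample std (ddof=1), IQR / 1.349)
    # bw    = 1.059 * sigma * n ** (-1/5)
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    sigma = min(np.std(x, ddof=1), (q75 - q25) / 1.349)
    return 1.059 * sigma * len(x) ** (-0.2)


first = [1, 0, 1, 4, 6, 0, 6, 5, 1, 3, 2, 5, 1]   # plots fine
second = [1, 0, 1, 1, 6, 0, 6, 1, 1, 0, 2, 1, 1]  # raises the RuntimeError

# In `second`, the sorted values at the 25th and 75th percentile positions
# are both 1, so the IQR is 0, sigma is 0 and the bandwidth is 0.
print(np.percentile(second, [25, 75]))
print(scott_bw_old_statsmodels(first))
print(scott_bw_old_statsmodels(second))
```

Under this assumed formula, the first series gets a bandwidth of about 1.43, close to the 1.5 found by trial above, while the second gets exactly 0 because its interquartile range is 0 even though its standard deviation is not.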

See also here. It's worth installing statsmodels if you don't have it.

Answer by Jakub Maly

Running pip uninstall statsmodels solved a similar problem with the same error.
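A sketch of why uninstalling statsmodels can help: the seaborn 0.9 traceback shows _univariate_kdeplot using statsmodels when it is available and otherwise falling back to scipy. scipy's gaussian_kde with its default Scott's rule scales the sample covariance by factor**2, where factor = n^(-1/5) for one-dimensional data, and does not use the IQR, so this data gets a nonzero bandwidth. The snippet calls scipy directly rather than seaborn, and builds a smoothed CDF by integrating the KDE; the grid range is an arbitrary choice for illustration.

```python
import numpy as np
from scipy import stats

data = np.array([1, 0, 1, 1, 6, 0, 6, 1, 1, 0, 2, 1, 1], dtype=float)

# Default bw_method="scott": factor = n ** (-1/5), kernel covariance =
# sample covariance * factor**2. No IQR is involved, so it is not 0 here.
kde = stats.gaussian_kde(data)
kernel_sd = np.sqrt(kde.covariance[0, 0])
print(kde.factor, kernel_sd)

# Smoothed CDF: integrate the estimated density from -inf up to each grid point.
grid = np.linspace(data.min() - 3, data.max() + 3, 200)
cdf = np.array([kde.integrate_box_1d(-np.inf, g) for g in grid])
print(cdf[0], cdf[-1])
```

The resulting curve rises from near 0 to near 1 across the grid, which is the kind of cumulative KDE the question asks for, even though the data is heavily concentrated at 1.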

Answer by user108569

If you don't want to wait for the fix in the seaborn git repository to be released in a stable version, you can try one of the solutions on the issue page, specifically henrymartin1's suggestion: manually pass in a small bandwidth inside a try/except block (suggested by ahartikainen) that matches the text of this specific error, so other errors are still raised:

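import seaborn as sns

# df: the data you are plotting (e.g. a pandas Series)
try:
    sns.distplot(df)
except RuntimeError as err:  # named err to avoid shadowing the `re` module
    # Only handle the zero-bandwidth error; anything else is re-raised
    if str(err).startswith("Selected KDE bandwidth is 0. Cannot estimate density."):
        sns.distplot(df, kde_kws={'bw': 0.1})
    else:
        raise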

This worked for me.
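The same try/except pattern can be wrapped in a small reusable helper. plot_with_bw_fallback and fake_plot are hypothetical names for illustration, not part of seaborn; the stand-in plotting function only lets the retry logic be checked without seaborn installed.

```python
ZERO_BW_MSG = "Selected KDE bandwidth is 0. Cannot estimate density."


def plot_with_bw_fallback(plot_fn, data, fallback_kwargs):
    """Hypothetical helper: call plot_fn(data); if it fails only because the
    automatically selected KDE bandwidth is 0, retry once with fallback_kwargs."""
    try:
        return plot_fn(data)
    except RuntimeError as err:
        if str(err).startswith(ZERO_BW_MSG):
            return plot_fn(data, **fallback_kwargs)
        raise  # any other RuntimeError propagates unchanged


# Stand-in for a plotting function: fails like statsmodels unless kwargs are given.
def fake_plot(data, **kwargs):
    if not kwargs:
        raise RuntimeError(ZERO_BW_MSG)
    return kwargs


print(plot_with_bw_fallback(fake_plot, [1, 1, 2], {"kde_kws": {"bw": 0.1}}))
```

With seaborn 0.9 it would be called as plot_with_bw_fallback(sns.distplot, df, {"kde_kws": {"bw": 0.1}}). Matching on the message text keeps unrelated RuntimeErrors visible, at the cost of depending on the exact wording of the statsmodels error.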
