Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/15415455/

Date: 2020-08-18 20:03:48  Source: igfitidea

Plotting probability density function by sample with matplotlib

Tags: python, matplotlib, histogram, probability

Asked by Cupitor

I want to plot an approximation of the probability density function based on a sample that I have: a curve that mimics the behaviour of the histogram. I can make the sample as large as I want.

Accepted answer by askewchan

If you want to plot a distribution and you know its functional form, define it as a function and plot it like so:

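import numpy as np
from matplotlib import pyplot as plt

def my_dist(x):
    # normal density with mean 0 and variance 1/2; dividing by sqrt(pi) makes it integrate to 1
    return np.exp(-x ** 2) / np.sqrt(np.pi)

x = np.linspace(-5, 5, 500)  # a fine, evenly spaced grid of x values
p = my_dist(x)
plt.plot(x, p)
plt.show()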
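
When the analytic form is known, two quick sanity checks are that the curve integrates to 1 and that a histogram of a large sample, normalized with density=True, follows it. The following is a minimal sketch, assuming the example shape exp(-x**2) scaled by 1/sqrt(pi) (the normal density with variance 1/2); the seed, grid and bin count are arbitrary illustration choices, and plotting is left out so only the numbers are computed:

```python
import numpy as np

def my_dist(x):
    # exp(-x**2) / sqrt(pi) is the normal density with mean 0 and variance 1/2
    return np.exp(-x ** 2) / np.sqrt(np.pi)

# 1) A density must integrate to 1: check with a Riemann sum on a fine grid
x = np.linspace(-5, 5, 1001)
dx = x[1] - x[0]
area = np.sum(my_dist(x)) * dx

# 2) A histogram of a large sample from the same distribution, normalized with
#    density=True, should follow the curve closely
rng = np.random.default_rng(0)   # seeded only for reproducibility
sample = rng.normal(loc=0.0, scale=np.sqrt(0.5), size=100_000)
heights, edges = np.histogram(sample, bins=50, density=True)
centers = (edges[:-1] + edges[1:]) / 2
max_gap = np.max(np.abs(heights - my_dist(centers)))

print(f"area under curve: {area:.6f}")
print(f"largest histogram/curve gap: {max_gap:.3f}")
```

To see the comparison, pass centers and heights to plt.bar (with width=edges[1] - edges[0]) and x, my_dist(x) to plt.plot.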


If you don't have the exact distribution as an analytical function, perhaps you can generate a large sample, take its histogram, and smooth the result, for example with a spline:

import numpy as np
from scipy.interpolate import UnivariateSpline
from matplotlib import pyplot as plt

N = 1000
n = N // 10
sample = np.random.normal(size=N)        # generate your data sample with N elements
p, edges = np.histogram(sample, bins=n)  # bin it into n = N//10 bins; p holds raw counts
x = (edges[:-1] + edges[1:]) / 2         # convert bin edges to bin centers
f = UnivariateSpline(x, p, s=n)          # s is the smoothing factor
plt.plot(x, f(x))
plt.show()
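
By default np.histogram returns raw counts, so a spline fitted to them lives on a count scale rather than a probability-density scale. Below is a minimal sketch of rescaling to a density, counts / (N * bin_width), which is what density=True computes. Because s bounds a sum of squared residuals, dividing the y-values by N * bin_width means dividing s by (N * bin_width)**2 to keep the same smoothing condition; the seed is only for reproducibility:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)   # seeded only so the sketch is reproducible
N = 1000
n = N // 10
sample = rng.normal(size=N)

counts, edges = np.histogram(sample, bins=n)
width = edges[1] - edges[0]                  # all bins have the same width
density = counts / (N * width)               # counts -> probability density

# np.histogram performs the same rescaling when asked for density=True
density_np, _ = np.histogram(sample, bins=n, density=True)
print(np.allclose(density, density_np))

centers = (edges[:-1] + edges[1:]) / 2
# s bounds a sum of squared residuals, so scale it by the square of the y-scaling
s_density = n / (N * width) ** 2
f = UnivariateSpline(centers, density, s=s_density)
smooth_pdf = f(centers)
print(np.sum(smooth_pdf) * width)            # roughly 1 for a density
```

Plotting centers against smooth_pdf then gives a curve on the same vertical scale as an analytic density.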

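You can increase or decrease s (the smoothing factor) in the UnivariateSpline call to increase or decrease smoothing: the spline keeps adding knots until its sum of squared residuals is at most s, so a larger s gives a smoother curve. For example, using the two approaches you get:

[Figure: dist to func]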
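
A sketch of that trade-off, using UnivariateSpline's get_knots() and get_residual() methods: a smaller s forces more knots and a closer, noisier fit, while a larger s allows fewer knots. The three s values and the seed are arbitrary choices for illustration:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)   # arbitrary seed for reproducibility
N = 1000
n = N // 10
sample = rng.normal(size=N)
counts, edges = np.histogram(sample, bins=n)
centers = (edges[:-1] + edges[1:]) / 2

knot_counts = {}
residuals = {}
for s in (n / 10, n, 10 * n):    # tight, moderate, loose smoothing
    f = UnivariateSpline(centers, counts, s=s)
    knot_counts[s] = len(f.get_knots())
    residuals[s] = f.get_residual()  # sum of squared residuals (all weights are 1)
    print(f"s={s:7.1f}  knots={knot_counts[s]:3d}  residual={residuals[s]:8.1f}")
```

If s is too small the curve follows the bin-to-bin noise; if it is too large the spline can no longer bend enough to follow the peak.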

Answer by EnricoGiampieri

What you have to do is use gaussian_kde from scipy.stats. (Older code imports it from scipy.stats.kde; that module path is deprecated in recent SciPy releases.)

Given your data, you can do something like this:

import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import gaussian_kde
# create fake data
data = np.random.randn(1000)
# this creates the kernel; given an array, it estimates the probability density over those values
kde = gaussian_kde(data)
# these are the values over which your kernel will be evaluated
dist_space = np.linspace(data.min(), data.max(), 100)
# plot the results
plt.plot(dist_space, kde(dist_space))
plt.show()
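
Under the hood, a 1-D Gaussian KDE averages one Gaussian bump per sample point: f̂(x) = 1/(n·h) · Σᵢ φ((x − xᵢ)/h), where φ is the standard normal density and h is the bandwidth. The following is a NumPy sketch of that formula, assuming SciPy's default Scott's rule (h = sample standard deviation × n^(−1/5)); kde_by_hand is a hypothetical helper written for illustration, not a SciPy function, and the last lines compare it against gaussian_kde:

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_by_hand(data, x):
    """Gaussian KDE for 1-D data with Scott's-rule bandwidth (hypothetical helper)."""
    n = data.size
    # Scott's rule factor n**(-1/5) times the sample standard deviation (ddof=1)
    h = data.std(ddof=1) * n ** (-1 / 5)
    # one Gaussian bump of width h centred on every sample point, then averaged
    z = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(3)   # arbitrary seed
data = rng.normal(size=1000)
dist_space = np.linspace(data.min(), data.max(), 100)

manual = kde_by_hand(data, dist_space)
reference = gaussian_kde(data)(dist_space)
print(np.max(np.abs(manual - reference)))
```

The broadcast (len(x), n) array keeps the sketch short but uses memory proportional to len(x)·n, so gaussian_kde remains the practical choice for large inputs.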

The kernel density estimate can be configured freely (for example, its bandwidth through the bw_method argument) and handles N-dimensional data easily. It also avoids the spline distortion you can see in the plot given by askewchan.
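
A sketch of both points, assuming SciPy's gaussian_kde: bw_method accepts "scott" (the default), "silverman", a scalar factor, or a callable, and multi-dimensional data are passed as an array of shape (d, n). The seed, sample sizes and the 0.1 factor are illustrative choices:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)   # arbitrary seed
data = rng.normal(size=1000)

# Bandwidth: Scott's rule by default; a scalar sets the factor directly
kde_default = gaussian_kde(data)                       # bw_method=None -> Scott's rule
kde_silverman = gaussian_kde(data, bw_method="silverman")
kde_narrow = gaussian_kde(data, bw_method=0.1)         # smaller factor -> bumpier curve

for name, k in [("scott", kde_default), ("silverman", kde_silverman), ("0.1", kde_narrow)]:
    print(f"{name:>9}: factor={k.factor:.3f}")

# The estimate is a proper density: its total mass is 1
total = kde_default.integrate_box_1d(-np.inf, np.inf)

# N-dimensional data: shape (d, n), here d=2 variables and n=500 points
data2d = rng.normal(size=(2, 500))
kde2d = gaussian_kde(data2d)
density_at_origin = kde2d(np.array([[0.0], [0.0]]))    # evaluation points are also (d, m)
print(kde2d.d, total, density_at_origin)
```

Evaluating kde_narrow and kde_default over the same grid and plotting both is a quick way to see how the bandwidth trades noise against smoothness.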
