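What is the y axis in a Python seaborn distplot?
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/51666784/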
What is y axis in seaborn distplot?
Asked by Mister Twister
I have some geometrically distributed data. When I want to take a look at it, I use
sns.distplot(data, kde=False, norm_hist=True, bins=100)
which results in a picture:
However, the bin heights don't add up to 1, which means the y axis doesn't show probability but something else. If instead we use
weights = np.ones_like(np.array(data))/float(len(np.array(data)))
plt.hist(data, weights=weights, bins = 100)
the y axis shows probability, as the bin heights sum to 1:
It can be seen more clearly here: suppose we have a list
l = [1, 3, 2, 1, 3]
We have two 1s, two 3s and one 2, so their respective probabilities are 2/5, 2/5 and 1/5. When we use seaborn distplot with 3 bins:
sns.distplot(l, kde=False, norm_hist=True, bins=3)
we get:
As you can see, the 1st and the 3rd bins sum to 0.6+0.6=1.2, which is already greater than 1, so the y axis is not a probability. When we use
weights = np.ones_like(np.array(l))/float(len(np.array(l)))
plt.hist(l, weights=weights, bins = 3)
we get:
and the y axis is a probability, since 0.4+0.4+0.2=1, as expected.
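Both sets of heights for l can also be checked without plotting. This is a minimal sketch, assuming that plt.hist and distplot bin the data the same way np.histogram does, and that density=True mirrors norm_hist=True (neither is confirmed in this post); the variable names are only illustrative:

```python
import numpy as np

l = [1, 3, 2, 1, 3]

# Same weights as in the plt.hist call: each observation contributes 1/len(l).
weights = np.ones_like(np.array(l)) / float(len(l))
prob_heights, edges = np.histogram(l, bins=3, weights=weights)

# Assumption: density=True plays the role of norm_hist=True here.
density_heights, _ = np.histogram(l, bins=3, density=True)

print("bin edges:      ", edges)
print("weighted bars:  ", prob_heights, "sum =", prob_heights.sum())
print("normalized bars:", density_heights, "sum =", density_heights.sum())
```

The weighted bars should sum to 1, while the normalized bars sum to about 1.5, matching the two 0.6 bars seen in the distplot picture.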
The number of bins is the same for both methods in each case: 100 bins for the geometrically distributed data, and 3 bins for the small list l with 3 possible values. So the number of bins is not the issue.
My question is: in seaborn distplot called with norm_hist=True, what is the meaning of the y axis?
Accepted answer by IonicSolutions
From the documentation:
norm_hist: bool, optional
If True, the histogram height shows a density rather than a count. This is implied if a KDE or fitted density is plotted.
So you need to take into account your bin width as well, i.e. compute the area under the curve and not just the sum of the bin heights.
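The area check can be sketched in plain Python for the list l from the question. This assumes equal-width bins spanning the minimum to the maximum of the data, which is how the 3-bin example appears to be binned; density_histogram is a hypothetical helper written for illustration, not a seaborn function:

```python
def density_histogram(data, bins):
    """Return (density heights, bin width) for equal-width bins over [min, max].

    Assumes max(data) > min(data) so the bin width is non-zero.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # The largest value belongs to the last bin (right edge inclusive).
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(data)
    # Density = fraction of the data in the bin, divided by the bin width.
    heights = [c / (n * width) for c in counts]
    return heights, width


l = [1, 3, 2, 1, 3]
heights, width = density_histogram(l, bins=3)

sum_of_heights = sum(heights)               # not 1 in general
area = sum(h * width for h in heights)      # this is what adds up to 1
probabilities = [h * width for h in heights]

print(heights)         # roughly 0.6, 0.3, 0.6
print(sum_of_heights)  # roughly 1.5
print(area)            # roughly 1.0
print(probabilities)   # roughly 0.4, 0.2, 0.4
```

Multiplying each height by the bin width (2/3 here) recovers the per-bin probabilities, which is why the heights alone can add up to more than 1.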
Answer by Prasann
The x-axis is the value of the variable just like in a histogram, but what exactly does the y-axis represent?
ANS->The y-axis in a density plot is the probability density function for the kernel density estimation. However, we need to be careful to specify that this is a probability density and not a probability. The difference is that the probability density is the probability per unit on the x-axis. To convert to an actual probability, we need to find the area under the curve for a specific interval on the x-axis. Somewhat confusingly, because this is a probability density and not a probability, the y-axis can take values greater than one. The only requirement of the density plot is that the total area under the curve integrates to one. I generally tend to think of the y-axis on a density plot as a value only for relative comparisons between different categories.
Reference: https://towardsdatascience.com/histograms-and-density-plots-in-python-f6bda88f5ac0
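To illustrate how a density can exceed one while its area still integrates to one, here is a small numerical sketch. It uses a normal density with standard deviation 0.1 as a stand-in for a KDE curve (an assumption made for illustration, since no concrete curve is given above); normal_pdf and area_under are hypothetical helper names:

```python
import math


def normal_pdf(x, mu=0.0, sigma=0.1):
    """Probability density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def area_under(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


peak = normal_pdf(0.0)                              # density value, well above 1
total_area = area_under(normal_pdf, -1.0, 1.0)      # 10 sigma each side, close to 1
prob_near_mean = area_under(normal_pdf, -0.1, 0.1)  # probability of |x| <= 0.1

print(peak, total_area, prob_near_mean)
```

The peak comes out near 4, yet the total area is about 1, and the probability of landing within one standard deviation of the mean is the area over that interval (about 0.68), not the height of the curve.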