Original question: http://stackoverflow.com/questions/46018032/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Seaborn: distplot() with relative frequency
Asked by Melanie
I am trying to make some histograms in Seaborn for a research project. I would like the y-axis to show relative frequency and the x-axis to run from -180 to 180. Here is the code I have for one of my histograms:
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline
import seaborn as sns
df = pd.read_csv('sample.csv', index_col=0)
x = df.Angle
sns.distplot(x, kde=False);
I can't figure out how to convert the output to a frequency instead of a count. I've tried a number of different types of graphs to get frequency output, but to no avail. I have also come across this question, which appears to be asking for a countplot with frequencies (but with another function). I've tried using it as a guide but have failed. Any help would be greatly appreciated. I'm very new to this software and to Python as well.
Accepted answer by ImportanceOfBeingErnest
Especially as a beginner, try to keep things simple. You have a list of numbers
a = [-0.126,1,9,72.3,-44.2489,87.44]
of which you want to create a histogram. In order to define a histogram, you need some bins. So let's say you want to divide the range between -180 and 180 into bins of width 20,
import numpy as np
bins = np.arange(-180,181,20)
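As a small sketch of what this call produces (the names `bins_alt` and `n_bins` are illustrative, not from the original answer): `np.arange` excludes its stop value, so the stop is written as 181 to make sure the last edge, 180, is included. This gives 19 edges, that is, 18 bins of width 20. `np.linspace` should give the same edges if you prefer to specify the number of edges directly.

```python
import numpy as np

# Bin edges from -180 to 180 in steps of 20; stop=181 so that 180 itself is included.
bins = np.arange(-180, 181, 20)

# Equivalent construction (illustrative): ask for 19 evenly spaced edges.
bins_alt = np.linspace(-180, 180, 19)

# N edges define N - 1 bins.
n_bins = len(bins) - 1
print(n_bins, bins[0], bins[-1])
```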
You can compute the histogram with numpy.histogram, which returns the counts in the bins.
hist, edges = np.histogram(a, bins)
The relative frequency is the number in each bin divided by the total number of events,
freq = hist/float(hist.sum())
The quantity freq is hence the relative frequency, which you want to plot as a bar plot:
import matplotlib.pyplot as plt
plt.bar(bins[:-1], freq, width=20, align="edge", ec="k" )
This results in the following plot, from which you can read e.g. that 33% of the values lie in the range between 0 and 20.
Complete code:
import numpy as np
import matplotlib.pyplot as plt
a = [-0.126,1,9,72.3,-44.2489,87.44]
bins = np.arange(-180,181,20)
hist, edges = np.histogram(a, bins)
freq = hist/float(hist.sum())
plt.bar(bins[:-1],freq,width=20, align="edge", ec="k" )
plt.show()
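An alternative, sketched here as an assumption rather than as part of the original answer, is to give every value a weight of 1/N so that the histogram itself returns relative frequencies. Both `np.histogram` and `plt.hist` accept a `weights` argument. The helper names `relative_frequencies` and `plot_relative_frequencies` are hypothetical. One difference from dividing by `hist.sum()`: with weights of 1/N, values that fall outside the bin range still count towards N, so the bars sum to less than 1 in that case. For the sample data all values are inside the range and both approaches agree.

```python
import numpy as np


def relative_frequencies(values, bins):
    """Hypothetical helper: fraction of all values that falls into each bin."""
    values = np.asarray(values, dtype=float)
    # Each observation contributes 1/N, so bin heights are fractions of N.
    weights = np.full(values.shape, 1.0 / values.size)
    freq, edges = np.histogram(values, bins=bins, weights=weights)
    return freq, edges


def plot_relative_frequencies(values, bins):
    """Hypothetical helper: draw the same histogram directly with matplotlib."""
    import matplotlib.pyplot as plt  # imported here so the numeric part runs without a display

    values = np.asarray(values, dtype=float)
    weights = np.full(values.shape, 1.0 / values.size)
    plt.hist(values, bins=bins, weights=weights, ec="k")
    plt.ylabel("relative frequency")
    plt.xlim(bins[0], bins[-1])
    plt.show()


a = [-0.126, 1, 9, 72.3, -44.2489, 87.44]
bins = np.arange(-180, 181, 20)
freq, edges = relative_frequencies(a, bins)
print(freq)
```

Calling `plot_relative_frequencies(a, bins)` should then draw the same bars as the `plt.bar` version above.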
Answer by Thomas Matthew
There is a sns.distplot argument, norm_hist, that converts the histogram from counts to a normalized frequency (a density, as seaborn calls it). It is False by default, so you have to set it to True. In your case:
sns.distplot(x, kde=False, norm_hist=True)
Then if you want the x-axis to run from -180 to 180, just use:
plt.xlim(-180,180)
From the Seaborn Docs:
norm_hist : bool, optional
If True, the histogram height shows a density rather than a count. This is implied if a KDE or fitted density is plotted.
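One caveat worth checking: a density is not quite the same thing as a relative frequency. A density is normalized so that the bar areas sum to 1, whereas relative frequencies are normalized so that the bar heights sum to 1. The two coincide only when the bins have width 1. The sketch below uses NumPy's `density=True` option and the sample data from the accepted answer to illustrate the relationship; the variable names are illustrative.

```python
import numpy as np

a = [-0.126, 1, 9, 72.3, -44.2489, 87.44]
bins = np.arange(-180, 181, 20)

# Raw counts per bin and the corresponding density (bar areas sum to 1).
counts, edges = np.histogram(a, bins)
density, _ = np.histogram(a, bins, density=True)

# Multiplying each density value by its bin width recovers the relative frequency.
widths = np.diff(edges)
rel_freq_from_density = density * widths
rel_freq_from_counts = counts / counts.sum()

print(density[9])                # height of the [0, 20) bar as a density
print(rel_freq_from_density[9])  # the same bar as a relative frequency
```

With bins of width 20, the density heights are therefore 20 times smaller than the relative frequencies. In seaborn 0.11 and later, where distplot is deprecated, sns.histplot(x, stat="probability") normalizes the bars so that their heights sum to 1, which is probably closer to what the question asks for.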