pandas Seaborn: distplot() with relative frequency

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/46018032/

Time: 2020-09-14 04:23:00  Source: igfitidea

Seaborn: distplot() with relative frequency

Tags: python, pandas, matplotlib, data-visualization, seaborn

Asked by Melanie

I am trying to make some histograms in Seaborn for a research project. I would like the y-axis to show relative frequency and the x-axis to run from -180 to 180. Here is the code I have for one of my histograms:

import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline
import seaborn as sns

df = pd.read_csv('sample.csv', index_col=0)

x = df.Angle
sns.distplot(x, kde=False);

This outputs: [image: seaborn frequency plot]

I can't figure out how to convert the output to a relative frequency instead of a count. I've tried a number of different types of graphs to get frequency output, but to no avail. I have also come across this question, which appears to be asking for a countplot with frequencies (but with another function). I've tried using it as a guide but have failed. Any help would be greatly appreciated. I'm very new to this software and to Python as well.

My data looks like the following and can be downloaded: sample data

Accepted answer by ImportanceOfBeingErnest

Especially as a beginner, try to keep things simple. You have a list of numbers

a = [-0.126,1,9,72.3,-44.2489,87.44]

of which you want to create a histogram. In order to define a histogram, you need some bins. So let's say you want to divide the range between -180 and 180 into bins of width 20,

import numpy as np
bins = np.arange(-180,181,20)

You can compute the histogram with numpy.histogram, which returns the counts in the bins.

hist, edges = np.histogram(a, bins)
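
As a quick sanity check on the binning step, the sketch below (not part of the original answer) inspects what `np.histogram` returns for the sample list. The variable names follow the answer; the printed summary is only for illustration.

```python
import numpy as np

a = [-0.126, 1, 9, 72.3, -44.2489, 87.44]
bins = np.arange(-180, 181, 20)  # 19 edges -> 18 bins of width 20

hist, edges = np.histogram(a, bins)

# `edges` holds the same bin edges that were passed in as `bins`
print(len(edges), len(hist))

# List only the non-empty bins as "left edge, right edge: count"
for left, right, count in zip(edges[:-1], edges[1:], hist):
    if count:
        print(f"{left} to {right}: {count}")
```

There is always one more edge than there are bins, which is why the bar plot further down uses `bins[:-1]` as the left edges.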

The relative frequency is the number in each bin divided by the total number of events,

freq = hist/float(hist.sum())

The quantity freq is hence the relative frequency, which you want to plot as a bar plot:

import matplotlib.pyplot as plt
plt.bar(bins[:-1], freq, width=20, align="edge", ec="k" )

This results in the following plot, from which you can read e.g. that 33% of the values lie in the range between 0 and 20.
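
The 33% figure can be checked numerically. This is a small sketch using the same sample list and bins; it assumes nothing beyond NumPy, and the lookup of the bin by its left edge is just one convenient way to do it.

```python
import numpy as np

a = [-0.126, 1, 9, 72.3, -44.2489, 87.44]
bins = np.arange(-180, 181, 20)

hist, edges = np.histogram(a, bins)
freq = hist / hist.sum()  # relative frequency per bin

# Relative frequencies over all bins add up to 1
print(freq.sum())

# Find the bin whose left edge is 0, i.e. the range from 0 to 20
idx = np.flatnonzero(edges[:-1] == 0)[0]
print(round(freq[idx], 3))  # 2 of the 6 values (1 and 9) fall in this bin
```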

Complete code:

import numpy as np
import matplotlib.pyplot as plt

a = [-0.126,1,9,72.3,-44.2489,87.44]

bins = np.arange(-180,181,20)

hist, edges = np.histogram(a, bins)
freq = hist/float(hist.sum())

plt.bar(bins[:-1],freq,width=20, align="edge", ec="k" )

plt.show()
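
An alternative sketch, not from the original answer: `plt.hist` accepts a `weights` argument, so giving every value a weight of 1/N makes the bar heights relative frequencies without a separate `np.histogram` call. The `Agg` backend line is only an assumption for running outside a notebook; leave it out in Jupyter.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line in a notebook
import matplotlib.pyplot as plt

a = np.array([-0.126, 1, 9, 72.3, -44.2489, 87.44])
bins = np.arange(-180, 181, 20)

# Each value contributes 1/N, so the bar heights are relative frequencies
weights = np.ones_like(a) / len(a)
heights, edges, patches = plt.hist(a, bins=bins, weights=weights, ec="k")

plt.xlim(-180, 180)
plt.ylabel("Relative frequency")
```

In an interactive session, finish with `plt.show()` as in the complete code above. The returned `heights` array holds the same values as `freq` there.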

Answer by Thomas Matthew

There is a sns.distplot argument that allows converting from count to frequency (or density, as sns refers to it). It's usually False, so you have to enable it with True. In your case:

sns.distplot(x, kde=False, norm_hist=True)

Then if you want the x-axis to run from -180 to 180, just use:

plt.xlim(-180,180)

From the Seaborn Docs:

norm_hist : bool, optional

If True, the histogram height shows a density rather than a count. This is implied if a KDE or fitted density is plotted.
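
Note that a density is not quite the same thing as a relative frequency: a density is scaled so that the bar areas sum to 1, whereas relative frequencies make the bar heights sum to 1. The two differ by a factor of the bin width. The sketch below illustrates this with `np.histogram(..., density=True)`, on the assumption that `norm_hist=True` applies the same kind of normalization; the sample values and bins are borrowed from the accepted answer.

```python
import numpy as np

a = [-0.126, 1, 9, 72.3, -44.2489, 87.44]
bins = np.arange(-180, 181, 20)
widths = np.diff(bins)  # every bin is 20 wide here

counts, _ = np.histogram(a, bins)
density, _ = np.histogram(a, bins, density=True)

freq = counts / counts.sum()  # relative frequency per bin

# Density heights are the relative frequencies divided by the bin width
print(np.allclose(density, freq / widths))

# Bar areas of the density sum to 1; bar heights of freq sum to 1
print(np.isclose((density * widths).sum(), 1.0), np.isclose(freq.sum(), 1.0))
```

So with bins of width 20, a density axis shows values 20 times smaller than the relative frequencies. If relative frequency is what you want from seaborn directly, releases from 0.11 onward provide `sns.histplot`, whose `stat` parameter accepts `"probability"`; `distplot` is deprecated in those releases.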