How to plot a PMF of a sample?
Original question: http://stackoverflow.com/questions/25273415/
This content comes from Stack Overflow and is licensed under CC BY-SA 4.0. You are free to use and share it, but you must attribute it to the original authors.
Asked by Milena Araujo
Is there any function or library that would help me plot the probability mass function (PMF) of a sample, the same way there is one for plotting the probability density function (PDF) of a sample?
For instance, using pandas, plotting a PDF is as simple as calling:
sample.plot(kind="density")
If there is no easy way, how can I compute the PMF so that I can plot it using matplotlib?
Answered by behzad.nouri
If ts is a Series, you can obtain the PMF of the sample with:
>>> pmf = ts.value_counts().sort_index() / len(ts)
and plot it with:
>>> pmf.plot(kind='bar')
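For reference, here is a self-contained sketch of the same idea. The sample values and the helper name plot_pmf are made up for illustration, and value_counts(normalize=True) is used as an equivalent to dividing the counts by len(ts).

```python
import pandas as pd

# Hypothetical sample of discrete values (not from the original question).
ts = pd.Series([1, 2, 2, 3, 3, 3, 4])

# normalize=True returns relative frequencies instead of raw counts;
# sort_index() orders the result by value rather than by frequency.
pmf = ts.value_counts(normalize=True).sort_index()


def plot_pmf(pmf):
    """Draw the PMF as a bar chart (requires matplotlib) and return the Axes."""
    ax = pmf.plot(kind="bar")
    ax.set_xlabel("value")
    ax.set_ylabel("probability")
    return ax


print(pmf)
```

Calling plot_pmf(pmf) and then matplotlib.pyplot.show() should display the chart.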
A NumPy-only solution can be written using np.unique:
>>> xs = np.random.randint(0, 10, 100)
>>> xs
array([5, 2, 2, 1, 2, 8, 6, 7, 5, 3, 2, 6, 4, 9, 7, 6, 4, 7, 6, 8, 7, 0, 6,
       2, 9, 8, 7, 7, 2, 6, 2, 8, 0, 2, 5, 1, 3, 6, 7, 7, 2, 2, 0, 3, 8, 7,
       4, 0, 5, 7, 5, 4, 4, 9, 5, 1, 6, 6, 0, 9, 4, 2, 0, 8, 7, 5, 1, 1, 2,
       8, 3, 8, 9, 0, 0, 6, 8, 7, 2, 6, 7, 9, 7, 8, 8, 3, 3, 7, 8, 2, 2, 4,
       4, 5, 3, 4, 1, 5, 5, 1])
>>> val, cnt = np.unique(xs, return_counts=True)
>>> pmf = cnt / len(xs)
>>> # values along with probability mass function
>>> np.column_stack((val, pmf))
array([[ 0.  ,  0.08],
       [ 1.  ,  0.07],
       [ 2.  ,  0.15],
       [ 3.  ,  0.07],
       [ 4.  ,  0.09],
       [ 5.  ,  0.1 ],
       [ 6.  ,  0.11],
       [ 7.  ,  0.15],
       [ 8.  ,  0.12],
       [ 9.  ,  0.06]])
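Since the question asks about plotting with matplotlib, the np.unique result can be passed to a stem plot, which is a common way to draw a PMF. This is only a sketch: the function names empirical_pmf and plot_pmf and the small sample are assumptions made for illustration.

```python
import numpy as np


def empirical_pmf(xs):
    """Return the distinct values of xs and their relative frequencies."""
    xs = np.asarray(xs)
    values, counts = np.unique(xs, return_counts=True)
    return values, counts / xs.size


def plot_pmf(values, probs):
    """Draw one stem per distinct value (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.stem(values, probs)
    ax.set_xlabel("value")
    ax.set_ylabel("probability")
    return ax


# Hypothetical sample: 8 observations over the values 0, 1, 2 and 5.
values, probs = empirical_pmf([0, 1, 1, 2, 2, 2, 2, 5])
print(np.column_stack((values, probs)))
```

Unlike a bar chart over a categorical index, the stem plot places each value at its numeric position, so gaps in the support (here 3 and 4) stay visible.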
Answered by Emsi
You can use np.histogram with density=True to compute the PMF, provided that bins of unit width are used (otherwise you get the value of the probability density function in each bin, which is most likely not what you need).
>>> xs = np.array(
...     [5, 2, 2, 1, 2, 8, 6, 7, 5, 3, 2, 6, 4, 9, 7, 6, 4, 7, 6, 8, 7, 0, 6,
...      2, 9, 8, 7, 7, 2, 6, 2, 8, 0, 2, 5, 1, 3, 6, 7, 7, 2, 2, 0, 3, 8, 7,
...      4, 0, 5, 7, 5, 4, 4, 9, 5, 1, 6, 6, 0, 9, 4, 2, 0, 8, 7, 5, 1, 1, 2,
...      8, 3, 8, 9, 0, 0, 6, 8, 7, 2, 6, 7, 9, 7, 8, 8, 3, 3, 7, 8, 2, 2, 4,
...      4, 5, 3, 4, 1, 5, 5, 1])
>>> pmf, bins = np.histogram(xs, bins=range(0,11), density=True)
>>> np.column_stack((bins[:-1], pmf))
array([[ 0.  ,  0.08],
       [ 1.  ,  0.07],
       [ 2.  ,  0.15],
       [ 3.  ,  0.07],
       [ 4.  ,  0.09],
       [ 5.  ,  0.1 ],
       [ 6.  ,  0.11],
       [ 7.  ,  0.15],
       [ 8.  ,  0.12],
       [ 9.  ,  0.06]])
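To illustrate the caveat about bin width, the sketch below uses bins of width 2 on a small made-up sample. With density=True the returned values are densities, and multiplying by the bin widths recovers the probability mass per bin. The sample and the bin edges are assumptions chosen for the example.

```python
import numpy as np

# Hypothetical sample of 10 observations.
xs = np.array([0, 1, 1, 2, 3, 3, 3, 4, 5, 5])

# Bins of width 2: [0, 2), [2, 4), [4, 6] (the last bin is closed on the right).
density, edges = np.histogram(xs, bins=[0, 2, 4, 6], density=True)

# density = count / (n * width), so it is NOT the probability of each bin.
print(density)

# Multiplying by the bin widths gives the probability mass per bin.
mass = density * np.diff(edges)
print(mass)
```

With unit-width bins np.diff(edges) is all ones, which is why the density and the PMF coincide in the answer above.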
Answered by Aeden
Given a pandas DataFrame df, you can use seaborn and write:
import seaborn as sns
probabilities = df['SomeColumn'].value_counts(normalize=True)
# Pass x and y as keywords; recent seaborn versions no longer accept them positionally.
sns.barplot(x=probabilities.index, y=probabilities.values)

