pandas: How to plot the PMF of a sample?

Question

Asked by Milena Araujo

Is there any function or library that would help me plot the probability mass function (PMF) of a sample, the same way there is for plotting the probability density function (PDF) of a sample?

For instance, using pandas, plotting a PDF is as simple as calling:

sample.plot(kind="density")

If there is no easy way, how can I compute the PMF so that I can plot it with matplotlib?

Answer 1

Answered by behzad.nouri

If ts is a Series, you can obtain the PMF of the sample with:

>>> pmf = ts.value_counts().sort_index() / len(ts)

and plot it with:

>>> pmf.plot(kind='bar')
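
As a minimal sketch of the same idea, `value_counts` can also do the normalization itself through `normalize=True`. The sample below is hypothetical and is not taken from the original post.

```python
import pandas as pd

# Hypothetical sample of discrete observations (an assumption for illustration).
ts = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])

# normalize=True returns relative frequencies instead of raw counts;
# sort_index() orders the result by value rather than by frequency.
pmf = ts.value_counts(normalize=True).sort_index()

print(pmf)
```

One difference to keep in mind: `value_counts` drops missing values by default, so with `normalize=True` the probabilities sum to 1 over the non-missing observations, whereas dividing by `len(ts)` counts missing entries in the denominator.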

A NumPy-only solution can use np.unique:

>>> xs = np.random.randint(0, 10, 100)
>>> xs
array([5, 2, 2, 1, 2, 8, 6, 7, 5, 3, 2, 6, 4, 9, 7, 6, 4, 7, 6, 8, 7, 0, 6,
       2, 9, 8, 7, 7, 2, 6, 2, 8, 0, 2, 5, 1, 3, 6, 7, 7, 2, 2, 0, 3, 8, 7,
       4, 0, 5, 7, 5, 4, 4, 9, 5, 1, 6, 6, 0, 9, 4, 2, 0, 8, 7, 5, 1, 1, 2,
       8, 3, 8, 9, 0, 0, 6, 8, 7, 2, 6, 7, 9, 7, 8, 8, 3, 3, 7, 8, 2, 2, 4,
       4, 5, 3, 4, 1, 5, 5, 1])

>>> val, cnt = np.unique(xs, return_counts=True)
>>> pmf = cnt / len(xs)

>>> # values along with probability mass function
>>> np.column_stack((val, pmf))
array([[ 0.  ,  0.08],
       [ 1.  ,  0.07],
       [ 2.  ,  0.15],
       [ 3.  ,  0.07],
       [ 4.  ,  0.09],
       [ 5.  ,  0.1 ],
       [ 6.  ,  0.11],
       [ 7.  ,  0.15],
       [ 8.  ,  0.12],
       [ 9.  ,  0.06]])
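
Since the question asks about plotting with matplotlib, here is one possible sketch that draws the `np.unique` result as a stem-style plot. The seeded generator, the `Agg` backend, and the axis labels are assumptions made for illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample: 100 integers in [0, 10), seeded for reproducibility.
rng = np.random.default_rng(0)
xs = rng.integers(0, 10, size=100)

# Distinct values and their relative frequencies.
val, cnt = np.unique(xs, return_counts=True)
pmf = cnt / xs.size

# One vertical line per distinct value, topped with a marker.
fig, ax = plt.subplots()
ax.vlines(val, 0, pmf)
ax.plot(val, pmf, "o")
ax.set_xlabel("value")
ax.set_ylabel("probability")
```

In an interactive session you would drop the `matplotlib.use("Agg")` line and call `plt.show()`; `ax.bar(val, pmf)` is an equally reasonable choice if you prefer bars.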

Answer 2

Answered by Emsi

You can use np.histogram with density=True to compute the PMF, provided that bins of unit width are used (otherwise you get the value of the probability density function in each bin, which is most probably not what you need).

>>> xs = np.array(
...     [5, 2, 2, 1, 2, 8, 6, 7, 5, 3, 2, 6, 4, 9, 7, 6, 4, 7, 6, 8, 7, 0, 6,
...      2, 9, 8, 7, 7, 2, 6, 2, 8, 0, 2, 5, 1, 3, 6, 7, 7, 2, 2, 0, 3, 8, 7,
...      4, 0, 5, 7, 5, 4, 4, 9, 5, 1, 6, 6, 0, 9, 4, 2, 0, 8, 7, 5, 1, 1, 2,
...      8, 3, 8, 9, 0, 0, 6, 8, 7, 2, 6, 7, 9, 7, 8, 8, 3, 3, 7, 8, 2, 2, 4,
...      4, 5, 3, 4, 1, 5, 5, 1])

>>> pmf, bins = np.histogram(xs, bins=range(0,11), density=True)
>>> np.column_stack((bins[:-1], pmf))
array([[ 0.  ,  0.08],
       [ 1.  ,  0.07],
       [ 2.  ,  0.15],
       [ 3.  ,  0.07],
       [ 4.  ,  0.09],
       [ 5.  ,  0.1 ],
       [ 6.  ,  0.11],
       [ 7.  ,  0.15],
       [ 8.  ,  0.12],
       [ 9.  ,  0.06]])
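
To illustrate the caveat about bin width, the sketch below uses bins of unequal width on a small hypothetical sample. With `density=True` the returned values are densities, and multiplying each by its bin width recovers the probability mass per bin.

```python
import numpy as np

# Hypothetical sample (an assumption for illustration).
xs = np.array([0, 1, 1, 2, 3, 3, 3, 5, 6, 7])

# Bins of width 2, 2 and 4: [0, 2), [2, 4), [4, 8].
edges = np.array([0, 2, 4, 8])
density, edges = np.histogram(xs, bins=edges, density=True)

# density integrates to 1 over the bins; mass per bin = density * bin width.
mass = density * np.diff(edges)

print(density)
print(mass)
```

Here the densities do not sum to 1, but the masses do, which is why unit-width bins make the two coincide.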

Answer 3

Answered by Aeden

Given a pandas DataFrame, df, you can use seaborn and write:

import seaborn as sns

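# Relative frequency of each distinct value in the column
probabilities = df['SomeColumn'].value_counts(normalize=True)
# Keyword arguments are used because recent seaborn releases no longer accept x and y positionally
sns.barplot(x=probabilities.index, y=probabilities.values)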
