pandas: Normal distribution plot by name from a pandas DataFrame

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/41768629/

Time: 2020-09-14 02:50:00  Source: igfitidea

Normal Distribution Plot by name from pandas dataframe

Tags: python, pandas, matplotlib, plot, seaborn

Asked by johnnyb

I have a dataframe like below:

dateTime        Name    DateTime        day seconds zscore
11/1/2016 15:17 james   11/1/2016 15:17 Tue 55020   1.158266091
11/1/2016 13:41 james   11/1/2016 13:41 Tue 49260   -0.836236954
11/1/2016 15:17 james   11/1/2016 15:17 Tue 55020   1.158266091
11/1/2016 15:17 james   11/1/2016 15:17 Tue 55020   1.158266091
11/1/2016 15:17 james   11/1/2016 15:17 Tue 55020   1.158266091
11/1/2016 15:17 james   11/1/2016 15:17 Tue 55020   1.158266091
11/1/2016 15:17 james   11/1/2016 15:17 Tue 55020   1.158266091
11/1/2016 15:17 james   11/1/2016 15:17 Tue 55020   1.158266091
11/1/2016 15:17 james   11/1/2016 15:17 Tue 55020   1.158266091
11/1/2016 15:17 james   11/1/2016 15:17 Tue 55020   1.158266091
11/1/2016 15:17 james   11/1/2016 15:17 Tue 55020   1.158266091
11/1/2016 13:41 james   11/1/2016 13:41 Tue 49260   -0.836236954
11/1/2016 13:41 james   11/1/2016 13:41 Tue 49260   -0.836236954
11/1/2016 13:41 james   11/1/2016 13:41 Tue 49260   -0.836236954
11/1/2016 13:41 james   11/1/2016 13:41 Tue 49260   -0.836236954
11/1/2016 13:41 james   11/1/2016 13:41 Tue 49260   -0.836236954
11/1/2016 13:41 james   11/1/2016 13:41 Tue 49260   -0.836236954
11/1/2016 13:41 james   11/1/2016 13:41 Tue 49260   -0.836236954
11/1/2016 13:42 james   11/1/2016 13:42 Tue 49320   -0.81546088
11/1/2016 13:42 james   11/1/2016 13:42 Tue 49320   -0.81546088
11/1/2016 13:42 james   11/1/2016 13:42 Tue 49320   -0.81546088
11/1/2016 13:42 james   11/1/2016 13:42 Tue 49320   -0.81546088
11/1/2016 13:42 james   11/1/2016 13:42 Tue 49320   -0.81546088
11/1/2016 13:42 james   11/1/2016 13:42 Tue 49320   -0.81546088
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:07  matt    11/1/2016 9:07  Tue 32820   -0.223746683
11/1/2016 9:08  matt    11/1/2016 9:08  Tue 32880   -0.111873342
11/1/2016 9:48  matt    11/1/2016 9:48  Tue 35280   4.363060322

zscore is calculated as below:

grp2 = df.groupby(['Name'])['seconds']
df['zscore'] = grp2.transform(lambda x: (x - x.mean()) / x.std(ddof=1))
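
As a quick sanity check of the calculation above, the following minimal sketch applies the same `transform` to a small made-up frame and confirms that each name's z-scores have mean 0 and sample standard deviation 1. The `toy` frame and its values are illustrative assumptions, not the asker's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the asker's dataframe
toy = pd.DataFrame({
    'Name': ['james'] * 4 + ['matt'] * 4,
    'seconds': [55020, 49260, 49320, 55020, 32820, 32880, 35280, 32820],
})

grp = toy.groupby('Name')['seconds']
# transform returns a Series aligned with toy's index, so it can be assigned directly
toy['zscore'] = grp.transform(lambda x: (x - x.mean()) / x.std(ddof=1))

# Per-group mean and sample standard deviation (ddof=1) of the z-scores
check = toy.groupby('Name')['zscore'].agg(['mean', 'std'])
print(check.round(6))
```

Because the z-score is computed within each group, every name is standardized against its own mean and spread rather than against the whole column.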

I would like to plot my data in a bell curve / normal distribution plot and save this as a picture/pdf file for each Name in my dataframe.

I have tried to plot the zscores like below:

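import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

df['by_name'].plot(kind='hist', density=True)  # `density` replaces `normed`, which matplotlib 3.x removed
x = np.arange(-7, 7, 0.001)  # named `x` to avoid shadowing the built-in `range`
plt.plot(x, norm.pdf(x, 0, 1))  # standard normal PDF for comparison
plt.show()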

How would I go about plotting the by_name zscores column for each name in my data?

Answered by piRSquared

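import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed([3, 1415])
df = pd.DataFrame(dict(
        Name='matt joe adam farley'.split() * 100,
        Seconds=np.random.randint(4000, 5000, 400)
    ))

# per-group z-score as defined in the question; transform keeps df's index so the result aligns
df['Zscore'] = df.groupby('Name').Seconds.transform(lambda x: (x - x.mean()) / x.std(ddof=1))

# one KDE curve per name, all drawn on the same axes
df.groupby('Name').Zscore.plot.kde()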

split out plots

g = df.groupby('Name').Zscore
n = g.ngroups
nrows = (n + 1) // 2  # round up so an odd number of names still fits
fig, axes = plt.subplots(nrows, 2, figsize=(6, 6), sharex=True, sharey=True, squeeze=False)
for i, (name, group) in enumerate(g):
    r, c = i // 2, i % 2  # fill the grid row by row
    group.plot.kde(title=name, ax=axes[r, c])
fig.tight_layout()

kde + hist

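g = df.groupby('Name').Zscore
n = g.ngroups
nrows = (n + 1) // 2  # round up so an odd number of names still fits
fig, axes = plt.subplots(nrows, 2, figsize=(6, 6), sharex=True, sharey=True, squeeze=False)
for i, (name, group) in enumerate(g):
    r, c = i // 2, i % 2
    a1 = axes[r, c]
    a2 = a1.twinx()  # second y-axis: histogram counts and KDE density are on different scales
    group.plot.hist(ax=a2, alpha=.3)
    group.plot.kde(title=name, ax=a1, c='r')
fig.tight_layout()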
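
The question also asks for a separate picture/pdf file per name, which the snippets above do not write to disk. A minimal sketch of one way to do that is shown here: it draws a density-scaled histogram of each name's z-scores with a standard normal curve on top and saves one file per name. The function name `save_zscore_plots`, its parameters, the `zscore_<name>` file naming, the `Agg` backend, and the `demo` frame are all assumptions made for illustration, not part of the original answer.

```python
import os

import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend, so files can be written without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import norm


def save_zscore_plots(df, name_col='Name', z_col='Zscore', out_dir='.', ext='png'):
    """Save one histogram + standard normal curve per name; return the file paths."""
    os.makedirs(out_dir, exist_ok=True)
    x = np.linspace(-4, 4, 400)
    paths = []
    for name, group in df.groupby(name_col)[z_col]:
        fig, ax = plt.subplots(figsize=(5, 4))
        # density=True puts the bars on the same scale as the PDF
        ax.hist(group, bins=20, density=True, alpha=0.3)
        ax.plot(x, norm.pdf(x, 0, 1), 'r')  # standard normal bell curve
        ax.set_title(str(name))
        path = os.path.join(out_dir, f'zscore_{name}.{ext}')
        fig.savefig(path)  # the output format follows the file extension
        plt.close(fig)     # free the figure before moving to the next name
        paths.append(path)
    return paths


# Hypothetical demo data shaped like the answer's example frame
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'Name': 'matt joe adam farley'.split() * 100,
    'Seconds': rng.integers(4000, 5000, 400),
})
demo['Zscore'] = demo.groupby('Name').Seconds.transform(
    lambda s: (s - s.mean()) / s.std(ddof=1)
)
```

Calling `save_zscore_plots(demo, out_dir='plots', ext='pdf')` would then write one PDF per name into a `plots` directory; passing `ext='png'` produces images instead.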
