Plot 95% confidence interval errorbar python pandas dataframes
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/44603615/
Asked by MaxNoe
I want to show the 95% confidence interval with Python pandas and matplotlib, but I am stuck. With the usual .std() I would do something like this:
import pandas as pd
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import math
data = pd.read_table('output.txt',sep=r'\,', engine='python')
Ox = data.groupby(['Ox'])['Ox'].mean()
Oy = data.groupby(['Ox'])['Oy'].mean()
std = data.groupby(['Ox'])['Oy'].std()
plt.plot(Ox, Oy, label='STA = ' + str(x))  # x is not defined in this snippet; presumably set elsewhere in the script
plt.errorbar(Ox, Oy, std, label = 'errorbar', linewidth=2)
plt.legend(loc='best', prop={'size':9.2})
plt.savefig('plot.pdf')
plt.close()
But I haven't found anything among the pandas methods that can help me. Does anybody know?
Answered by MaxNoe
Using 2 * std to estimate the 95 % interval
In a normal distribution, the interval [μ - 2σ, μ + 2σ] covers about 95.45 % of the values, so you can use 2 * std to estimate the 95 % interval:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame()
df['category'] = np.random.choice(np.arange(10), 1000, replace=True)
df['number'] = np.random.normal(df['category'], 1)
mean = df.groupby('category')['number'].mean()
std = df.groupby('category')['number'].std()
plt.errorbar(mean.index, mean, xerr=0.5, yerr=2*std, linestyle='')
plt.show()
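The coverage figure quoted above, and the exact multiplier for a 95 % interval, can be checked with the standard library alone. This is a small sketch rather than part of the original answer, and it assumes the data really are normally distributed; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Fraction of a normal distribution inside [mu - 2*sigma, mu + 2*sigma]
coverage_2sigma = math.erf(2 / math.sqrt(2))

# Multiplier k such that [mu - k*sigma, mu + k*sigma] covers exactly 95 %
k_95 = NormalDist().inv_cdf(0.975)

print(f"coverage of +/- 2 sigma: {coverage_2sigma:.4f}")
print(f"multiplier for 95 %:     {k_95:.3f}")
```

The two numbers come out at roughly 0.9545 and 1.960, so 2 * std slightly overshoots the 95 % interval and 1.96 * std would be the closer normal-theory value.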
Using percentiles
If your distribution is skewed, it is better to use asymmetrical errorbars and get your 95% interval from the percentiles.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import skewnorm
df = pd.DataFrame()
df['category'] = np.random.choice(np.arange(10), 1000, replace=True)
df['number'] = skewnorm.rvs(5, df['category'], 1)
mean = df.groupby('category')['number'].mean()
p025 = df.groupby('category')['number'].quantile(0.025)
p975 = df.groupby('category')['number'].quantile(0.975)
plt.errorbar(
    mean.index,
    mean,
    xerr=0.5,
    # asymmetric errors: distance below the mean, then distance above it
    yerr=[mean - p025, p975 - mean],
    linestyle='',
)
plt.show()
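The percentile approach can be wrapped in a small helper so that the interval level is a parameter. This is only a sketch: the function name `percentile_errorbars`, the `demo` frame, and the exponential noise used to create skewed data are all assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

def percentile_errorbars(df, by, value, level=0.95):
    """Return group means and a (2, n_groups) array of asymmetric error lengths."""
    alpha = (1 - level) / 2
    grouped = df.groupby(by)[value]
    center = grouped.mean()
    lower = grouped.quantile(alpha)
    upper = grouped.quantile(1 - alpha)
    # errorbar expects distances from the center, not absolute positions
    yerr = np.vstack([center - lower, upper - center])
    return center, yerr

# Hypothetical skewed demo data: category value plus exponential noise
rng = np.random.default_rng(0)
demo = pd.DataFrame({'category': rng.integers(0, 10, size=1000)})
demo['number'] = demo['category'] + rng.exponential(1.0, size=1000)

center, yerr = percentile_errorbars(demo, 'category', 'number')
print(yerr.shape)
```

The returned values can then be passed on as plt.errorbar(center.index, center, yerr=yerr, linestyle=''). The first row of yerr holds the lower lengths and the second row the upper lengths.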
Answered by ImportanceOfBeingErnest
For a normal distribution, ~95% of the values lie within a window of 4 standard deviations around the mean, or in other words, 95% of the values are within plus/minus 2 standard deviations of the mean. See, e.g., the 68–95–99.7 rule.
plt.errorbar's yerr argument specifies the length of the single-sided errorbar. Thus taking
plt.errorbar(x,y,yerr=2*std)
where std is the standard deviation, shows errorbars covering roughly 95% of the values (the 95% interval).
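Note that mean ± 2 * std describes where about 95% of the individual values fall. If what is wanted is instead a confidence interval for each group mean, one common choice is the normal approximation 1.96 * std / sqrt(n). That is an assumption added here, not something the answers state. The sketch below compares the two on made-up data; the names `sample`, `spread` and `ci_mean` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical normally distributed demo data, 10 groups
rng = np.random.default_rng(1)
sample = pd.DataFrame({'category': rng.integers(0, 10, size=1000)})
sample['number'] = rng.normal(sample['category'].to_numpy(), 1.0)

stats = sample.groupby('category')['number'].agg(['mean', 'std', 'count'])

# Spread of the individual values: about 95 % lie within mean +/- 2*std
spread = 2 * stats['std']

# Normal-approximation 95 % confidence interval for the group mean
# (assumption: groups are large enough for the approximation to hold)
ci_mean = 1.96 * stats['std'] / np.sqrt(stats['count'])

print(pd.DataFrame({'spread': spread, 'ci_mean': ci_mean}))
```

Either series can be passed as yerr to plt.errorbar. The interval for the mean shrinks as the group size grows, while the spread of the values does not.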