pandas: how to plot a confidence interval in matplotlib?

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/39008436/

Time: 2020-09-14 01:50:37  Source: igfitidea

How to plot confidence interval in matplotlib?

Tags: pandas, matplotlib, scipy

Asked by Nate Reed

I'm using matplotlib to plot the distribution of a data set, and want to overlay vertical lines for the confidence interval.

The density plot looks fine, but I don't see the line. Any ideas?

# Get data
import urllib.request as request
request.urlretrieve('http://seanlahman.com/files/database/baseballdatabank-master_2016-03-02.zip', "baseballdatabank-master_2016-03-02.zip")
from zipfile import ZipFile
# Use a context manager and avoid shadowing the built-in zip()
with ZipFile('baseballdatabank-master_2016-03-02.zip') as archive:
    archive.extractall()

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

batting_df = pd.read_csv("baseballdatabank-master/core/Batting.csv")
batting_df = batting_df[batting_df['AB'] > 20]
batting_df['batting_average'] = batting_df['H'] / batting_df['AB']

# Plot distribution
batting_averages = batting_df['batting_average'].dropna()
batting_averages.plot.kde()

# Plot confidence interval
import scipy.stats as st
stderr = st.sem(batting_averages)
interval1 = (batting_averages.mean() - stderr * 1.96, batting_averages.mean() + stderr * 1.96)
plt.plot(interval1[0], 12)
plt.show()

I'm trying to plot a vertical line at the x coordinate of the interval's first endpoint; the interval is centered on the mean. I passed 12 as the y coordinate because it is the highest value shown on the y-axis.

Answered by Luis

If you capture the axes returned by the KDE plot like this:

ax = batting_averages.plot.kde()

... then you can plot vertical lines at any position you want:

stderr = st.sem(batting_averages)
mean = batting_averages.mean()
ax.vlines(x=mean, ymin=-1, ymax=15, color='red', label='mean')
# Standard error overridden for illustration, to make the interval lines visible
stderr = 0.1
ax.vlines(x=mean - stderr * 1.96, ymin=-1, ymax=15, color='green', label='95% CI')
ax.vlines(x=mean + stderr * 1.96, ymin=-1, ymax=15, color='green')
ax.set_ylim([-1, 12])
ax.legend()
plt.show()

which gives you the following graph:

[Image: density plot with vertical lines at the mean (red) and at the two interval bounds (green)]

(note that I changed the standard error to make the lines visible)
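
As a possible alternative to hard-coding ymin and ymax, matplotlib's Axes.axvline draws a vertical line that spans the full height of the axes whatever the y limits are. The sketch below is not part of the original answer: it uses synthetic, normally distributed data in place of the batting averages, and the sample values, the seed, and the helper name mean_confidence_interval are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats as st


def mean_confidence_interval(values, z=1.96):
    """Return (mean, lower, upper) using a normal approximation for the mean.

    Hypothetical helper for this sketch; z=1.96 corresponds to roughly 95%.
    """
    mean = values.mean()
    half_width = z * st.sem(values)
    return mean, mean - half_width, mean + half_width


# Synthetic stand-in for the batting averages (assumed values, fixed seed)
rng = np.random.default_rng(0)
samples = pd.Series(rng.normal(loc=0.25, scale=0.04, size=200))

mean, lower, upper = mean_confidence_interval(samples)

ax = samples.plot.kde()  # keep the Axes so lines can be added to it
ax.axvline(mean, color="red", label="mean")
ax.axvline(lower, color="green", linestyle="--", label="95% CI")
ax.axvline(upper, color="green", linestyle="--")
ax.legend()
ax.figure.canvas.draw()  # render; use plt.show() or fig.savefig(...) to view
```

With a large sample the standard error becomes very small, so the two interval lines may sit almost on top of the mean line, which is presumably why the answer above substituted a larger value.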

Answered by Nate Reed

plt.plot takes two arguments, x and y. To draw a line, I need to pass the x coordinates of the two points that define it, followed by the y coordinates of those two points:

plot((x1, x2), (y1, y2))

Substituting the variables from the example above:

plt.plot((interval1[0], interval1[0]), (0, 12))
plt.plot((interval1[1], interval1[1]), (0, 12))
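
For context, a call such as plt.plot(x, 12) with a single point uses the default solid line style and no marker, so there is no segment to draw. The sketch below applies the two-point form to a self-contained example and reads the y limits from the axes instead of hard-coding 0 and 12. The synthetic data and the interval bounds are hypothetical values chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the batting averages (assumed values, fixed seed)
rng = np.random.default_rng(1)
samples = pd.Series(rng.normal(loc=0.25, scale=0.04, size=200))
interval = (0.24, 0.26)  # hypothetical bounds, for illustration only

ax = samples.plot.kde()
y_low, y_high = ax.get_ylim()  # read the limits after the density is drawn

# Each vertical line is defined by two points sharing the same x coordinate
for x in interval:
    ax.plot((x, x), (y_low, y_high), color="green")

ax.set_ylim(y_low, y_high)  # undo the autoscale margin added by the new lines
ax.figure.canvas.draw()  # render; use plt.show() or fig.savefig(...) to view
```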

See: vertical & horizontal lines in matplotlib
