Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must also follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/23036317/

Date: 2020-09-13 21:55:32  Source: igfitidea

Overlaying actual data on a boxplot from a pandas dataframe

Tags: matplotlib, pandas, dataframe, boxplot, seaborn

Asked by geog_newbie

I am using Seaborn to make boxplots from pandas dataframes. Seaborn boxplots seem to read dataframes essentially the same way as the pandas boxplot functionality (so I hope the solution is the same for both, but I can just use the dataframe.boxplot function as well). My dataframe has 12 columns, and the following code generates a single plot with one boxplot for each column (just like the dataframe.boxplot() function would).

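import matplotlib.pyplot as plt
import seaborn as sns
# set the style before creating the axes so that it applies to them
sns.set_style("darkgrid", {"axes.facecolor": "darkgrey"})
fig, ax = plt.subplots()
pal = sns.color_palette("husl", 12)
# one box per column of `dataframe`; in current seaborn a list of colors goes to palette=
sns.boxplot(data=dataframe, palette=pal, ax=ax)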

Can anyone suggest a simple way of overlaying all the values (by column) on a boxplot made from a dataframe? I would appreciate any help with this.

Accepted answer by CT Zhu

Here is a general solution for a boxplot of the entire dataframe. It should work for both seaborn and pandas, since both are built on matplotlib under the hood. I will use the pandas plot as the example and assume import matplotlib.pyplot as plt is already in place. Since you already have ax, it would make more sense to use ax.text(...) instead of plt.text(...).

print(df)
         V1        V2        V3        V4        V5
0  0.895739  0.850580  0.307908  0.917853  0.047017
1  0.931968  0.284934  0.335696  0.153758  0.898149
2  0.405657  0.472525  0.958116  0.859716  0.067340
3  0.843003  0.224331  0.301219  0.000170  0.229840
4  0.634489  0.905062  0.857495  0.246697  0.983037
5  0.573692  0.951600  0.023633  0.292816  0.243963

[6 rows x 5 columns]

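import numpy as np
import matplotlib.pyplot as plt

ax = df.boxplot()
# boxes sit at x = 1, 2, ..., n_columns; repeat each position once per row
xs = np.repeat(np.arange(df.shape[1]) + 1, df.shape[0])
# flatten column by column (order='F') so each value lines up with its box
ys = df.values.ravel(order='F')
for x, y in zip(xs, ys):
    ax.text(x, y, '{:.3f}'.format(y), ha='center', va='center')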

For a single series (column) from the dataframe, a few small changes are necessary:

import numpy as np
import pandas as pd

sub_df = df.V1
ax = pd.DataFrame(sub_df).boxplot()
# a single box is drawn at x = 1, so every value goes there
for x, y in zip(np.repeat(1, len(sub_df)), sub_df.values):
    ax.text(x, y, '{:.3f}'.format(y), ha='center', va='center')

Overlaying a scatter plot instead of text works the same way:

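import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# for the whole dataframe: column j (0-based) is drawn at x = j + 1
ax = df.boxplot()
ax.scatter(np.repeat(np.arange(df.shape[1]) + 1, df.shape[0]),
           df.values.ravel(order='F'), marker='+', alpha=0.5)

# for just one column, in a new figure: its box is at x = 1
plt.figure()
sub_df = df.V1
ax = pd.DataFrame(sub_df).boxplot()
ax.scatter(np.repeat(1, len(sub_df)), sub_df.values, marker='+', alpha=0.5)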
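
When many values are close together, the `+` markers stack on top of each other at exactly the same x. One common variation, which is an addition and not part of the original answer, is to add a small random horizontal jitter to each x position. The sketch below uses an illustrative random frame, and the 0.08 jitter width is an arbitrary choice:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df_demo = pd.DataFrame(rng.random((6, 5)), columns=["V1", "V2", "V3", "V4", "V5"])

n_rows, n_cols = df_demo.shape
base_x = np.repeat(np.arange(n_cols) + 1, n_rows)    # box positions 1..n_cols, one per value
jitter = rng.uniform(-0.08, 0.08, size=base_x.size)  # small horizontal spread (arbitrary width)
ys = df_demo.to_numpy().ravel(order="F")             # column by column, to match base_x

fig, ax = plt.subplots()
df_demo.boxplot(ax=ax)
ax.scatter(base_x + jitter, ys, marker="+", alpha=0.5)
```

Each point stays within 0.08 of its box's centre, so it still visually belongs to the right column.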

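To overlay anything on a boxplot, we first need to know where each box is drawn along the x-axis. For pandas (and plain matplotlib) boxplots, the boxes sit at x = 1, 2, 3, 4, ...; seaborn's boxplot places them at 0, 1, 2, ... instead, so the +1 offset should be dropped there. Therefore, for a pandas boxplot, the values in the first column should be plotted at x=1, the second column at x=2, and so on.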
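
To check these positions instead of assuming them, they can be read back from the axes after plotting. This is a minimal sketch assuming the non-interactive Agg backend and a small made-up frame (`df_demo` and its values are illustrative only):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# illustrative data only: 6 observations in 3 columns
df_demo = pd.DataFrame(np.random.rand(6, 3), columns=["V1", "V2", "V3"])

fig, ax = plt.subplots()
df_demo.boxplot(ax=ax)

# the x tick locations are the box positions set by matplotlib's boxplot
positions = ax.get_xticks().tolist()
print(positions)
```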

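An efficient way to do this is np.repeat: repeat 1, 2, 3, 4, ... n times each, where n is the number of observations (rows), and use the result as the x coordinates. Since that array is one-dimensional, the y coordinates need a flattened view of the data in the same order, i.e. column by column: df.values.ravel(order='F'). A plain df.values.ravel() flattens row by row and would put values over the wrong boxes (alternatively, np.tile the positions and keep the row-by-row ravel).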
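
To see why the flattening order matters, here is a small NumPy-only check on made-up values, chosen so that each value's tens digit equals the number of the column it belongs to:

```python
import numpy as np

# 2 observations (rows) x 3 columns; tens digit = column number
values = np.array([[10, 20, 30],
                   [11, 21, 31]])
n_rows, n_cols = values.shape

# x positions: 1, 1, 2, 2, 3, 3 (each box position repeated once per row)
xs = np.repeat(np.arange(n_cols) + 1, n_rows)

# column-major flatten: 10, 11, 20, 21, 30, 31, which lines up with xs
ys = values.ravel(order="F")

# row-major flatten: 10, 20, 30, 11, 21, 31, which does NOT line up with xs
ys_wrong = values.ravel()

# equivalent alternative: tile the positions and keep the row-major order
xs_tiled = np.tile(np.arange(n_cols) + 1, n_rows)  # 1, 2, 3, 1, 2, 3

pairs = list(zip(xs.tolist(), ys.tolist()))
print(pairs)  # [(1, 10), (1, 11), (2, 20), (2, 21), (3, 30), (3, 31)]
```

Both `repeat` + `ravel(order="F")` and `tile` + `ravel()` pair every value with its own column's position.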

Overlaying text strings needs one more step, a loop, because plt.text (or ax.text) draws a single string at a single (x, y) position per call.
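
The loop can be wrapped in a small helper that draws on an existing ax, as suggested earlier in the answer. This is only a sketch: `annotate_columns` and its `first_pos` parameter are hypothetical names, with `first_pos=1` for pandas/matplotlib boxplots and `first_pos=0` assumed for seaborn ones:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def annotate_columns(ax, df, first_pos=1, fmt="{:.3f}"):
    """Hypothetical helper: write every value of df as text over its column's box.

    first_pos is the x position of the first box (1 for pandas/matplotlib,
    0 for seaborn). Returns the list of Text artists that were added.
    """
    n_rows, n_cols = df.shape
    xs = np.repeat(np.arange(n_cols) + first_pos, n_rows)
    ys = df.to_numpy().ravel(order="F")  # column by column, to match xs
    # one ax.text call per value, since each call draws a single string
    return [ax.text(x, y, fmt.format(y), ha="center", va="center")
            for x, y in zip(xs, ys)]


# usage on illustrative random data
df_demo = pd.DataFrame(np.random.rand(6, 3), columns=["V1", "V2", "V3"])
fig, ax = plt.subplots()
df_demo.boxplot(ax=ax)
texts = annotate_columns(ax, df_demo)
```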

Answer by mwaskom

This hasn't been added to the seaborn.boxplot function yet, but there's something similar in the seaborn.violinplot function, which has other advantages:

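import numpy as np
import seaborn as sns

x = np.random.randn(30, 6)
sns.violinplot(data=x, inner="point")  # one violin per column, each observation drawn as a point
sns.despine(trim=True)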

Answer by HP Peng

I have the following trick:

import numpy as np
import pandas as pd

data = np.random.randn(6, 5)
df = pd.DataFrame(data, columns=list('ABCDE'))

Now assign a dummy column to df:

df['Group'] = 'A'
print(df)

          A         B         C         D         E Group
0  0.590600  0.226287  1.552091 -1.722084  0.459262     A
1  0.369391 -0.037151  0.136172 -0.772484  1.143328     A
2  1.147314 -0.883715 -0.444182 -1.294227  1.503786     A
3 -0.721351  0.358747  0.323395  0.165267 -1.412939     A
4 -1.757362 -0.271141  0.881554  1.229962  2.526487     A
5 -0.006882  1.503691  0.587047  0.142334  0.516781     A

Then call df.groupby('Group').boxplot(), and you're done:

df.groupby('Group').boxplot()

Box plot overlay
