Original URL: http://stackoverflow.com/questions/23036317/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Overlaying actual data on a boxplot from a pandas dataframe
Asked by geog_newbie
I am using Seaborn to make boxplots from pandas dataframes. Seaborn boxplots seem to read the dataframes in essentially the same way as the pandas boxplot functionality (so I hope the solution is the same for both, but I can just use the dataframe.boxplot function as well). My dataframe has 12 columns, and the following code generates a single plot with one boxplot for each column (just like the dataframe.boxplot() function would).
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots()
sns.set_style("darkgrid", {"axes.facecolor": "darkgrey"})
pal = sns.color_palette("husl", 12)
sns.boxplot(data=dataframe, palette=pal, ax=ax)
Can anyone suggest a simple way of overlaying all the values (by columns) while making a boxplot from dataframes? I will appreciate any help with this.
Accepted answer by CT Zhu
Here is a general solution for the boxplot of an entire dataframe. It should work for both seaborn and pandas, as they are both matplotlib-based under the hood. I will use the pandas plot as the example, assuming import matplotlib.pyplot as plt and import numpy as np are already in place. As you already have the ax, it would make better sense to use ax.text(...) instead of plt.text(...).
In [35]:
print(df)
         V1        V2        V3        V4        V5
0  0.895739  0.850580  0.307908  0.917853  0.047017
1  0.931968  0.284934  0.335696  0.153758  0.898149
2  0.405657  0.472525  0.958116  0.859716  0.067340
3  0.843003  0.224331  0.301219  0.000170  0.229840
4  0.634489  0.905062  0.857495  0.246697  0.983037
5  0.573692  0.951600  0.023633  0.292816  0.243963
[6 rows x 5 columns]
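In [34]:
df.boxplot()
# x positions: 1 for every value in the first column, 2 for the second, and so on
xs = np.repeat(np.arange(df.shape[1]) + 1, df.shape[0])
# flatten column by column (order='F') so each value lines up with its column's x
ys = df.values.ravel(order='F')
for x, y in zip(xs, ys):
    plt.text(x, y, '%.3f' % y, ha='center', va='center')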


For a single series in the dataframe, a few small changes are necessary:
In [35]:
sub_df = df.V1
pd.DataFrame(sub_df).boxplot()
# a single box sits at x=1, so every value gets the same x coordinate
for x, y in zip(np.repeat(1, sub_df.shape[0]), sub_df.values):
    plt.text(x, y, '%.3f' % y, ha='center', va='center')


Overlaying a scatter plot works in much the same way:
# for the whole dataframe
df.boxplot()
plt.scatter(np.repeat(np.arange(df.shape[1]) + 1, df.shape[0]),
            df.values.ravel(order='F'),  # column-major, to match the repeated x positions
            marker='+', alpha=0.5)
# for just one column
sub_df = df.V1
pd.DataFrame(sub_df).boxplot()
plt.scatter(np.repeat(1, sub_df.shape[0]), sub_df.values, marker='+', alpha=0.5)




To overlay anything on a boxplot, we first need to work out where each box is drawn along the x axis. The boxes appear to be at 1, 2, 3, 4, and so on. Therefore, we want the values in the first column plotted at x=1, those in the second column at x=2, and so on.
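Rather than guessing, the box positions can also be read back from the axes. The following is a minimal sketch under stated assumptions: the data and column names are made up, and it relies on the boxplot call leaving one x tick per box. As far as I know, seaborn places its categorical boxes at 0, 1, 2, ... rather than 1, 2, 3, ..., so reading the ticks avoids hard-coding either convention.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data, only for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((6, 3)), columns=["V1", "V2", "V3"])

fig, ax = plt.subplots()
df.boxplot(ax=ax)

# One tick per box: these are the x coordinates to overlay data at.
positions = ax.get_xticks()
for column, position in zip(df.columns, positions):
    print(column, position)
```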
An efficient way of doing this is to use np.repeat to repeat 1, 2, 3, 4, ..., each n times, where n is the number of observations. We can then make a plot using those numbers as the x coordinates. Since that array is one-dimensional, the y coordinates need to be a flattened view of the data as well, provided by df.values.ravel(). The values must be flattened column by column (order='F') so that each one lines up with its column's x position; the default row-by-row order would pair values with the wrong boxes.
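To make the pairing concrete, here is a small NumPy-only sketch on a made-up 2 x 3 array. It shows two equivalent ways to line the coordinates up: np.repeat with a column-major flatten, or np.tile with the default row-major flatten.

```python
import numpy as np

# Made-up data: 2 observations (rows) in 3 columns.
values = np.array([[0.1, 0.2, 0.3],
                   [0.4, 0.5, 0.6]])
n_rows, n_cols = values.shape

# Option 1: repeat each x position n_rows times and flatten column by column.
xs = np.repeat(np.arange(n_cols) + 1, n_rows)   # 1 1 2 2 3 3
ys = values.ravel(order="F")                    # 0.1 0.4 0.2 0.5 0.3 0.6

# Option 2: cycle through the x positions and flatten row by row (the default).
xs_alt = np.tile(np.arange(n_cols) + 1, n_rows)  # 1 2 3 1 2 3
ys_alt = values.ravel()                          # 0.1 0.2 0.3 0.4 0.5 0.6

# Both options pair every value with the position of its own column.
pairs = sorted(zip(xs.tolist(), ys.tolist()))
pairs_alt = sorted(zip(xs_alt.tolist(), ys_alt.tolist()))
print(pairs == pairs_alt)
```

Mixing the two (np.repeat with the default flatten) silently puts values on the wrong boxes, so it is worth checking the pairing on a tiny array like this one.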
For overlaying the text strings, we need another step (a loop), as we can only plot one x value, one y value, and one text string at a time.
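The loop can be sketched with ax.text, which fits the case where an Axes object already exists. This is only an illustration: the data, the column names, and the three-decimal label format are assumptions, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data, only for illustration.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.random((4, 2)), columns=["V1", "V2"])

fig, ax = plt.subplots()
df.boxplot(ax=ax)

# Column-major flatten so each value matches its repeated x position.
xs = np.repeat(np.arange(df.shape[1]) + 1, df.shape[0])
ys = df.to_numpy().ravel(order="F")

# One ax.text call per value: one x, one y, and one label at a time.
for x, y in zip(xs, ys):
    ax.text(x, y, f"{y:.3f}", ha="center", va="center")
```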
Answer by mwaskom
This hasn't been added to the seaborn.boxplot function yet, but there's something similar in the seaborn.violinplot function, which has other advantages:
x = np.random.randn(30, 6)
sns.violinplot(data=x, inner="point")
sns.despine(trim=True)


Answer by HP Peng
I have the following trick:
data = np.random.randn(6,5)
df = pd.DataFrame(data,columns = list('ABCDE'))
Now assign a dummy column to df:
df['Group'] = 'A'
print(df)
          A         B         C         D         E Group
0  0.590600  0.226287  1.552091 -1.722084  0.459262     A
1  0.369391 -0.037151  0.136172 -0.772484  1.143328     A
2  1.147314 -0.883715 -0.444182 -1.294227  1.503786     A
3 -0.721351  0.358747  0.323395  0.165267 -1.412939     A
4 -1.757362 -0.271141  0.881554  1.229962  2.526487     A
5 -0.006882  1.503691  0.587047  0.142334  0.516781     A
Then call boxplot() on the result of df.groupby(...) and you are done.
df.groupby('Group').boxplot()



