Adding a scatter of points to a boxplot using matplotlib

Warning: this content is taken from StackOverflow and provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/29779079/

Tags: python, matplotlib, boxplot

Asked by Wok

I have seen this wonderful boxplot in this article (Fig. 2).

[Image: A wonderful boxplot]

As you can see, this is a boxplot on which a scatter of black points is superimposed: x indexes the black points (in a random order), and y is the variable of interest. I would like to do something similar using Matplotlib, but I have no idea where to start. So far, the boxplots I have found online are far less cool and look like this:

[Image: Usual boxplots]

Documentation of matplotlib: http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.boxplot

Ways to colorize boxplots: https://github.com/jbmouret/matplotlib_for_papers#colored-boxes

Accepted answer by Kyrubas

What you're looking for is a way to add jitter to the x-axis.

Something like this, taken from here:

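import numpy as np
import matplotlib.pyplot as plt

# `titanic` is assumed to be a pandas DataFrame with 'age' and 'pclass' columns
bp = titanic.boxplot(column='age', by='pclass', grid=False)
for i in [1, 2, 3]:
    y = titanic.age[titanic.pclass == i].dropna()
    # Add some random "jitter" to the x-axis
    x = np.random.normal(i, 0.04, size=len(y))
    plt.plot(x, y, 'r.', alpha=0.2)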

Quoting the link:

One way to add additional information to a boxplot is to overlay the actual data; this is generally most suitable with small- or moderate-sized data series. When data are dense, a couple of tricks used above help the visualization:

  1. reducing the alpha level to make the points partially transparent
  2. adding random "jitter" along the x-axis to avoid overstriking

The code looks like this:

import pylab as P
import numpy as np

# Define data: a sequence of 1-D datasets, one per box
# Define numBoxes: the number of boxes, i.e. len(data)

P.figure()

bp = P.boxplot(data)

for i in range(numBoxes):
    y = data[i]
    x = np.random.normal(1+i, 0.04, size=len(y))
    P.plot(x, y, 'r.', alpha=0.2)

P.show()
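
The two tricks quoted above (a low alpha and random jitter along the x-axis) can be sketched with the `matplotlib.pyplot` interface instead of `pylab`. This is only a minimal sketch, not the linked author's code: the helper names `jitter_x` and `boxplot_with_points`, the `spread` and `seed` parameters, and the choice to hide the outlier markers are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt


def jitter_x(groups, spread=0.04, seed=None):
    """Return one array of jittered x-positions per group.

    By default matplotlib draws box number i (0-based) at x = i + 1,
    so the points of that group are scattered around i + 1.
    """
    rng = np.random.default_rng(seed)
    return [rng.normal(i + 1, spread, size=len(g)) for i, g in enumerate(groups)]


def boxplot_with_points(groups, spread=0.04, seed=None):
    """Draw a boxplot and overlay the raw observations of every group."""
    fig, ax = plt.subplots()
    # Hide the outlier markers: the overlay already draws every point.
    ax.boxplot(groups, showfliers=False)
    for x, y in zip(jitter_x(groups, spread, seed), groups):
        # A low alpha keeps dense regions readable.
        ax.plot(x, y, "r.", alpha=0.2)
    return fig, ax
```

Calling `boxplot_with_points(data)` with a list of 1-D sequences and then `plt.show()` should give a figure similar to the one described in the answer. Passing a fixed `seed` makes the jitter reproducible between runs.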

Answer by hwang

Expanding on Kyrubas's solution and using only matplotlib for the plotting part (sometimes I have difficulty formatting pandas plots with matplotlib).

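from matplotlib import cm
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# initialize dataframe with random data split into `ngroup` groups
n = 200
ngroup = 3
# np.floor works on the whole array; map() would yield a lazy iterator in Python 3
df = pd.DataFrame({'data': np.random.rand(n),
                   'group': np.floor(np.random.rand(n) * ngroup)})

group = 'group'
column = 'data'
grouped = df.groupby(group)

names, vals, xs = [], [], []

for i, (name, subdf) in enumerate(grouped):
    names.append(name)
    vals.append(subdf[column].tolist())
    # jitter the x-positions around the box position i+1
    xs.append(np.random.normal(i + 1, 0.04, subdf.shape[0]))

plt.boxplot(vals)
# label the boxes with the group names
plt.xticks(range(1, len(names) + 1), names)
ngroup = len(vals)
clevels = np.linspace(0., 1., ngroup)

for x, val, clevel in zip(xs, vals, clevels):
    # pass the single RGBA color via `color` so it is not read as per-point values
    plt.scatter(x, val, color=cm.prism(clevel), alpha=0.4)

plt.show()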

Answer by HS-nebula

As a simpler, possibly newer option, you could use seaborn's swarmplot option.

import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style="whitegrid")
tips = sns.load_dataset("tips")

ax = sns.boxplot(x="day", y="total_bill", data=tips, showfliers=False)
ax = sns.swarmplot(x="day", y="total_bill", data=tips, color=".25")

plt.show()

Looking at the original question again (and having more experience myself), I think sns.stripplot would be more accurate than sns.swarmplot.
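
A minimal sketch of the stripplot variant could look like the following. The helper name `box_with_strip`, the synthetic DataFrame, and the styling values (`color`, `alpha`) are assumptions for illustration; only `sns.boxplot` and `sns.stripplot` themselves come from the answer.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


def box_with_strip(df, x, y, order=None):
    """Draw a boxplot of `y` per category of `x` and overlay jittered points."""
    # Outlier markers are hidden because the strip plot already shows every point.
    ax = sns.boxplot(x=x, y=y, data=df, order=order, showfliers=False)
    # jitter=True spreads the points randomly along the categorical axis.
    sns.stripplot(x=x, y=y, data=df, order=order,
                  color=".25", alpha=0.5, jitter=True, ax=ax)
    return ax


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": rng.choice(["a", "b", "c"], size=150),
    "value": rng.normal(size=150),
})
ax = box_with_strip(df, "group", "value", order=["a", "b", "c"])
```

Call `plt.show()` afterwards to display the figure. Unlike swarmplot, which arranges the points so that they do not overlap, stripplot places them with random jitter, which is closer to the randomly ordered points described in the question.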
