Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/38120688/

Time: 2020-09-14 01:29:24  Source: igfitidea

pandas box plot for multiple columns

python pandas boxplot

Asked by user1877600

My data frame (a pandas structure) looks like this: [image]

Now I want to make a box plot for each feature on a separate canvas. The separation condition is the first column. I have a similar plot for histograms (code below), but I can't get a working version for the box plot.

import numpy
import matplotlib.pyplot as plt

# 'density' replaces the 'normed' keyword, which newer matplotlib no longer accepts
hist_params = {'density': True, 'bins': 60, 'alpha': 0.4}
# create the figure
fig = plt.figure(figsize=(16, 25))
for n, feature in enumerate(features):
    # add a subplot to the figure
    ax = fig.add_subplot(features.shape[1] // 5 + 1, 6, n + 1)
    # define the histogram range by cutting 1% of the data from both ends
    min_value, max_value = numpy.percentile(data[feature], [1, 99])
    # .loc replaces the removed .ix indexer; rows are selected by the seed flag
    ax.hist(data.loc[data.is_true_seed.values == 0, feature].values, range=(min_value, max_value),
            label='ghost', **hist_params)
    ax.hist(data.loc[data.is_true_seed.values == 1, feature].values, range=(min_value, max_value),
            label='true', **hist_params)
    ax.legend(loc='best')
    ax.set_title(feature)

The code above produces output like this (only part of it is attached): [image]

Answered by Alberto Garcia-Raboso

DataFrame.boxplot() automates this rather well:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'is_true_seed': np.random.choice([True, False], 10),
                   'col1': np.random.normal(size=10),
                   'col2': np.random.normal(size=10),
                   'col3': np.random.normal(size=10)})

fig, ax = plt.subplots(figsize=(10, 10))
df.boxplot(['col1', 'col2', 'col3'], 'is_true_seed', ax)

The first argument tells pandas which columns to plot, the second which column to group by (what you call the separation condition), and the third on which axes to draw.
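
As a minimal sketch of the same call with each argument named (column, by, ax), the version below may be easier to read. The fixed random seed, the alternating True/False values, and the headless "Agg" backend are assumptions made only so the example runs the same way anywhere; they are not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
df = pd.DataFrame({
    "is_true_seed": [True, False] * 5,
    "col1": rng.normal(size=10),
    "col2": rng.normal(size=10),
    "col3": rng.normal(size=10),
})

fig, ax = plt.subplots(figsize=(10, 10))
# Same call as above, with each of the three arguments spelled out by name
df.boxplot(column=["col1", "col2", "col3"], by="is_true_seed", ax=ax)

# Collect the titles of the panels pandas drew on the figure
titles = [a.get_title() for a in fig.axes if a.get_title()]
print(titles)
```

Because three columns are requested, pandas draws one panel per column on the figure that owns ax, and it may emit a warning that the figure is being cleared to make room for the subplots.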

Listing all columns but the one you want to group by can get tedious, but you can avoid it by omitting that first argument. You then have to explicitly name the other two:

df.boxplot(by='is_true_seed', ax=ax)
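
A short sketch of this shorter form, under the same assumptions as the sample data above (the fixed seed and the "Agg" backend are illustrative choices). Here figsize and layout are passed instead of ax, which is a variation on the answer's call rather than a copy of it; layout=(1, 3) asks for the three feature panels side by side in one row.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # fixed seed, chosen only for reproducibility
df = pd.DataFrame({
    "is_true_seed": [True, False] * 5,
    "col1": rng.normal(size=10),
    "col2": rng.normal(size=10),
    "col3": rng.normal(size=10),
})

# No column list: every numeric column except the grouping column is plotted
axes = df.boxplot(by="is_true_seed", layout=(1, 3), figsize=(12, 4))

# Each returned panel is titled with the column it shows
plotted = sorted(a.get_title() for a in np.ravel(axes))
print(plotted)
```

The grouping column itself does not get a panel, so adding more numeric features to the frame adds panels without any change to the call (the layout would need enough cells to hold them).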