Original URL: http://stackoverflow.com/questions/38120688/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
pandas box plot for multiple columns
Question by user1877600
My data frame (a pandas DataFrame) looks like this (image not preserved):
Now I want to make a box plot for each feature on a separate canvas, split by the first column (the separation condition). I have a similar plot for histograms (code below), but I can't get a working version for the box plot.
```python
import numpy
import matplotlib.pyplot as plt

# 'normed' was removed in Matplotlib 3.1; 'density' is its replacement
hist_params = {'density': True, 'bins': 60, 'alpha': 0.4}
# create the figure
fig = plt.figure(figsize=(16, 25))
for n, feature in enumerate(features):
    # add sub plot on our figure
    ax = fig.add_subplot(features.shape[1] // 5 + 1, 6, n + 1)
    # define range for histograms by cutting 1% of data from both ends
    min_value, max_value = numpy.percentile(data[feature], [1, 99])
    # .ix was removed in pandas 1.0; .loc does the same boolean/label selection
    ax.hist(data.loc[data.is_true_seed == 0, feature].values, range=(min_value, max_value),
            label='ghost', **hist_params)
    ax.hist(data.loc[data.is_true_seed == 1, feature].values, range=(min_value, max_value),
            label='true', **hist_params)
    ax.legend(loc='best')
    ax.set_title(feature)
```
The code above produces output like this (only part of it shown; image not preserved).
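For comparison, the histogram loop above can be adapted to box plots with Matplotlib's own Axes.boxplot, drawing one box per value of is_true_seed inside each feature's subplot. This is a minimal sketch, not the asker's code: the DataFrame `data`, its feature names (f1, f2, f3) and the random values are hypothetical stand-ins, and the grid height uses ceiling division instead of the original `features.shape[1] // 5 + 1`. Tick labels are set with set_xticks/set_xticklabels rather than the boxplot labels keyword, which was renamed in recent Matplotlib releases.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: a 0/1 split column plus features
rng = np.random.default_rng(0)
data = pd.DataFrame({'is_true_seed': [0, 1] * 50,
                     'f1': rng.normal(size=100),
                     'f2': rng.exponential(size=100),
                     'f3': rng.uniform(size=100)})
features = [c for c in data.columns if c != 'is_true_seed']

n_cols = 6
n_rows = math.ceil(len(features) / n_cols)
fig = plt.figure(figsize=(16, 4 * n_rows))
for n, feature in enumerate(features):
    # one subplot per feature, laid out like the histogram grid
    ax = fig.add_subplot(n_rows, n_cols, n + 1)
    # one box per group: rows with is_true_seed == 0 ("ghost") and == 1 ("true")
    groups = [data.loc[data.is_true_seed == value, feature].values for value in (0, 1)]
    ax.boxplot(groups)
    # boxes sit at x = 1 and x = 2 by default
    ax.set_xticks([1, 2])
    ax.set_xticklabels(['ghost', 'true'])
    ax.set_title(feature)
```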
Answer by Alberto Garcia-Raboso
DataFrame.boxplot() automates this rather well:
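```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Example data: a boolean split column plus three numeric features
df = pd.DataFrame({'is_true_seed': np.random.choice([True, False], 10),
                   'col1': np.random.normal(size=10),
                   'col2': np.random.normal(size=10),
                   'col3': np.random.normal(size=10)})
fig, ax = plt.subplots(figsize=(10, 10))
# columns to plot, column to group by, axes to draw on
# (with several columns, pandas may warn that it clears ax's figure to make one subplot per column)
df.boxplot(['col1', 'col2', 'col3'], 'is_true_seed', ax)
plt.show()
```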
The first argument tells pandas which columns to plot, the second which column to group by (what you call the separation condition), and the third on which axes to draw.
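The same call can be written with keyword arguments, which makes the roles of column and by explicit. The sketch below uses hypothetical random data shaped like the example above and, unlike the answer, passes figsize instead of ax and adds return_type='axes'. With by= set, that option makes pandas return a Series of Axes keyed by column name, a convenient handle for customising each feature's subplot afterwards. The Agg backend line only lets it run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import numpy as np
import pandas as pd

# Hypothetical data shaped like the answer's example
rng = np.random.default_rng(0)
df = pd.DataFrame({'is_true_seed': [True, False] * 5,
                   'col1': rng.normal(size=10),
                   'col2': rng.normal(size=10),
                   'col3': rng.normal(size=10)})

# One subplot per listed column, each with one box per is_true_seed value
axes = df.boxplot(column=['col1', 'col2', 'col3'], by='is_true_seed',
                  figsize=(10, 10), return_type='axes')

# With by= and return_type='axes', the result is a Series of Axes keyed by column
for name, ax in axes.items():
    ax.set_ylabel(name)
```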
Listing every column except the one you group by can get tedious, but you can avoid it by omitting the first argument. You then have to pass the other two as keyword arguments:
```python
df.boxplot(by='is_true_seed', ax=ax)
```
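With the column argument omitted, pandas draws one subplot for every numeric column other than the grouping column. A minimal sketch with hypothetical data: layout=(1, 3), which is not part of the original answer, puts the three subplots in a single row instead of pandas' default grid, and return_type='axes' is used only so the plotted columns can be inspected.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import numpy as np
import pandas as pd

# Hypothetical data shaped like the answer's example
rng = np.random.default_rng(1)
df = pd.DataFrame({'is_true_seed': [True, False] * 5,
                   'col1': rng.normal(size=10),
                   'col2': rng.normal(size=10),
                   'col3': rng.normal(size=10)})

# No column list: every numeric column except 'is_true_seed' gets its own subplot
axes = df.boxplot(by='is_true_seed', layout=(1, 3), figsize=(15, 5),
                  return_type='axes')
print(list(axes.index))  # ['col1', 'col2', 'col3']
```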