Side-by-side boxplot of multiple columns of a pandas DataFrame
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original source, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/42760965/
Asked by Fred S
One year of sample data:
import pandas as pd
import numpy.random as rnd
import seaborn as sns

n = 365
df = pd.DataFrame(data={"A": rnd.randn(n), "B": rnd.randn(n) + 1},
                  index=pd.date_range(start="2017-01-01", periods=n, freq="D"))
I want to boxplot these data side-by-side grouped by the month (i.e., two boxes per month, one for A and one for B).
For a single column, sns.boxplot(df.index.month, df["A"]) works fine. However, sns.boxplot(df.index.month, df[["A", "B"]]) throws an error (ValueError: cannot copy sequence with size 2 to array axis with dimension 365). Melting the data by the index (pd.melt(df, id_vars=df.index, value_vars=["A", "B"], var_name="column")) in order to use seaborn's hue property as a workaround doesn't work either (TypeError: unhashable type: 'DatetimeIndex').
(A solution doesn't necessarily need to use seaborn, if it is easier using plain matplotlib.)
Edit
I found a workaround that basically produces what I want. However, it becomes somewhat awkward to work with once the DataFrame includes more variables than I want to plot. So if there is a more elegant/direct way to do it, please share!
df_stacked = df.stack().reset_index()
df_stacked.columns = ["date", "vars", "vals"]
df_stacked.index = df_stacked["date"]
sns.boxplot(x=df_stacked.index.month, y="vals", hue="vars", data=df_stacked)
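One way to keep this workaround manageable when the DataFrame has extra variables is to select the columns of interest before stacking. The following is only a sketch under stated assumptions: the extra column C and the list name plot_cols are hypothetical, and the random generator is seeded purely for reproducibility.

```python
import numpy as np
import pandas as pd

n = 365
rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
df = pd.DataFrame(
    {"A": rng.standard_normal(n),
     "B": rng.standard_normal(n) + 1,
     "C": rng.standard_normal(n) + 10},  # hypothetical extra column, not plotted
    index=pd.date_range(start="2017-01-01", periods=n, freq="D"),
)

plot_cols = ["A", "B"]  # assumed list of columns to plot

# Select first, then stack: the long frame only ever contains A and B.
df_stacked = df[plot_cols].stack().reset_index()
df_stacked.columns = ["date", "vars", "vals"]

# Store the month as a regular column instead of re-indexing by date.
df_stacked["month"] = df_stacked["date"].dt.month
```

The resulting frame can then be passed to seaborn in the same way as above, for example with x="month", y="vals", hue="vars".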
Answered by Timothy Sweetser
Here's a solution using pandas melt and seaborn:
import pandas as pd
import numpy.random as rnd
import seaborn as sns

n = 365
df = pd.DataFrame(data={"A": rnd.randn(n),
                        "B": rnd.randn(n) + 1,
                        "C": rnd.randn(n) + 10,  # will not be plotted
                        },
                  index=pd.date_range(start="2017-01-01", periods=n, freq="D"))

df['month'] = df.index.month
df_plot = df.melt(id_vars='month', value_vars=["A", "B"])
sns.boxplot(x='month', y='value', hue='variable', data=df_plot)
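To see what the melt step hands to seaborn, it can help to run it on a tiny frame. This is an illustrative sketch only: the values are made up and the names small and long_df are hypothetical.

```python
import pandas as pd

# Tiny hand-made frame (made-up values) so the reshaping is easy to follow.
small = pd.DataFrame({
    "month": [1, 1, 2],
    "A": [0.1, 0.2, 0.3],
    "B": [1.1, 1.2, 1.3],
    "C": [10.0, 10.1, 10.2],  # dropped because it is not listed in value_vars
})

# One row per (original row, variable) pair; "month" is repeated as the id.
long_df = small.melt(id_vars="month", value_vars=["A", "B"])
print(long_df)
```

Each original row becomes two rows, one for A and one for B, with the column name stored in "variable" and the number in "value". That long layout is what lets hue='variable' split each month into two boxes.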
Answered by benSooraj
I do not understand your question completely, but you might want to take a look at this approach using matplotlib. It is not the best solution, though.
1) Break df into 12 DataFrames by month, all stored in a list:
DFList = []
# groupby yields (month, sub-DataFrame) pairs; keep only the sub-DataFrame
for group in df.groupby(df.index.month):
    DFList.append(group[1])
2) Plot them one after the other in a loop:
import matplotlib.pyplot as plt

# one figure per month, with one box subplot per column
for i in range(12):
    DFList[i].plot(kind='box', subplots=True, layout=(2,2), sharex=True, sharey=True, figsize=(7,7))
plt.show()
3) Here's a snapshot of the first three rows: (image not available)
You might also want to check out matplotlib's add_subplot method.
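As a rough illustration of the add_subplot suggestion, the sketch below places one small panel per month in a single figure. It is not part of the original answer: the 3 x 4 grid, the seeded sample data and the headless Agg backend are all assumptions made so the example runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

n = 365
rng = np.random.default_rng(0)  # seeded only for reproducibility
df = pd.DataFrame(
    {"A": rng.standard_normal(n), "B": rng.standard_normal(n) + 1},
    index=pd.date_range(start="2017-01-01", periods=n, freq="D"),
)

fig = plt.figure(figsize=(12, 8))
for month, month_df in df.groupby(df.index.month):
    # 3 x 4 grid: subplot number 1..12 matches the month number
    ax = fig.add_subplot(3, 4, int(month))
    ax.boxplot([month_df["A"].to_numpy(), month_df["B"].to_numpy()])
    ax.set_xticklabels(["A", "B"])
    ax.set_title(f"Month {month}")
fig.tight_layout()
```

In an interactive session, drop the matplotlib.use("Agg") line and call plt.show() at the end to display the figure.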
Answered by Grégory Belhumeur
import matplotlib.pyplot as plt

# split the DataFrame into one sub-DataFrame per month
month_dfs = []
for group in df.groupby(df.index.month):
    month_dfs.append(group[1])

plt.figure(figsize=(30,5))
# one subplot per month, all in a single row
for i, month_df in enumerate(month_dfs):
    axi = plt.subplot(1, len(month_dfs), i + 1)
    month_df.plot(kind='box', subplots=False, ax=axi)
    plt.title(i+1)
    plt.ylim([-4, 4])

plt.show()
This produces the following figure (image not available):
Not exactly what you're looking for but you get to keep a readable DataFrame if you add more variables.
You can also easily remove the y-axis from every subplot except the first by adding
    if i > 0:
        y_axis = axi.axes.get_yaxis()
        y_axis.set_visible(False)
inside the loop, before plt.show().
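For comparison, the boxes can also be drawn side by side on a single axes in plain matplotlib by shifting their positions. This is not taken from any of the answers; it is a minimal sketch in which the offsets, widths, colors and the seeded sample data are arbitrary assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

n = 365
rng = np.random.default_rng(0)  # seeded only for reproducibility
df = pd.DataFrame(
    {"A": rng.standard_normal(n), "B": rng.standard_normal(n) + 1},
    index=pd.date_range(start="2017-01-01", periods=n, freq="D"),
)

months = list(range(1, 13))
fig, ax = plt.subplots(figsize=(12, 4))
boxes = {}
# Draw column A slightly left and column B slightly right of each month tick.
for col, offset, color in [("A", -0.2, "C0"), ("B", 0.2, "C1")]:
    data = [df.loc[df.index.month == m, col].to_numpy() for m in months]
    bp = ax.boxplot(data, positions=[m + offset for m in months], widths=0.35,
                    patch_artist=True, manage_ticks=False)
    for box in bp["boxes"]:
        box.set_facecolor(color)
    boxes[col] = bp["boxes"]

ax.set_xticks(months)  # one tick per month, centered between the two boxes
ax.set_xlabel("month")
ax.legend([boxes["A"][0], boxes["B"][0]], ["A", "B"])
```

Passing manage_ticks=False stops boxplot from placing ticks at the shifted positions, so the ticks can be set once at the month numbers afterwards.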
Answered by foglerit
This is quite straightforward using Altair:
import altair as alt

alt.Chart(
    # long format: one row per (date, variable), plus a month column
    df.reset_index()
      .melt(id_vars=["index"], value_vars=["A", "B"])
      .assign(month=lambda x: x["index"].dt.month)
).mark_boxplot(
    extent='min-max'
).encode(
    alt.X('variable:N', title=''),
    alt.Y('value:Q'),
    column='month:N',
    color='variable:N'
)
The code above melts the DataFrame and adds a month column. Then Altair creates box plots for each variable, broken down by month as the plot columns.
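The data-preparation step can be checked on its own, without Altair installed. The sketch below assumes the same kind of sample data as in the question (seeded here for reproducibility) and an unnamed DatetimeIndex, which reset_index turns into a column called "index".

```python
import numpy as np
import pandas as pd

n = 365
rng = np.random.default_rng(0)  # seeded only for reproducibility
df = pd.DataFrame(
    {"A": rng.standard_normal(n), "B": rng.standard_normal(n) + 1},
    index=pd.date_range(start="2017-01-01", periods=n, freq="D"),
)

long_df = (
    df.reset_index()  # the unnamed index becomes a column named "index"
      .melt(id_vars=["index"], value_vars=["A", "B"])
      .assign(month=lambda x: x["index"].dt.month)
)
print(long_df.head())
```

Moving the index into a column first also avoids the TypeError from the question: id_vars expects column labels, so it is given the label "index" rather than the DatetimeIndex object itself.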