Original question: http://stackoverflow.com/questions/37767719/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Timeseries plot with min/max shading using Seaborn
Asked by Silviu Tofan
I am trying to create a 3-line time series plot based on the following data, in a Week x Overload graph, where each Cluster is a different line.
I have multiple observations for each (Cluster, Week) pair (5 for each at the moment, eventually 1000). I would like the points on the line to be the average Overload value for that specific (Cluster, Week) pair, and the band to be its min/max values.
Currently using the following bit of code to plot it, but I'm not getting any lines, as I don't know what unit to specify using the current dataframe:
ax14 = sns.tsplot(data = long_total_cluster_capacity_overload_df, value = "Overload", time = "Week", condition = "Cluster")
I have a feeling I still need to reshape my dataframe, but I have no idea how. I am looking for a final result that looks like this
Accepted answer by michael_j_ward
Based off this incredible answer, I was able to create a monkey patch to beautifully do what you are looking for.
import numpy as np
import pandas as pd
import seaborn as sns
import seaborn.timeseries

def _plot_range_band(*args, central_data=None, ci=None, data=None, **kwargs):
    # Replace the confidence interval with the observed min/max at each time point
    upper = data.max(axis=0)
    lower = data.min(axis=0)
    ci = np.asarray((lower, upper))
    kwargs.update({"central_data": central_data, "ci": ci, "data": data})
    seaborn.timeseries._plot_ci_band(*args, **kwargs)

# Attach the function to the module so that err_style="range_band" can find it
seaborn.timeseries._plot_range_band = _plot_range_band

cluster_overload = pd.read_csv("TSplot.csv", delim_whitespace=True)
# Number the repeated observations within each (Cluster, Week) pair
cluster_overload['Unit'] = cluster_overload.groupby(['Cluster','Week']).cumcount()

ax = sns.tsplot(time='Week', value="Overload", condition="Cluster", unit="Unit", data=cluster_overload,
                err_style="range_band", n_boot=0)
Notice that the shaded regions line up with the true maximum and minimums in the line graph!
If you figure out why the unit variable is required, please let me know.
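One possible explanation, offered as a guess rather than a confirmed reading of seaborn's source: with a long-form DataFrame, the repeated observations of one (Cluster, Week) pair need some column that tells them apart before they can be laid out as replicate rows. The sketch below shows only the pandas side of that idea. The data values are made up and are not the asker's file.

```python
import pandas as pd

# Made-up data: 2 clusters x 2 weeks x 3 observations each (not the asker's file)
df = pd.DataFrame({
    "Cluster": [1] * 6 + [2] * 6,
    "Week": [1, 1, 1, 2, 2, 2] * 2,
    "Overload": [10, 12, 14, 20, 25, 30, 5, 6, 7, 8, 9, 10],
})

# Number the repeated observations inside each (Cluster, Week) pair: 0, 1, 2, ...
df["Unit"] = df.groupby(["Cluster", "Week"]).cumcount()

# For one cluster, each Unit becomes a row and each Week becomes a column
wide = df[df["Cluster"] == 1].pivot(index="Unit", columns="Week", values="Overload")
print(wide)
```

Without the Unit column there is no unique row key for this reshape, which may be why a unit has to be supplied for long-form input.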
If you do not want them all on the same graph then:
import numpy as np
import pandas as pd
import seaborn as sns
import seaborn.timeseries

def _plot_range_band(*args, central_data=None, ci=None, data=None, **kwargs):
    # Replace the confidence interval with the observed min/max at each time point
    upper = data.max(axis=0)
    lower = data.min(axis=0)
    ci = np.asarray((lower, upper))
    kwargs.update({"central_data": central_data, "ci": ci, "data": data})
    seaborn.timeseries._plot_ci_band(*args, **kwargs)

seaborn.timeseries._plot_range_band = _plot_range_band

cluster_overload = pd.read_csv("TSplot.csv", delim_whitespace=True)
cluster_overload['subindex'] = cluster_overload.groupby(['Cluster','Week']).cumcount()

def customPlot(*args, **kwargs):
    df = kwargs.pop('data')
    # Reshape to one row per replicate and one column per week
    pivoted = df.pivot(index='subindex', columns='Week', values='Overload')
    ax = sns.tsplot(pivoted.values, err_style="range_band", n_boot=0, color=kwargs['color'])

g = sns.FacetGrid(cluster_overload, row="Cluster", sharey=False, hue='Cluster', aspect=3)
g = g.map_dataframe(customPlot, 'Week', 'Overload', 'subindex')
This produces one panel per cluster (you can adjust the aspect ratio if you think the proportions are off).
Answer by Romain
I finally used the good old plot with a design (subplots) that seems (to me) more readable.
import pandas as pd
import seaborn as sns

df = pd.read_csv('TSplot.csv', sep='\t', index_col=0)
# Compute the min, mean and max (could also be other values)
grouped = df.groupby(["Cluster", "Week"]).agg({'Overload': ['min', 'mean', 'max']}).unstack("Cluster")
# Plot with subplots since it is more readable
axes = grouped.loc[:,('Overload', 'mean')].plot(subplots=True)
# Getting the color palette used
palette = sns.color_palette()
# Initializing an index to get each cluster and each color
# (clusters are looked up as index + 1, so they must be labelled 1, 2, 3, ...)
index = 0
for ax in axes:
    # Shade from the mean up to the max
    ax.fill_between(grouped.index, grouped.loc[:,('Overload', 'mean', index + 1)],
                    grouped.loc[:,('Overload', 'max', index + 1)], alpha=.2, color=palette[index])
    # Shade from the min up to the mean
    ax.fill_between(grouped.index,
                    grouped.loc[:,('Overload', 'min', index + 1)], grouped.loc[:,('Overload', 'mean', index + 1)], alpha=.2, color=palette[index])
    index += 1
Answer by michael_j_ward
I really thought I would be able to do it with seaborn.tsplot, but it does not quite look right. Here is the result I get with seaborn:
cluster_overload = pd.read_csv("TSplot.csv", delim_whitespace=True)
cluster_overload['Unit'] = cluster_overload.groupby(['Cluster','Week']).cumcount()
ax = sns.tsplot(time='Week',value="Overload", condition="Cluster", ci=100, unit="Unit", data=cluster_overload)
Outputs:
I am really confused as to why the unit parameter is necessary, since my understanding is that all the data is aggregated based on (time, condition). The Seaborn documentation defines unit as:
Field in the data DataFrame identifying the sampling unit (e.g. subject, neuron, etc.). The error representation will collapse over units at each time/condition observation. This has no role when data is an array.
I am not certain of the meaning of 'collapse over', especially since under my reading of it the unit would not need to be a required variable.
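One hedged reading of 'collapse over units', not confirmed against seaborn's source: the replicate (unit) axis is reduced away, leaving one central value and one band per time point. A minimal NumPy sketch of that reading, using a made-up replicate matrix:

```python
import numpy as np

# Made-up replicate matrix for one condition: rows are units, columns are weeks
values = np.array([
    [10.0, 20.0],
    [12.0, 25.0],
    [14.0, 30.0],
])

# "Collapsing over units" read as reducing along axis 0: one number per week
central = values.mean(axis=0)
lower = values.min(axis=0)
upper = values.max(axis=0)
print(central, lower, upper)
```

Under this reading the unit labels only serve to arrange the observations into rows; their actual values do not affect the mean or the min/max band.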
Anyway, here's the output if you want exactly what you discussed, though it is not nearly as pretty. I am not sure how to manually shade in those regions, but please share if you figure it out.
cluster_overload = pd.read_csv("TSplot.csv", delim_whitespace=True)
grouped = cluster_overload.groupby(['Cluster','Week'],as_index=False)
stats = grouped.agg(['min','mean','max']).unstack().T
stats.index = stats.index.droplevel(0)
colors = ['b','g','r']
ax = stats.loc['mean'].plot(color=colors, alpha=0.8, linewidth=3)
stats.loc['max'].plot(ax=ax,color=colors,legend=False, alpha=0.3)
stats.loc['min'].plot(ax=ax,color=colors,legend=False, alpha=0.3)
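The manual shading mentioned above can be sketched with matplotlib's fill_between, which fills the area between two curves. This is a minimal sketch on made-up data standing in for TSplot.csv, not a drop-in replacement for the code above. The column names follow the question, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data standing in for TSplot.csv
df = pd.DataFrame({
    "Cluster": [1] * 6 + [2] * 6,
    "Week": [1, 1, 1, 2, 2, 2] * 2,
    "Overload": [10, 12, 14, 20, 25, 30, 5, 6, 7, 8, 9, 10],
})

# One row per (Cluster, Week) with the min, mean and max of Overload
stats = df.groupby(["Cluster", "Week"])["Overload"].agg(["min", "mean", "max"])

fig, ax = plt.subplots()
for cluster, s in stats.groupby(level="Cluster"):
    s = s.droplevel("Cluster")  # the index is now just Week
    line, = ax.plot(s.index, s["mean"], label=f"Cluster {cluster}")
    # Shade the min/max band in the same color as the mean line
    ax.fill_between(s.index, s["min"], s["max"], alpha=0.2, color=line.get_color())
ax.set_xlabel("Week")
ax.set_ylabel("Overload")
ax.legend()
```

Call plt.show() or fig.savefig(...) afterwards to view the result. Because the band is computed directly from the grouped min and max, no unit column is needed with this approach.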