pandas: Plotting a time series with min/max shading using Seaborn

Question

Asked by Silviu Tofan

I am trying to create a 3-line time series plot based on the following data, in a Week x Overload graph, where each Cluster is a different line.

I have multiple observations for each (Cluster, Week) pair (5 for each at the moment; there will eventually be 1000). I would like the points on the line to be the average Overload value for that specific (Cluster, Week) pair, and the band to span its min/max values.

I am currently using the following bit of code to plot it, but I'm not getting any lines, because I don't know what to specify as the unit with the current dataframe:

    ax14 = sns.tsplot(data = long_total_cluster_capacity_overload_df, value = "Overload", time = "Week", condition = "Cluster")

GIST Data

I have a feeling I still need to reshape my dataframe, but I have no idea how. I am looking for a final result that looks like this:

Answer 1

Accepted answer by michael_j_ward

Based off this incredible answer, I was able to create a monkey patch to beautifully do what you are looking for.

import numpy as np
import pandas as pd
import seaborn as sns
# Needs an older seaborn that still ships tsplot (deprecated in 0.9, later removed)
import seaborn.timeseries

def _plot_range_band(*args, central_data=None, ci=None, data=None, **kwargs):
    # Replace the confidence interval with the raw min/max across units
    # at each time point, then reuse seaborn's own band-drawing helper.
    upper = data.max(axis=0)
    lower = data.min(axis=0)
    ci = np.asarray((lower, upper))
    kwargs.update({"central_data": central_data, "ci": ci, "data": data})
    seaborn.timeseries._plot_ci_band(*args, **kwargs)

# Monkey patch: expose the function so err_style="range_band" resolves to it
seaborn.timeseries._plot_range_band = _plot_range_band

cluster_overload = pd.read_csv("TSplot.csv", sep=r"\s+")
# Number the repeated observations within each (Cluster, Week) pair: 0, 1, 2, ...
cluster_overload['Unit'] = cluster_overload.groupby(['Cluster','Week']).cumcount()

ax = sns.tsplot(time='Week', value="Overload", condition="Cluster", unit="Unit", data=cluster_overload,
                err_style="range_band", n_boot=0)

Output Graph:

Notice that the shaded regions line up with the true maximum and minimums in the line graph!

If you figure out why the unit variable is required, please let me know.

If you do not want them all on the same graph then:

import numpy as np
import pandas as pd
import seaborn as sns
# Needs an older seaborn that still ships tsplot (deprecated in 0.9, later removed)
import seaborn.timeseries


def _plot_range_band(*args, central_data=None, ci=None, data=None, **kwargs):
    # Use the raw min/max across units as the band instead of a confidence interval
    upper = data.max(axis=0)
    lower = data.min(axis=0)
    ci = np.asarray((lower, upper))
    kwargs.update({"central_data": central_data, "ci": ci, "data": data})
    seaborn.timeseries._plot_ci_band(*args, **kwargs)

# Monkey patch: expose the function so err_style="range_band" resolves to it
seaborn.timeseries._plot_range_band = _plot_range_band

cluster_overload = pd.read_csv("TSplot.csv", sep=r"\s+")
cluster_overload['subindex'] = cluster_overload.groupby(['Cluster','Week']).cumcount()

def customPlot(*args, **kwargs):
    # FacetGrid passes the per-facet subset as the 'data' keyword argument
    df = kwargs.pop('data')
    # Rows are repeated observations, columns are weeks
    pivoted = df.pivot(index='subindex', columns='Week', values='Overload')
    ax = sns.tsplot(pivoted.values, err_style="range_band", n_boot=0, color=kwargs['color'])

g = sns.FacetGrid(cluster_overload, row="Cluster", sharey=False, hue='Cluster', aspect=3)
g = g.map_dataframe(customPlot, 'Week', 'Overload', 'subindex')

This produces the following (you can adjust the aspect ratio if you think the proportions are off):

Answer 2

Answer by Romain

I finally used the good old plot with a design (subplots) that seems (to me) more readable.

import pandas as pd
import seaborn as sns

df = pd.read_csv('TSplot.csv', sep='\t', index_col=0)
# Compute the min, mean and max (could also be other values)
grouped = df.groupby(["Cluster", "Week"]).agg({'Overload': ['min', 'mean', 'max']}).unstack("Cluster")

# Plot with subplots since it is more readable
axes = grouped.loc[:,('Overload', 'mean')].plot(subplots=True)

# Getting the color palette used
palette = sns.color_palette()

# Shade each subplot from min to max in two halves (min to mean, mean to max).
# Clusters are assumed to be labelled 1, 2, 3, ... hence index + 1.
for index, ax in enumerate(axes):
    ax.fill_between(grouped.index, grouped.loc[:,('Overload', 'mean', index + 1)],
                    grouped.loc[:,('Overload', 'max', index + 1)], alpha=.2, color=palette[index])
    ax.fill_between(grouped.index, grouped.loc[:,('Overload', 'min', index + 1)],
                    grouped.loc[:,('Overload', 'mean', index + 1)], alpha=.2, color=palette[index])

Answer 3

Answer by michael_j_ward

I really thought I would be able to do it with seaborn.tsplot, but it does not quite look right. Here is the result I get with seaborn:

import pandas as pd
import seaborn as sns

cluster_overload = pd.read_csv("TSplot.csv", sep=r"\s+")
# Number the repeated observations within each (Cluster, Week) pair
cluster_overload['Unit'] = cluster_overload.groupby(['Cluster','Week']).cumcount()
ax = sns.tsplot(time='Week', value="Overload", condition="Cluster", ci=100, unit="Unit", data=cluster_overload)

Outputs:

I am really confused as to why the unit parameter is necessary, since my understanding is that all the data is aggregated based on (time, condition). The Seaborn documentation defines unit as:

Field in the data DataFrame identifying the sampling unit (e.g. subject, neuron, etc.). The error representation will collapse over units at each time/condition observation. This has no role when data is an array.

I am not certain of the meaning of 'collapse over', especially since my definition wouldn't make it a required variable.
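
One possible reading, offered as an assumption rather than something checked against the seaborn source: the unit column labels each repeated observation so the long table can be reshaped into a rectangular (unit x time) array, and "collapsing over units" then means reducing along the unit axis. A small pandas sketch of that idea, using a hypothetical toy frame:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one cluster, 2 weeks, 3 observations per week
toy = pd.DataFrame({
    "Cluster": [1] * 6,
    "Week": [1, 1, 1, 2, 2, 2],
    "Overload": [4.0, 6.0, 8.0, 1.0, 2.0, 9.0],
})

# Number the repeated observations within each (Cluster, Week) pair: 0, 1, 2
toy["Unit"] = toy.groupby(["Cluster", "Week"]).cumcount()

# With a unit id, the long table reshapes into a (unit x week) array
wide = toy.pivot(index="Unit", columns="Week", values="Overload")

# "Collapsing over units" read as reducing along the unit axis (axis=0)
mean_line = wide.mean(axis=0).to_numpy()
lower = wide.min(axis=0).to_numpy()
upper = wide.max(axis=0).to_numpy()
print(mean_line, lower, upper)
```

Without the Unit column there is nothing to tell the three Week 1 rows apart, so the reshape into one row per unit has no row labels to use. Under this reading, that is why the parameter matters even though the final statistics depend only on (time, condition).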

Anyway, here's the output if you want exactly what you discussed, though it is not nearly as pretty. I am not sure how to manually shade in those regions, but please share if you figure it out.

import pandas as pd

cluster_overload = pd.read_csv("TSplot.csv", sep=r"\s+")
# Min, mean and max of Overload for every (Cluster, Week) pair
grouped = cluster_overload.groupby(['Cluster','Week'])[['Overload']]
stats = grouped.agg(['min','mean','max']).unstack().T
# Drop the 'Overload' level so the index is (statistic, Week)
stats.index = stats.index.droplevel(0)

# One color per cluster; the mean is drawn bold, min and max as faint lines
colors = ['b','g','r']
ax = stats.loc['mean'].plot(color=colors, alpha=0.8, linewidth=3)
stats.loc['max'].plot(ax=ax, color=colors, legend=False, alpha=0.3)
stats.loc['min'].plot(ax=ax, color=colors, legend=False, alpha=0.3)

Outputs:
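
On the question of shading those regions manually: matplotlib's fill_between can fill the area between the min and max curves. The following is a minimal sketch on made-up data (the demo frame is hypothetical and stands in for TSplot.csv), not a drop-in replacement for the code above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for TSplot.csv: 3 clusters, 4 weeks, 5 observations each
rng = np.random.default_rng(1)
demo = pd.DataFrame([
    {"Cluster": c, "Week": w, "Overload": rng.normal(loc=10 * c, scale=2)}
    for c in (1, 2, 3)
    for w in range(1, 5)
    for _ in range(5)
])

# Rows are weeks; columns are (statistic, Cluster)
stats = (
    demo.groupby(["Cluster", "Week"])["Overload"]
    .agg(["min", "mean", "max"])
    .unstack("Cluster")
)

fig, ax = plt.subplots()
for cluster in stats["mean"].columns:
    # Draw the mean line, then shade min..max in the same color
    (line,) = ax.plot(stats.index, stats["mean"][cluster],
                      linewidth=2, label=f"Cluster {cluster}")
    ax.fill_between(stats.index, stats["min"][cluster], stats["max"][cluster],
                    color=line.get_color(), alpha=0.2)
ax.set_xlabel("Week")
ax.set_ylabel("Overload")
ax.legend()
```

Reusing line.get_color() keeps each band in the same color as its mean line without hard-coding a palette.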
