Plotting multiple time series after a groupby in pandas

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/30942755/

Date: 2020-09-13 23:30:17  Source: igfitidea

Tags: python, pandas, group-by, time-series

Asked by Henrik Holm

Suppose I made a groupby on the valgdata DataFrame like below:

grouped_valgdata = valgdata.groupby(['news_site','dato_uden_tid']).mean()

Now I get this:

                                  sentiment
news_site          dato_uden_tid           
dr.dk              2015-06-15     54.777183
                   2015-06-16     54.703167
                   2015-06-17     54.948775
                   2015-06-18     54.424881
                   2015-06-19     53.290554
eb.dk              2015-06-15     53.279251
                   2015-06-16     53.285643
                   2015-06-17     53.558753
                   2015-06-18     52.854750
                   2015-06-19     54.415988
jp.dk              2015-06-15     56.590428
                   2015-06-16     55.313752
                   2015-06-17     53.771377
                   2015-06-18     53.218408
                   2015-06-19     54.392638
pol.dk             2015-06-15     54.759532
                   2015-06-16     55.182641
                   2015-06-17     55.001800
                   2015-06-18     56.004326
                   2015-06-19     54.649052

Now I want to plot a time series for each news_site, with dato_uden_tid on the X axis and sentiment on the Y axis.

What is the best and easiest way to accomplish that?

Thank you!

Answered by Ami Tavory

(Am a bit amused, as this question caught me doing the exact same thing.)

You could do something like

valgdata\
    .groupby([valgdata.dato_uden_tid.name, valgdata.news_site.name])\
    .mean()\
    .unstack()

which would

  • reverse the order of the groupby keys (date first, then news site)
  • unstack the news sites so that each one becomes a column

To plot, just do the previous snippet immediately followed by .plot():

valgdata\
    .groupby([valgdata.dato_uden_tid.name, valgdata.news_site.name])\
    .mean()\
    .unstack()\
    .plot()
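
To make the reshaping step concrete, here is a minimal sketch on a few made-up rows that reuse the question's column names (the values are invented for illustration, not taken from the real valgdata). It also selects the sentiment column before taking the mean, which is an optional variation on the snippet above: the unstacked columns are then plain site names rather than a two-level (value, site) index.

```python
import pandas as pd

# Made-up rows with the same column names as the question's valgdata.
valgdata = pd.DataFrame({
    "news_site": ["dr.dk", "dr.dk", "eb.dk", "eb.dk", "dr.dk", "eb.dk"],
    "dato_uden_tid": pd.to_datetime([
        "2015-06-15", "2015-06-15", "2015-06-15",
        "2015-06-16", "2015-06-16", "2015-06-16",
    ]),
    "sentiment": [50.0, 60.0, 53.0, 52.0, 54.0, 56.0],
})

# Group by date first, then site; average the sentiment per group;
# then move the inner index level (news_site) into the columns.
wide = (
    valgdata
    .groupby(["dato_uden_tid", "news_site"])["sentiment"]
    .mean()
    .unstack()
)
print(wide)  # one row per date, one column per news site
```

Calling wide.plot() on a frame of this shape draws one line per column, with the date index on the X axis.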

Answered by stackoverflowuser2010

Here is a solution using Pandas and Matplotlib with more fine-grained control.

First, here is a function that generates a random dataframe for testing. Importantly, it creates three columns that generalize to more abstract problems:

  • my_timestamp is a datetime column containing timestamps
  • my_series is the string label to which you want to apply the groupby
  • my_value is a numeric value recorded for my_series at time my_timestamp

Replace these column names with the corresponding ones from your own dataframe.

import random

import pandas as pd


def generate_random_data(N=100):
    '''
    Returns a dataframe with N rows of random data.
    '''
    list_of_lists = []
    labels = ['foo', 'bar', 'baz']
    epoch = 1515617110  # starting Unix time, in seconds
    for _ in range(N):
        key = random.choice(labels)
        # Give each label its own value range so the series are easy to tell apart.
        if key == 'foo':
            value = random.randint(1, 10)
        elif key == 'bar':
            value = random.randint(50, 60)
        else:
            value = random.randint(80, 90)
        # Advance the clock by a random step so timestamps strictly increase.
        epoch += random.randint(5000, 30000)
        row = [key, epoch, value]
        list_of_lists.append(row)
    df = pd.DataFrame(list_of_lists, columns=['my_series', 'epoch', 'my_value'])
    # Convert Unix seconds to datetime values.
    df['my_timestamp'] = pd.to_datetime(df['epoch'], unit='s')
    df = df[['my_timestamp', 'my_series', 'my_value']]
    return df
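
As a small aside on the timestamp conversion used in the generator, the sketch below shows what pd.to_datetime with unit='s' does to the starting epoch value. The second value (one day later) is added here purely for illustration.

```python
import pandas as pd

# The generator's starting epoch, plus a second value exactly one day later
# (the second value is an illustrative addition).
epochs = pd.Series([1515617110, 1515617110 + 86400])

# unit="s" tells pandas the integers are seconds since 1970-01-01 UTC.
timestamps = pd.to_datetime(epochs, unit="s")
print(timestamps)
```

The result is a datetime64 column, which is what matplotlib's date locators and formatters expect on the X axis.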

Here is some example data that was generated:

enter image description here

Now, the following code will run the groupby and plot a nice time series graph.

import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, DayLocator


def plot_gb_time_series(df, ts_name, gb_name, value_name, figsize=(20,7), title=None):
    '''
    Runs groupby on Pandas dataframe and produces a time series chart.

    Parameters:
    ----------
    df : Pandas dataframe
    ts_name : string
        The name of the df column that has the datetime timestamp x-axis values.
    gb_name : string
        The name of the df column to perform group-by.
    value_name : string
        The name of the df column for the y-axis.
    figsize : tuple of two integers
        Figure size of the resulting plot, e.g. (20, 7)
    title : string
        Optional title
    '''
    # One major tick per day, labelled as month/day/year.
    xtick_locator = DayLocator(interval=1)
    xtick_dateformatter = DateFormatter('%m/%d/%Y')
    fig, ax = plt.subplots(figsize=figsize)
    # Group by the column name itself (not a one-element list) so that
    # each key is a plain label rather than a tuple.
    for key, grp in df.groupby(gb_name):
        ax = grp.plot(ax=ax, kind='line', x=ts_name, y=value_name, label=key, marker='o')
    ax.xaxis.set_major_locator(xtick_locator)
    ax.xaxis.set_major_formatter(xtick_dateformatter)
    ax.autoscale_view()
    ax.legend(loc='upper left')
    _ = plt.xticks(rotation=90)
    _ = plt.grid()
    _ = plt.xlabel('')
    # Leave 25% headroom above the largest value.
    _ = plt.ylim(0, df[value_name].max() * 1.25)
    _ = plt.ylabel(value_name)
    if title is not None:
        _ = plt.title(title)
    _ = plt.show()

Here is an example invocation:

df = generate_random_data()

plot_gb_time_series(df, 'my_timestamp', 'my_series', 'my_value',
                    figsize=(10, 5), title="Random data")

And here is the resulting time series plot:

enter image description here
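
For comparison, the same per-group loop can be applied to data shaped like the question's. The sketch below is only an illustration under a few assumptions: it types in four of the grouped means from the question as flat rows, calls matplotlib's ax.plot directly instead of the helper above, uses the non-interactive Agg backend so that it runs without a display, and writes to a hypothetical file name, sentiment_by_site.png.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Four of the grouped means shown in the question, as flat rows.
means = pd.DataFrame({
    "news_site": ["dr.dk", "dr.dk", "eb.dk", "eb.dk"],
    "dato_uden_tid": pd.to_datetime(
        ["2015-06-15", "2015-06-16", "2015-06-15", "2015-06-16"]),
    "sentiment": [54.777183, 54.703167, 53.279251, 53.285643],
})

fig, ax = plt.subplots(figsize=(10, 5))
# Draw one line per news site on the shared axes.
for site, grp in means.groupby("news_site"):
    ax.plot(grp["dato_uden_tid"], grp["sentiment"], marker="o", label=site)
ax.set_ylabel("sentiment")
ax.legend(loc="upper left")
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
fig.savefig("sentiment_by_site.png")  # hypothetical output file name
```

If the means are still in the MultiIndex form produced by the question's groupby, calling reset_index() on that result first gives flat rows like the ones typed in here.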
