Plotting multiple time series after a groupby in pandas
Source: http://stackoverflow.com/questions/30942755/
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors on Stack Overflow.
Asked by Henrik Holm
Suppose I made a groupby on the valgdata DataFrame like below:
grouped_valgdata = valgdata.groupby(['news_site','dato_uden_tid']).mean()
Now I get this:
                         sentiment
news_site dato_uden_tid
dr.dk     2015-06-15     54.777183
          2015-06-16     54.703167
          2015-06-17     54.948775
          2015-06-18     54.424881
          2015-06-19     53.290554
eb.dk     2015-06-15     53.279251
          2015-06-16     53.285643
          2015-06-17     53.558753
          2015-06-18     52.854750
          2015-06-19     54.415988
jp.dk     2015-06-15     56.590428
          2015-06-16     55.313752
          2015-06-17     53.771377
          2015-06-18     53.218408
          2015-06-19     54.392638
pol.dk    2015-06-15     54.759532
          2015-06-16     55.182641
          2015-06-17     55.001800
          2015-06-18     56.004326
          2015-06-19     54.649052
Now I want to plot one time series per news_site, with dato_uden_tid on the X axis and sentiment on the Y axis.
What is the best and easiest way to accomplish that?
Thank you!
Answered by Ami Tavory
(Am a bit amused, as this question caught me doing the exact same thing.)
You could do something like
valgdata\
.groupby([valgdata.dato_uden_tid.name, valgdata.news_site.name])\
.mean()\
.unstack()
which would reverse the order of the groupby keys (date first, then news site) and unstack the news sites into columns, giving one column per site.
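To see what the reshaped frame looks like, here is a minimal sketch on a small made-up frame. Only the column names come from the question; the sentiment values are invented for illustration. The `sentiment` column is selected explicitly before averaging, which is an assumption on my part (it is not in the snippet above) so that the resulting columns are plain site names.

```python
import pandas as pd

# Made-up sample data; only the column names come from the question.
valgdata = pd.DataFrame({
    'news_site': ['dr.dk', 'dr.dk', 'eb.dk', 'eb.dk', 'dr.dk', 'eb.dk'],
    'dato_uden_tid': pd.to_datetime([
        '2015-06-15', '2015-06-15', '2015-06-15',
        '2015-06-16', '2015-06-16', '2015-06-16',
    ]),
    'sentiment': [50.0, 60.0, 40.0, 70.0, 52.0, 48.0],
})

# Group by date first, then site; average; move the site level into columns.
wide = (
    valgdata
    .groupby(['dato_uden_tid', 'news_site'])['sentiment']
    .mean()
    .unstack()
)
print(wide)
```

The result has one row per date and one column per news site, which is the shape that `.plot()` turns into one line per column.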
To plot, just do the previous snippet immediately followed by .plot():
valgdata\
.groupby([valgdata.dato_uden_tid.name, valgdata.news_site.name])\
.mean()\
.unstack()\
.plot()
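As a rough end-to-end sketch of the same chain, the example below builds a made-up `valgdata` frame (the values are invented; only the column names come from the question) and plots one line per site. Selecting `sentiment` before `.mean()` and forcing the `Agg` backend are my own assumptions: the first keeps the legend labels to plain site names, the second lets the script run without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: no display available; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data: three sites, five consecutive days.
sites = ['dr.dk', 'eb.dk', 'jp.dk']
dates = pd.date_range('2015-06-15', periods=5, freq='D')
rows = [
    {'news_site': site, 'dato_uden_tid': day, 'sentiment': 50.0 + i + j}
    for i, site in enumerate(sites)
    for j, day in enumerate(dates)
]
valgdata = pd.DataFrame(rows)

# Dates become the index (X axis); each site becomes one column (one line).
ax = (
    valgdata
    .groupby(['dato_uden_tid', 'news_site'])['sentiment']
    .mean()
    .unstack()
    .plot(marker='o')
)
ax.set_xlabel('dato_uden_tid')
ax.set_ylabel('sentiment')
# Call plt.show() or ax.figure.savefig(...) here to view the figure.
```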
Answered by stackoverflowuser2010
Here is a solution using Pandas and Matplotlib with more fine-grained control.
First, here is a function that generates a random dataframe for testing. Importantly, it creates three columns that generalize to more abstract problems:
my_timestamp is a datetime column containing timestamps
my_series is the string label to which you want to apply the groupby
my_value is a numeric value recorded for my_series at time my_timestamp
Replace these column names with those of whatever dataframe you have.
import random

import pandas as pd

def generate_random_data(N=100):
    '''
    Returns a dataframe with N rows of random data.
    '''
    list_of_lists = []
    labels = ['foo', 'bar', 'baz']
    epoch = 1515617110  # starting Unix timestamp, in seconds
    for _ in range(N):
        key = random.choice(labels)
        # Each label gets its own value range so the series are easy to tell apart.
        if key == 'foo':
            value = random.randint(1, 10)
        elif key == 'bar':
            value = random.randint(50, 60)
        else:
            value = random.randint(80, 90)
        # Advance the clock by a random step so the timestamps keep increasing.
        epoch += random.randint(5000, 30000)
        row = [key, epoch, value]
        list_of_lists.append(row)
    df = pd.DataFrame(list_of_lists, columns=['my_series', 'epoch', 'my_value'])
    df['my_timestamp'] = pd.to_datetime(df['epoch'], unit='s')
    df = df[['my_timestamp', 'my_series', 'my_value']]
    return df
Here is some example data that was generated:
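To inspect a frame of this shape on its own, the following is a minimal seeded sketch. It is not the answer's generator: it is a NumPy-based variant I am assuming for illustration, with a fixed seed so reruns give the same frame, and the name `sample` is hypothetical. It keeps the same three columns and the same per-label value ranges.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so reruns give the same frame
n = 8
labels = rng.choice(['foo', 'bar', 'baz'], size=n)

# Same per-label value ranges as generate_random_data (bounds inclusive).
low = {'foo': 1, 'bar': 50, 'baz': 80}
high = {'foo': 10, 'bar': 60, 'baz': 90}
values = [int(rng.integers(low[k], high[k] + 1)) for k in labels]

# Cumulative random steps of 5000 to 30000 seconds from the same start epoch.
epochs = 1515617110 + np.cumsum(rng.integers(5000, 30001, size=n))

sample = pd.DataFrame({
    'my_timestamp': pd.to_datetime(epochs, unit='s'),
    'my_series': labels,
    'my_value': values,
})
print(sample.dtypes)
print(sample.head())
```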
Now, the following code will run the groupby and plot a nice time series graph.
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, DayLocator

def plot_gb_time_series(df, ts_name, gb_name, value_name, figsize=(20,7), title=None):
    '''
    Runs groupby on Pandas dataframe and produces a time series chart.
    Parameters:
    ----------
    df : Pandas dataframe
    ts_name : string
        The name of the df column that has the datetime timestamp x-axis values.
    gb_name : string
        The name of the df column to perform group-by.
    value_name : string
        The name of the df column for the y-axis.
    figsize : tuple of two integers
        Figure size of the resulting plot, e.g. (20, 7)
    title : string
        Optional title
    '''
    xtick_locator = DayLocator(interval=1)
    xtick_dateformatter = DateFormatter('%m/%d/%Y')
    fig, ax = plt.subplots(figsize=figsize)
    # Group by the column name itself (not a one-element list) so that each
    # key is a plain label; recent pandas versions return a tuple otherwise.
    for key, grp in df.groupby(gb_name):
        # Draw every group on the same axes, one labelled line per group.
        ax = grp.plot(ax=ax, kind='line', x=ts_name, y=value_name, label=key, marker='o')
    # One major tick per day, formatted as month/day/year.
    ax.xaxis.set_major_locator(xtick_locator)
    ax.xaxis.set_major_formatter(xtick_dateformatter)
    ax.autoscale_view()
    ax.legend(loc='upper left')
    _ = plt.xticks(rotation=90)
    _ = plt.grid()
    _ = plt.xlabel('')
    # Leave 25% headroom above the largest value.
    _ = plt.ylim(0, df[value_name].max() * 1.25)
    _ = plt.ylabel(value_name)
    if title is not None:
        _ = plt.title(title)
    _ = plt.show()
Here is an example invocation:
df = generate_random_data()
plot_gb_time_series(df, 'my_timestamp', 'my_series', 'my_value',
figsize=(10, 5), title="Random data")
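To check the core idea in isolation (one labelled line per group on shared axes), here is a small sketch on a hand-made frame. The values are made up, the `Agg` backend is an assumption so the script runs without a display, and it calls Matplotlib's `ax.plot` directly instead of `grp.plot`, which is a swap on my part rather than what the function above does.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: no display available; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hand-made data with the same three columns as the generated frame.
df = pd.DataFrame({
    'my_timestamp': pd.to_datetime([
        '2018-01-10 21:00', '2018-01-11 03:00', '2018-01-11 09:00',
        '2018-01-11 18:00', '2018-01-12 02:00', '2018-01-12 07:00',
    ]),
    'my_series': ['foo', 'bar', 'foo', 'baz', 'bar', 'baz'],
    'my_value': [3, 55, 7, 84, 58, 88],
})

fig, ax = plt.subplots(figsize=(10, 5))
# Grouping by the column name yields plain string keys, used as line labels.
for key, grp in df.groupby('my_series'):
    ax.plot(grp['my_timestamp'], grp['my_value'], marker='o', label=key)
ax.legend(loc='upper left')
fig.autofmt_xdate()  # rotate the date tick labels so they do not overlap

labels = [line.get_label() for line in ax.get_lines()]
print(labels)
```

Because groupby sorts its keys by default, the lines are drawn in alphabetical label order, not in order of first appearance in the frame.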
And here is the resulting time series plot: