Original question: http://stackoverflow.com/questions/22795348/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Plotting time-series data with seaborn
Asked by Amelio Vazquez-Reina
Say I create a fully random DataFrame using the following:
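import datetime
from datetime import timedelta
from random import randrange

import numpy as np
import pandas as pd

def random_date(start, end):
    # Pick a random date between start and end
    delta = end - start
    int_delta = (delta.days * 24 * 60 * 60) + delta.seconds
    random_second = randrange(int_delta)
    return start + timedelta(seconds=random_second)

def rand_dataframe():
    # The original used pandas.util.testing.makeDataFrame(), which current
    # pandas releases no longer ship; a 30x4 frame of random normal values
    # with columns A-D stands in for it here.
    df = pd.DataFrame(np.random.randn(30, 4), columns=list("ABCD"))
    df['date'] = [random_date(datetime.date(2014, 3, 18), datetime.date(2014, 4, 1))
                  for _ in range(df.shape[0])]
    df.sort_values(by='date', inplace=True)
    return df

df = rand_dataframe()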
This results in the dataframe shown at the bottom of this post. I would like to plot my columns A, B, C and D using the time-series visualization features in seaborn so that I get something along these lines:
How can I approach this problem? From what I read on this notebook, the call should be:
sns.tsplot(df, time="time", unit="unit", condition="condition", value="value")
but this seems to require that the dataframe is represented in a different way, with the columns somehow encoding time, unit, condition and value, which is not my case. How can I convert my dataframe (shown below) into this format?
Here is my dataframe:
date A B C D
2014-03-18 1.223777 0.356887 1.201624 1.968612
2014-03-18 0.160730 1.888415 0.306334 0.203939
2014-03-18 -0.203101 -0.161298 2.426540 0.056791
2014-03-18 -1.350102 0.990093 0.495406 0.036215
2014-03-18 -1.862960 2.673009 -0.545336 -0.925385
2014-03-19 0.238281 0.468102 -0.150869 0.955069
2014-03-20 1.575317 0.811892 0.198165 1.117805
2014-03-20 0.822698 -0.398840 -1.277511 0.811691
2014-03-20 2.143201 -0.827853 -0.989221 1.088297
2014-03-20 0.299331 1.144311 -0.387854 0.209612
2014-03-20 1.284111 -0.470287 -0.172949 -0.792020
2014-03-22 1.031994 1.059394 0.037627 0.101246
2014-03-22 0.889149 0.724618 0.459405 1.023127
2014-03-23 -1.136320 -0.396265 -1.833737 1.478656
2014-03-23 -0.740400 -0.644395 -1.221330 0.321805
2014-03-23 -0.443021 -0.172013 0.020392 -2.368532
2014-03-23 1.063545 0.039607 1.673722 1.707222
2014-03-24 0.865192 -0.036810 -1.162648 0.947431
2014-03-24 -1.671451 0.979238 -0.701093 -1.204192
2014-03-26 -1.903534 -1.550349 0.267547 -0.585541
2014-03-27 2.515671 -0.271228 -1.993744 -0.671797
2014-03-27 1.728133 -0.423410 -0.620908 1.430503
2014-03-28 -1.446037 -0.229452 -0.996486 0.120554
2014-03-28 -0.664443 -0.665207 0.512771 0.066071
2014-03-29 -1.093379 -0.936449 -0.930999 0.389743
2014-03-29 1.205712 -0.356070 -0.595944 0.702238
2014-03-29 -1.069506 0.358093 1.217409 -2.286798
2014-03-29 2.441311 1.391739 -0.838139 0.226026
2014-03-31 1.471447 -0.987615 0.201999 1.228070
2014-03-31 -0.050524 0.539846 0.133359 -0.833252
In the end, what I am looking for is an overlay of plots (one per column), where each of them looks as follows (note that different values of CI get different values of alpha):
Accepted answer by mwaskom
I don't think tsplot is going to work with the data you have. The assumptions it makes about the input data are that you've sampled the same units at each timepoint (although you can have missing timepoints for some units).
For example, say you measured blood pressure from the same people every day for a month, and then you wanted to plot the average blood pressure by condition (where maybe the "condition" variable is the diet they are on). tsplot could do this, with a call that would look something like sns.tsplot(df, time="day", unit="person", condition="diet", value="blood_pressure").
That scenario is different from having large groups of people on different diets and each day randomly sampling some from each group and measuring their blood pressure. From the example you gave, it seems like your data are structured like this second case.
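To make the distinction concrete, here is a small sketch of the two layouts with made-up numbers. The column names day, person, diet and blood_pressure follow the hypothetical call above; the helper looks_like_repeated_measures is my own rough illustration, not part of seaborn or pandas, and it only approximates the assumption described (units recur across timepoints, with gaps allowed).

```python
import pandas as pd

# Repeated measures: the same two people are measured on every day.
repeated = pd.DataFrame({
    "day": [1, 1, 2, 2, 3, 3],
    "person": ["ann", "bob", "ann", "bob", "ann", "bob"],
    "diet": ["low_salt", "control"] * 3,
    "blood_pressure": [118, 131, 117, 133, 115, 130],
})

# Random sampling: different people turn up on each day.
sampled = pd.DataFrame({
    "day": [1, 1, 2, 3, 3, 3],
    "person": ["ann", "bob", "cat", "dan", "eve", "fay"],
    "diet": ["low_salt", "control", "control", "low_salt", "control", "low_salt"],
    "blood_pressure": [118, 131, 129, 116, 134, 119],
})


def looks_like_repeated_measures(df, time="day", unit="person"):
    """Rough check: each unit is seen at most once per timepoint
    and comes back at more than one timepoint (gaps are tolerated)."""
    per_cell = df.groupby([unit, time]).size()
    timepoints_per_unit = df.groupby(unit)[time].nunique()
    return bool((per_cell == 1).all() and (timepoints_per_unit > 1).all())


print(looks_like_repeated_measures(repeated))  # True
print(looks_like_repeated_measures(sampled))   # False
```

The dataframe in the question has no unit column at all, so it resembles the second layout: independent draws that happen to share a date.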
However, it's not that hard to come up with a mix of matplotlib and pandas that will do what I think you want:
import numpy as np
import pandas as pd
import seaborn as sns

# Read in the data from the stackoverflow question
df = pd.read_clipboard().iloc[1:]

# Convert it to "long-form" or "tidy" representation
df = pd.melt(df, id_vars=["date"], var_name="condition")

# Plot the average value by condition and date
ax = df.groupby(["condition", "date"]).mean().unstack("condition").plot()

# Get a reference to the x-points corresponding to the dates and the colors
x = np.arange(len(df.date.unique()))
palette = sns.color_palette()

# Calculate the 25th and 75th percentiles of the data
# and plot a translucent band between them
for cond, cond_df in df.groupby("condition"):
    low = cond_df.groupby("date").value.apply(np.percentile, 25)
    high = cond_df.groupby("date").value.apply(np.percentile, 75)
    ax.fill_between(x, low, high, alpha=.2, color=palette.pop(0))
This code produces:
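The mean line and the percentile band can also be computed in a single grouped aggregation, which makes the numbers easy to inspect before plotting. The sketch below assumes only pandas and uses four rows copied from the question's table, restricted to columns A and B; the names wide, tidy, grouped and summary are mine.

```python
import pandas as pd

# A few rows copied from the question's table (columns A and B only).
wide = pd.DataFrame({
    "date": ["2014-03-22", "2014-03-22", "2014-03-24", "2014-03-24"],
    "A": [1.031994, 0.889149, 0.865192, -1.671451],
    "B": [1.059394, 0.724618, -0.036810, 0.979238],
})

# Wide to long: one row per (date, condition) observation.
tidy = pd.melt(wide, id_vars=["date"], var_name="condition")

# Mean plus 25th and 75th percentiles for every condition and date.
grouped = tidy.groupby(["condition", "date"])["value"]
summary = pd.DataFrame({
    "mean": grouped.mean(),
    "low": grouped.quantile(0.25),
    "high": grouped.quantile(0.75),
})
print(summary)
```

For a given condition cond, summary.loc[cond]["low"] and summary.loc[cond]["high"] are then the two curves that bound the translucent band.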