Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/26507404/

Time: 2020-09-13 22:36:11  Source: igfitidea

Time-series boxplot in pandas

Tags: python, pandas, time-series, boxplot

Asked by Fred S

How can I create a boxplot for a pandas time-series where I have a box for each day?

Sample dataset of hourly data where one box should consist of 24 values:

import numpy as np
import pandas as pd

n = 480
ts = pd.Series(np.random.randn(n),
               index=pd.date_range(start="2014-02-01",
                                   periods=n,
                                   freq="H"))
ts.plot()

I am aware that I could make an extra column for the day, but I would like to have proper x-axis labeling and x-limit functionality (like in ts.plot()), so being able to work with the datetime index would be great.

There is a similar question for R/ggplot2 here, if it helps to clarify what I want.

Answered by Rutger Kassies

If it's an option for you, I would recommend using Seaborn, which is a wrapper for Matplotlib. You could do it yourself by looping over the groups from your time series, but that's much more work.

import pandas as pd
import numpy as np
import seaborn
import matplotlib.pyplot as plt

n = 480
ts = pd.Series(np.random.randn(n), index=pd.date_range(start="2014-02-01", periods=n, freq="H"))


fig, ax = plt.subplots(figsize=(12,5))
# x and y are passed by keyword, which recent seaborn releases require
seaborn.boxplot(x=ts.index.dayofyear, y=ts, ax=ax)

This gives a plot with one box per day of the year (image not shown).
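
For comparison, the do-it-yourself route of looping over the groups could look roughly like the sketch below. It is not part of the original answer: it assumes plain Matplotlib, groups by ts.index.normalize(), and places each box at the Matplotlib date number of its day, so the x-axis stays a real date axis with the usual limits and tick formatting. The box width and the date format are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

n = 480
ts = pd.Series(np.random.randn(n),
               index=pd.date_range(start="2014-02-01", periods=n, freq="h"))

# Collect one array of values per calendar day, keyed by that day's midnight.
days, samples = [], []
for day, values in ts.groupby(ts.index.normalize()):
    days.append(day)
    samples.append(values.to_numpy())

# Convert the days to Matplotlib date numbers and use them as box positions.
positions = mdates.date2num(pd.DatetimeIndex(days).to_pydatetime())

fig, ax = plt.subplots(figsize=(12, 5))
ax.boxplot(samples, positions=positions, widths=0.6)

# boxplot() installs fixed ticks; replace them with date-aware ones.
ax.xaxis.set_major_locator(mdates.AutoDateLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
ax.set_xlim(positions[0] - 1, positions[-1] + 1)
fig.autofmt_xdate()
```

Because the positions are ordinary date numbers, ax.set_xlim() can afterwards be used to zoom to a date range, which is the x-limit behaviour the question asks for.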

Note that I'm passing the day of year as the grouper to seaborn; if your data spans multiple years this won't work, because the same day of year from different years ends up in one box. You could then consider something like:

ts.index.to_series().apply(lambda x: x.strftime('%Y%m%d'))
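
A small sketch of why the date label matters, assuming made-up data that covers 1 February in two different years. It uses DatetimeIndex.strftime, a vectorized way to build the same labels as the apply/strftime line above; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the same calendar day (1 February) in two different years.
index = pd.date_range("2013-02-01", periods=24, freq="h").append(
    pd.date_range("2014-02-01", periods=24, freq="h"))
ts = pd.Series(np.random.randn(len(index)), index=index)

# Day of year alone merges both years into a single group.
by_dayofyear = ts.groupby(ts.index.dayofyear).size()

# A year-month-day label keeps the two days apart.
day_labels = ts.index.strftime("%Y%m%d")
by_date = ts.groupby(day_labels).size()

print(by_dayofyear)
print(by_date)
```

Either array can be passed as the x grouper to seaborn.boxplot in place of ts.index.dayofyear.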

Edit: for 3-hourly boxes you could use this as a grouper, but it only works if the timestamps have no minutes or smaller units defined:

import datetime
[(dt - datetime.timedelta(hours=int(dt.hour % 3))).strftime('%Y%m%d%H') for dt in ts.index]
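
If the index does carry minutes, one possible alternative (not from the original answer) is to round every timestamp down to its 3-hour block with DatetimeIndex.floor. The sketch below assumes made-up half-hourly data and a pandas version that provides floor.

```python
import numpy as np
import pandas as pd

# Hypothetical half-hourly data, so minutes are present in the index.
index = pd.date_range("2014-02-01", periods=96, freq="30min")
ts = pd.Series(np.random.randn(len(index)), index=index)

# Round every timestamp down to the start of its 3-hour block.
blocks = ts.index.floor("3h")

# Each block now collects all samples that fall inside it.
sizes = ts.groupby(blocks).size()
print(sizes.head())
```

The blocks array can serve as the grouper directly, or be turned into labels with blocks.strftime('%Y%m%d%H').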

Answered by dulrich

(Not enough rep to comment on accepted solution, so adding an answer instead.)

The accepted code has two small errors: (1) it needs to add the numpy import and (2) it needs to swap the x and y parameters in the boxplot statement. The following produces the plot shown.

import numpy as np
import pandas as pd
import seaborn
import matplotlib.pyplot as plt

n = 480
ts = pd.Series(np.random.randn(n), index=pd.date_range(start="2014-02-01", periods=n, freq="H"))

fig, ax = plt.subplots(figsize=(12,5))
# x and y are passed by keyword, which recent seaborn releases require
seaborn.boxplot(x=ts.index.dayofyear, y=ts, ax=ax)

Answered by Jonathan

I have a solution that may be helpful. It uses only native pandas and allows for hierarchical date-time grouping (i.e. spanning years). The key is that if you pass a function to groupby(), it will be called on each element of the dataframe's index. If your index is a DatetimeIndex (or similar), you can access all of the datetime convenience functions, such as strftime(), for resampling!

Try this:

import numpy as np
import pandas as pd

n = 480
ts = pd.DataFrame(np.random.randn(n), index=pd.date_range(start="2014-02-01", periods=n, freq="H"))
ts.groupby(lambda x: x.strftime("%Y-%m-%d")).boxplot(subplots=False, figsize=(12,9), rot=90)

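
To see what the function-based grouping produces without drawing anything, here is a minimal sketch. The data, the column name "value" and the helper month_key are made up for illustration; the index deliberately crosses a year boundary. It groups by a year-month label and prints the quartiles that a boxplot would draw.

```python
import numpy as np
import pandas as pd

n = 72
df = pd.DataFrame({"value": np.random.randn(n)},
                  index=pd.date_range(start="2013-12-30", periods=n, freq="h"))

def month_key(label):
    # Applied to the index labels; returns a "YYYY-MM" string.
    return label.strftime("%Y-%m")

# One row of summary statistics per year-month group.
summary = df.groupby(month_key)["value"].describe()
print(summary[["count", "25%", "50%", "75%"]])
```

Because the label contains the year, December 2013 and January 2014 stay in separate groups; swapping the format string for "%Y-%m-%d" gives the daily grouping used in the answer.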