Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/17050202/

Time: 2020-09-13 20:54:08. Source: igfitidea

Plot timeseries of histograms in Python

Tags: python, matplotlib, pandas, histogram

Asked by abudis

I'm trying to plot a time series of histograms in Python. There has been a similar question about this, but in R. Basically, I need the same thing, but I'm really bad at R. There are usually 48 values per day in my dataset, where -9999 represents missing data. Here's a sample of the data.

I started by reading in the data and constructing a pandas DataFrame.

import pandas as pd
# Parse the first column as a datetime index and treat -9999 as missing data.
df = pd.read_csv('sample.csv', parse_dates=True, index_col=0, na_values='-9999')
print(df)

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 336 entries, 2008-07-25 14:00:00 to 2008-08-01 13:30:00
Data columns (total 1 columns):
159.487691046    330  non-null values
dtypes: float64(1)
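
The column name in the summary above looks like a data value, which suggests the file may have no header row, so the first reading could have been consumed as the header. That is only a guess from the printed output. A minimal sketch of reading such a file, assuming a headerless two-column layout (the inline data and the column names `time` and `value` are hypothetical stand-ins for sample.csv):

```python
import io

import pandas as pd

# Hypothetical stand-in for sample.csv: timestamp, value; -9999 marks missing data.
csv_text = """2008-07-25 14:00:00,159.5
2008-07-25 14:30:00,-9999
2008-07-25 15:00:00,162.1
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,              # assumption: the file has no header row
    names=["time", "value"],  # hypothetical column names
    index_col="time",
    parse_dates=True,         # parse the index as datetimes
    na_values=["-9999"],      # treat -9999 as NaN
)

# Number of readings that were flagged as missing.
n_missing = int(df["value"].isna().sum())
print(n_missing)
```

With `header=None`, the first reading stays in the data instead of becoming the column label.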

Now I can group the data by day:

daily = df.groupby(lambda x: x.date())
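
To check that the grouping does what is intended, one can count the rows per day. The sketch below uses synthetic half-hourly data (an assumption, not the real sample) and also shows an equivalent grouping on `df.index.date`, which avoids the lambda:

```python
import numpy as np
import pandas as pd

# Synthetic data: two full days at 30-minute resolution (48 values per day).
index = pd.date_range("2008-07-25", periods=96, freq="30min")
df = pd.DataFrame({"value": np.arange(96, dtype=float)}, index=index)

# Grouping as in the question: the function is applied to each index timestamp.
daily = df.groupby(lambda ts: ts.date())
sizes = daily.size()

# Equivalent grouping using the array of calendar dates from the index.
daily_alt = df.groupby(df.index.date)
sizes_alt = daily_alt.size()

print(sizes)
```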

But then I'm stuck. I don't know how to use this with matplotlib to get my time series of histograms. Any help is appreciated, not necessarily using pandas.

Accepted answer by Dan Allan

Compute a histogram for each day and plot the results with matplotlib's pcolor.

We need to bin the groups uniformly, so we make bins manually based on the range of your sample data.

In [26]: bins = np.linspace(0, 360, 10)

Apply np.histogram to each group.

In [27]: f = lambda x: Series(np.histogram(x, bins=bins)[0], index=bins[:-1])

In [28]: df1 = daily.apply(f)

In [29]: df1
Out[29]: 
            0    40   80   120  160  200  240  280  320
2008-07-25    0    0    0    3   18    0    0    0    0
2008-07-26    2    0    0    0   17    6   13    1    8
2008-07-27    4    3   10    0    0    0    0    0   31
2008-07-28    0    7   15    0    0    0    0    6   20
2008-07-29    0    0    0    0    0    0   20   26    0
2008-07-30   10    1    0    0    0    0    1   25    9
2008-07-31   30    4    1    0    0    0    0    0   12
2008-08-01    0    0    0    0    0    0    0   14   14
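
The session above relies on names already present in an interactive namespace (`np`, `Series`, `daily`). As a self-contained sketch of the same idea, the following builds one row of counts per day on synthetic data; the random values are an assumption, and missing readings are dropped explicitly before binning rather than relying on how `np.histogram` treats NaN:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: three days of half-hourly values
# in [0, 360), with every tenth reading missing.
rng = np.random.default_rng(0)
index = pd.date_range("2008-07-25", periods=3 * 48, freq="30min")
values = rng.uniform(0, 360, size=len(index))
values[::10] = np.nan
series = pd.Series(values, index=index)

# Ten edges give nine uniform bins of width 40, as in the answer.
bins = np.linspace(0, 360, 10)

# One array of bin counts per calendar day.
rows = {
    day: np.histogram(group.dropna(), bins=bins)[0]
    for day, group in series.groupby(series.index.date)
}

# Rows are days, columns are the left edges of the bins.
df1 = pd.DataFrame.from_dict(rows, orient="index", columns=bins[:-1])
print(df1)
```

Building the frame from a dictionary keeps the result shape explicit, one row per day and one column per bin, without depending on how `groupby(...).apply` assembles returned Series.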

Following your linked example in R, the horizontal axis should be dates, and the vertical axis should be the range of bins. The histogram values are a "heat map."

In [30]: pcolor(df1.T)
Out[30]: <matplotlib.collections.PolyCollection at 0xbb60e2c>

It remains to label the axes. This answer should be of some help.
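
One possible way to label the axes is sketched below. It is not taken from the linked answer. The table of counts is random placeholder data, and the axis label texts and the date format are arbitrary choices. Since `pcolor` draws cell `i` between `i` and `i + 1`, date ticks are placed at cell centers and bin-edge ticks at cell boundaries.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend, so no display is needed

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder counts with the same layout as df1: rows are days, columns are bins.
bins = np.linspace(0, 360, 10)
days = pd.date_range("2008-07-25", periods=8, freq="D")
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.integers(0, 30, size=(8, 9)), index=days, columns=bins[:-1])

fig, ax = plt.subplots()
mesh = ax.pcolor(df1.T.to_numpy())  # dates on the x axis, bins on the y axis

# Date labels at the center of each column of cells.
ax.set_xticks(np.arange(len(df1.index)) + 0.5)
ax.set_xticklabels([d.strftime("%m-%d") for d in df1.index], rotation=45)

# Bin-edge labels at the boundaries between rows of cells.
ax.set_yticks(np.arange(len(bins)))
ax.set_yticklabels([f"{edge:.0f}" for edge in bins])

ax.set_xlabel("Date")
ax.set_ylabel("Value bin")
fig.colorbar(mesh, ax=ax, label="Count per bin")
fig.tight_layout()
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()`, or call `fig.savefig(...)` to write the figure to a file.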