Hourly datetime histogram using Pandas

Question

Asked by Dror

Assume I have a timestamp column of datetime in a pandas.DataFrame. For the sake of example, the timestamps have one-second resolution. I would like to bucket (bin) the events into 10-minute [1] buckets. I understand that I can represent the datetime as an integer timestamp and then use a histogram. Is there a simpler approach? Something built into pandas?

[1] 10 minutes is only an example. Ultimately, I would like to use different resolutions.

Answer 1

Answered by Romain

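To use a custom frequency such as "10Min", you have to use a TimeGrouper, as suggested by @johnchase, which operates on the index. Note that pd.TimeGrouper was deprecated and later removed from pandas; pd.Grouper(freq=...) is its replacement.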

import numpy as np
import pandas as pd

# Generate 10000 one-second timestamps and draw 500 of them at random
df = pd.DataFrame(np.random.choice(pd.date_range(start='2015-01-14', periods=10000, freq='s'), 500), columns=['date'])
# Set the date as the index, since the grouper works on the index; keep the column (drop=False) so there is something to count
df.set_index('date', drop=False, inplace=True)
# Plot the histogram; pd.Grouper(freq=...) replaces the removed pd.TimeGrouper
df.groupby(pd.Grouper(freq='10min')).count().plot(kind='bar')
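
Since the question asks for different resolutions, the same grouping can be wrapped in a small function that takes the frequency as a parameter. The following is only a sketch: `count_per_bucket`, `sample`, and the fixed random seed are hypothetical names and choices made for illustration, and it passes `key=` to `pd.Grouper` so that the timestamp column can be binned without first moving it into the index.

```python
import numpy as np
import pandas as pd


def count_per_bucket(df, freq, column="date"):
    """Count rows per time bucket of width `freq` (hypothetical helper)."""
    # key= tells pd.Grouper to bin a column, so the index stays untouched
    return df.groupby(pd.Grouper(key=column, freq=freq)).size()


# Reproducible sample: 500 timestamps drawn from a 10000-second window
rng = np.random.default_rng(0)
seconds = rng.integers(0, 10000, size=500)
sample = pd.DataFrame(
    {"date": pd.Timestamp("2015-01-14") + pd.to_timedelta(seconds, unit="s")}
)

# Same data, two resolutions
counts_10min = count_per_bucket(sample, "10min")
counts_hourly = count_per_bucket(sample, "1h")
```

Each result is a Series indexed by the start of its bucket, so `counts_10min.plot(kind='bar')` should draw the histogram in the same way as the example above (matplotlib is needed for plotting).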

Using `to_period`

It is also possible to use the to_period method but, as far as I know, it does not work with a custom period such as "10Min". This example takes an additional column to simulate the category of an item.

import numpy as np
import pandas as pd

# Number of samples
nb_sample = 500
# Generate one-second timestamps and draw a random subset, with a random type per event
df = pd.DataFrame({'date': np.random.choice(pd.date_range(start='2015-01-14', periods=nb_sample*30, freq='s'), nb_sample),
                  'type': np.random.choice(['foo', 'bar', 'xxx'], nb_sample)})

# Group per hour and type
df = df.groupby([df['date'].dt.to_period('h'), 'type']).count().unstack()
# Drop the unnecessary column level
df.columns = df.columns.droplevel()
df.plot(kind='bar')
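
For a per-type breakdown at a custom resolution such as 10 minutes, one possible alternative is to round each timestamp down with `Series.dt.floor` and group on the result. This is a different technique from `to_period`, offered here only as a sketch; the names `events`, `bucket`, and `per_type` and the fixed seed are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Reproducible sample with the same shape as above: a timestamp and a type
rng = np.random.default_rng(0)
nb_sample = 500
offsets = rng.integers(0, nb_sample * 30, size=nb_sample)
events = pd.DataFrame({
    "date": pd.Timestamp("2015-01-14") + pd.to_timedelta(offsets, unit="s"),
    "type": rng.choice(["foo", "bar", "xxx"], size=nb_sample),
})

# Round each timestamp down to the start of its 10-minute bucket
bucket = events["date"].dt.floor("10min")

# One row per bucket, one column per type. A type that never occurs in a
# bucket is filled with 0; a bucket with no events at all is simply absent.
per_type = events.groupby([bucket, "type"]).size().unstack(fill_value=0)
```

Calling `per_type.plot(kind='bar')` should then give grouped bars per bucket. Unlike `pd.Grouper`, this approach does not insert rows for empty buckets, which may or may not matter for the plot.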
