pandas: group a DataFrame by 5-minute intervals

Question

Asked by Sam c21

How do I get just the 5-minute data out of this csv using Python/pandas? For every 5-minute interval, I'm trying to get the DATE, TIME, OPEN, HIGH, LOW, CLOSE, and VOLUME for that interval.

DATE       TIME     OPEN    HIGH    LOW     CLOSE   VOLUME
02/03/1997 09:04:00 3046.00 3048.50 3046.00 3047.50 505          
02/03/1997 09:05:00 3047.00 3048.00 3046.00 3047.00 162          
02/03/1997 09:06:00 3047.50 3048.00 3047.00 3047.50 98           
02/03/1997 09:07:00 3047.50 3047.50 3047.00 3047.50 228          
02/03/1997 09:08:00 3048.00 3048.00 3047.50 3048.00 136          
02/03/1997 09:09:00 3048.00 3048.00 3046.50 3046.50 174          
02/03/1997 09:10:00 3046.50 3046.50 3045.00 3045.00 134          
02/03/1997 09:11:00 3045.50 3046.00 3044.00 3045.00 43           
02/03/1997 09:12:00 3045.00 3045.50 3045.00 3045.00 214          
02/03/1997 09:13:00 3045.50 3045.50 3045.50 3045.50 8            
02/03/1997 09:14:00 3045.50 3046.00 3044.50 3044.50 152

Answer 1

Answered by ayhan

You can use df.resample to aggregate based on a date/time variable. You'll need a datetime index, which you can specify while reading the csv file:

import pandas as pd
df = pd.read_csv("filename.csv", parse_dates=[["DATE", "TIME"]], index_col=0)

This produces a DataFrame whose index combines the date and time (source):

df.head()
Out[7]: 
                       OPEN    HIGH     LOW   CLOSE  VOLUME 
DATE_TIME                                                   
1997-02-03 09:04:00  3046.0  3048.5  3046.0  3047.5      505
1997-02-03 09:05:00  3047.0  3048.0  3046.0  3047.0      162
1997-02-03 09:06:00  3047.5  3048.0  3047.0  3047.5       98
1997-02-03 09:07:00  3047.5  3047.5  3047.0  3047.5      228
1997-02-03 09:08:00  3048.0  3048.0  3047.5  3048.0      136
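
Passing a nested list to parse_dates is deprecated as of pandas 2.2, so on newer versions it may be safer to build the index explicitly. The following is a minimal sketch, not the answer's original method. It assumes the file is comma-separated with the column names shown in the question, and that dates are month/day/year (which matches the output above). The inline `raw` buffer is a hypothetical stand-in for the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for "filename.csv": three rows copied from the question.
# Assumption: the real file is comma-separated with this header.
raw = io.StringIO(
    "DATE,TIME,OPEN,HIGH,LOW,CLOSE,VOLUME\n"
    "02/03/1997,09:04:00,3046.00,3048.50,3046.00,3047.50,505\n"
    "02/03/1997,09:05:00,3047.00,3048.00,3046.00,3047.00,162\n"
    "02/03/1997,09:06:00,3047.50,3048.00,3047.00,3047.50,98\n"
)

df = pd.read_csv(raw)

# Join the two text columns and parse them with an explicit format.
# Assumption: dates are month/day/year, so 02/03/1997 is 3 February 1997.
df["DATE_TIME"] = pd.to_datetime(
    df["DATE"] + " " + df["TIME"], format="%m/%d/%Y %H:%M:%S"
)

# Drop the original text columns and use the combined timestamp as the index.
df = df.drop(columns=["DATE", "TIME"]).set_index("DATE_TIME")

print(df.head())
```

Spelling out the format avoids any day-first versus month-first guessing when the dates are parsed.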

After that, you can use resample to get the sum, mean, etc. of each five-minute interval.

df.resample("5T").mean()
Out[8]: 
                       OPEN    HIGH     LOW   CLOSE  VOLUME 
DATE_TIME                                                   
1997-02-03 09:00:00  3046.0  3048.5  3046.0  3047.5    505.0
1997-02-03 09:05:00  3047.6  3047.9  3046.8  3047.3    159.6
1997-02-03 09:10:00  3045.6  3045.9  3044.8  3045.0    110.2
1997-02-03 09:15:00  3043.6  3044.0  3042.8  3043.2     69.2
1997-02-03 09:20:00  3044.7  3045.2  3044.5  3045.0     65.8
1997-02-03 09:25:00  3043.8  3044.0  3043.5  3043.7     59.0
1997-02-03 09:30:00  3044.6  3045.0  3044.3  3044.6     56.0
1997-02-03 09:35:00  3044.5  3044.5  3043.5  3044.5     44.0

(T is used for minute frequency. Here is a list of other units.)
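
The mean of each column is not quite what the question asks for. For price bars, a 5-minute interval is usually summarized as the first OPEN, the highest HIGH, the lowest LOW, the last CLOSE, and the total VOLUME. The sketch below shows one way to do that with a per-column aggregation. It rebuilds the eleven rows from the question by hand so that it runs on its own, and it uses the "5min" alias because "T" is deprecated as of pandas 2.2. The name `bars` is only illustrative.

```python
import pandas as pd

# The eleven one-minute rows from the question, rebuilt by hand so the
# example is self-contained (in practice df would come from read_csv).
index = pd.date_range("1997-02-03 09:04", periods=11, freq="min", name="DATE_TIME")
df = pd.DataFrame(
    {
        "OPEN": [3046.0, 3047.0, 3047.5, 3047.5, 3048.0, 3048.0,
                 3046.5, 3045.5, 3045.0, 3045.5, 3045.5],
        "HIGH": [3048.5, 3048.0, 3048.0, 3047.5, 3048.0, 3048.0,
                 3046.5, 3046.0, 3045.5, 3045.5, 3046.0],
        "LOW": [3046.0, 3046.0, 3047.0, 3047.0, 3047.5, 3046.5,
                3045.0, 3044.0, 3045.0, 3045.5, 3044.5],
        "CLOSE": [3047.5, 3047.0, 3047.5, 3047.5, 3048.0, 3046.5,
                  3045.0, 3045.0, 3045.0, 3045.5, 3044.5],
        "VOLUME": [505, 162, 98, 228, 136, 174, 134, 43, 214, 8, 152],
    },
    index=index,
)

# One aggregation per column: first open, highest high, lowest low,
# last close, and summed volume within each 5-minute bin.
bars = df.resample("5min").agg(
    {"OPEN": "first", "HIGH": "max", "LOW": "min", "CLOSE": "last", "VOLUME": "sum"}
)

print(bars)
```

By default, the bins are labeled by their left edge, so the 09:05:00 row covers 09:05 through 09:09.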
