Notice: this page reproduces a popular StackOverflow question (originally published as a Chinese-English parallel translation) under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/18788605/

Date: 2020-09-13 21:09:19  Source: igfitidea

Time Series using numpy or pandas

Tags: python, numpy, pandas, time-series

Asked by user1913171

I'm a beginner with Python and its ecosystem, and I'm having trouble working with time series data.

Below is my 1-minute OHLC data.

2011-11-01,9:00:00,248.50,248.95,248.20,248.70
2011-11-01,9:01:00,248.70,249.00,248.65,248.85
2011-11-01,9:02:00,248.90,249.25,248.70,249.15
...
2011-11-01,15:03:00,250.25,250.30,250.05,250.15
2011-11-01,15:04:00,250.15,250.60,250.10,250.60
2011-11-01,15:15:00,250.55,250.55,250.55,250.55
2011-11-02,9:00:00,245.55,246.25,245.40,245.80
2011-11-02,9:01:00,245.85,246.40,245.75,246.35
2011-11-02,9:02:00,246.30,246.45,245.75,245.80
2011-11-02,9:03:00,245.75,245.85,245.30,245.35
...
  1. I'd like to extract the "CLOSE" value (the last field) from each row and convert the data into the following format, one line per day:

    2011-11-01, 248.70, 248.85, 249.15, ... 250.15, 250.60, 250.55
    2011-11-02, 245.80, 246.35, 245.80, ...
    ...
    
  2. I'd like to calculate the highest Close value and its time (minute) for each day, like the following:

    2011-11-01, 10:23:03, 250.55
    2011-11-02, 11:02:36, 251.00
    ....
    

Any help would be greatly appreciated.

Thank you in advance,

Answered by Viktor Kerkez

You can use the pandas library. For your data, you can get the daily maximum like this:

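import pandas as pd
# Read in the data; the columns are date, time, open, high, low, close
df = pd.read_csv('your_file', header=None,
                 names=['date', 'time', 'open', 'high', 'low', 'close'])
# Combine the first two columns into a date-time and set it as index
df.index = pd.to_datetime(df['date'] + ' ' + df['time'])
# get only the close column
df = df[['close']]
# Resample to date frequency and get the max value for each day.
# (resample no longer accepts how='max'; call .max() on the result instead)
df.resample('D').max()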
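
As a rough, self-contained sketch of the daily maximum on a current pandas release, the snippet below loads a handful of rows copied from the question through `io.StringIO` instead of a file. The column names are my own labels (the file has no header), and `resample('D').max()` is used because recent pandas versions no longer accept a `how=` argument:

```python
import io
import pandas as pd

# A few rows copied from the question's sample data.
raw = """2011-11-01,9:00:00,248.50,248.95,248.20,248.70
2011-11-01,9:01:00,248.70,249.00,248.65,248.85
2011-11-01,15:04:00,250.15,250.60,250.10,250.60
2011-11-02,9:00:00,245.55,246.25,245.40,245.80
2011-11-02,9:01:00,245.85,246.40,245.75,246.35
"""

# Assumed column names; the original file has no header row.
cols = ["date", "time", "open", "high", "low", "close"]
df = pd.read_csv(io.StringIO(raw), header=None, names=cols)

# Build a DatetimeIndex from the separate date and time strings.
df.index = pd.to_datetime(df["date"] + " " + df["time"])

# Highest close per calendar day.
daily_max = df["close"].resample("D").max()
print(daily_max)
```

Selecting the `close` column before resampling keeps the text columns out of the aggregation.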

If you also want to show the times, keep them in your DataFrame as a column and pass a function that finds the maximum close value and returns that row:

>>> df = pd.read_csv('your_file', header=None, usecols=[0, 1, 5],
...                  names=['d', 't', 'close'])
>>> df.index = pd.to_datetime(df.pop('d') + ' ' + df.pop('t'))
>>> df['time'] = df.index
>>> # how= is gone from current pandas, so group on daily bins and use apply
>>> df.groupby(pd.Grouper(freq='D')).apply(
...     lambda group: group.iloc[group['close'].argmax()])
             close                time
2011-11-01  250.60 2011-11-01 15:04:00
2011-11-02  246.35 2011-11-02 09:01:00
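
A possible alternative, not taken from the answer itself, is to let `idxmax` report the timestamp of each day's highest close, which avoids writing a custom function. This is only a sketch on a few rows from the question; the column names and the `peak` variable are assumptions made for the example:

```python
import io
import pandas as pd

# A few rows copied from the question's sample data.
raw = """2011-11-01,9:00:00,248.50,248.95,248.20,248.70
2011-11-01,9:01:00,248.70,249.00,248.65,248.85
2011-11-01,15:04:00,250.15,250.60,250.10,250.60
2011-11-02,9:00:00,245.55,246.25,245.40,245.80
2011-11-02,9:01:00,245.85,246.40,245.75,246.35
"""

# Assumed column names; the original file has no header row.
cols = ["date", "time", "open", "high", "low", "close"]
df = pd.read_csv(io.StringIO(raw), header=None, names=cols)
df.index = pd.to_datetime(df["date"] + " " + df["time"])

# Group the close prices by the calendar dates that occur in the data.
by_day = df.groupby(df.index.date)["close"]

# idxmax returns the index label (here a timestamp) of each group's maximum.
peak = pd.DataFrame({"time": by_day.idxmax(), "close": by_day.max()})
print(peak)
```

If the same maximum occurs more than once in a day, `idxmax` reports the first occurrence.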

And if you want a list of the prices per day, just group by day and return the list of all prices from each group using apply on the grouped object:

>>> df.groupby(lambda dt: dt.date()).apply(lambda group: list(group['close']))
2011-11-01    [248.7, 248.85, 249.15, 250.15, 250.6, 250.55]
2011-11-02                    [245.8, 246.35, 245.8, 245.35]
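
To get from those per-day lists to the comma-separated layout requested in the question, one option is to format each group as a line of text. The sketch below assumes that layout is the goal and that two decimal places are wanted; the column names and the `lines` variable are illustrative:

```python
import io
import pandas as pd

# A few rows copied from the question's sample data.
raw = """2011-11-01,9:00:00,248.50,248.95,248.20,248.70
2011-11-01,9:01:00,248.70,249.00,248.65,248.85
2011-11-01,15:04:00,250.15,250.60,250.10,250.60
2011-11-02,9:00:00,245.55,246.25,245.40,245.80
2011-11-02,9:01:00,245.85,246.40,245.75,246.35
"""

# Only the date, time and close columns are needed; names are assumed.
df = pd.read_csv(io.StringIO(raw), header=None, usecols=[0, 1, 5],
                 names=["date", "time", "close"])

# One line per day: the date followed by that day's closes in file order.
lines = []
for day, closes in df.groupby("date", sort=False)["close"]:
    lines.append(f"{day}, " + ", ".join(f"{c:.2f}" for c in closes))

print("\n".join(lines))
```

Grouping on the raw date string keeps the rows in file order, so the data does not need a datetime index for this step.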

For more information, take a look at the pandas docs: Time Series

Update for the concrete data set:

The problem with your data set is that some days have no data at all, so the function passed in as the resampler has to handle those cases:

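def func(group):
    # Empty daily bins (days without data) have no maximum
    if len(group) == 0:
        return None
    # argmax gives the position of the highest close within the group
    return group.iloc[group['close'].argmax()]
# The how= argument is gone from current pandas; group on daily bins instead
df.groupby(pd.Grouper(freq='D')).apply(func).dropna()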
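
To see where the empty days come from, here is a small sketch with toy values (made up for illustration, not taken from the question's file): a Friday and the following Monday. Daily resampling creates a bin for every calendar day in between, and the bins without rows come back as NaN:

```python
import pandas as pd

# Toy data, invented for this example: two trading days around a weekend.
idx = pd.to_datetime(["2011-11-04 09:00", "2011-11-04 09:01",
                      "2011-11-07 09:00", "2011-11-07 09:01"])
df = pd.DataFrame({"close": [10.0, 11.0, 9.0, 8.5]}, index=idx)

# Four daily bins are produced; Saturday and Sunday contain no rows.
daily = df["close"].resample("D").max()
print(daily)

# Dropping the NaN rows leaves only the days that had data.
print(daily.dropna())
```

The same empty bins are what a custom per-day function receives as zero-length groups, which is why it needs the length check.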