Time series with numpy or pandas

Question

Asked by user1913171

I'm new to Python and its ecosystem, and I'm having trouble working with time series data.

Below is my 1-minute OHLC data.

2011-11-01,9:00:00,248.50,248.95,248.20,248.70
2011-11-01,9:01:00,248.70,249.00,248.65,248.85
2011-11-01,9:02:00,248.90,249.25,248.70,249.15
...
2011-11-01,15:03:00,250.25,250.30,250.05,250.15
2011-11-01,15:04:00,250.15,250.60,250.10,250.60
2011-11-01,15:15:00,250.55,250.55,250.55,250.55
2011-11-02,9:00:00,245.55,246.25,245.40,245.80
2011-11-02,9:01:00,245.85,246.40,245.75,246.35
2011-11-02,9:02:00,246.30,246.45,245.75,245.80
2011-11-02,9:03:00,245.75,245.85,245.30,245.35
...

I'd like to extract the "CLOSE" value (the last field) from each row and reshape the data into one line per day, like the following:

2011-11-01, 248.70, 248.85, 249.15, ... 250.15, 250.60, 250.55
2011-11-02, 245.80, 246.35, 245.80, ...
...

I'd also like to find the highest Close value and its time (minute) for EACH DAY, like the following:
```
2011-11-01, 10:23:03, 250.55
2011-11-02, 11:02:36, 251.00
....
```

Any help would be greatly appreciated.

Thank you in advance.

Answer 1

Answered by Viktor Kerkez

You can use the pandas library. For your data, you can get the daily maximum like this:

import pandas as pd
# Read in the data and parse the first two columns as a
# date-time and set it as index
df = pd.read_csv('your_file', parse_dates=[[0,1]], index_col=0, header=None)
# get only the fifth column (close)
df = df[[5]]
# Resample to date frequency and get the max value for each day.
df.resample('D').max()
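
As a self-contained illustration of the daily maximum, here is a minimal sketch that runs on the rows visible in the question. The column names in `cols` are assumptions (the file has no header), and the timestamp index is built explicitly with `pd.to_datetime` rather than through `parse_dates`.

```python
import io
import pandas as pd

# Rows copied from the question: date, time, open, high, low, close.
raw = """2011-11-01,9:00:00,248.50,248.95,248.20,248.70
2011-11-01,9:01:00,248.70,249.00,248.65,248.85
2011-11-01,9:02:00,248.90,249.25,248.70,249.15
2011-11-01,15:03:00,250.25,250.30,250.05,250.15
2011-11-01,15:04:00,250.15,250.60,250.10,250.60
2011-11-01,15:15:00,250.55,250.55,250.55,250.55
2011-11-02,9:00:00,245.55,246.25,245.40,245.80
2011-11-02,9:01:00,245.85,246.40,245.75,246.35
2011-11-02,9:02:00,246.30,246.45,245.75,245.80
2011-11-02,9:03:00,245.75,245.85,245.30,245.35
"""

# Assumed column names; the original file has no header row.
cols = ["date", "time", "open", "high", "low", "close"]
ohlc = pd.read_csv(io.StringIO(raw), header=None, names=cols)

# Combine the date and time text columns into a timestamp index.
ohlc.index = pd.to_datetime(
    ohlc["date"] + " " + ohlc["time"], format="%Y-%m-%d %H:%M:%S"
)

# Resample the close column to daily frequency and take each day's maximum.
daily_max_close = ohlc["close"].resample("D").max()
print(daily_max_close)
```

Selecting the `close` column before resampling keeps the result a plain Series with one value per day.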

If you also want to show the times, keep them in your DataFrame as a column and pass a function that finds the maximum close value and returns that row:

>>> df = pd.read_csv('your_file', parse_dates=[[0,1]], index_col=0, header=None,
                     usecols=[0, 1, 5], names=['d', 't', 'close'])
>>> df['time'] = df.index
>>> df.groupby(pd.Grouper(freq='D')).apply(lambda group: group.iloc[group['close'].argmax()])
             close                time
d_t                             
2011-11-01  250.60 2011-11-01 15:04:00
2011-11-02  246.35 2011-11-02 09:01:00
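
A different way to get the same per-day peak is `idxmax`, which returns the index label (here, the timestamp) of the maximum. This is only a sketch on a few rows from the question; the names `ohlc`, `peak_times`, and `peaks` are illustrative, and the column names are assumed because the file has no header.

```python
import io
import pandas as pd

# A few rows copied from the question: date, time, open, high, low, close.
raw = """2011-11-01,9:00:00,248.50,248.95,248.20,248.70
2011-11-01,15:03:00,250.25,250.30,250.05,250.15
2011-11-01,15:04:00,250.15,250.60,250.10,250.60
2011-11-01,15:15:00,250.55,250.55,250.55,250.55
2011-11-02,9:00:00,245.55,246.25,245.40,245.80
2011-11-02,9:01:00,245.85,246.40,245.75,246.35
2011-11-02,9:02:00,246.30,246.45,245.75,245.80
"""

cols = ["date", "time", "open", "high", "low", "close"]  # assumed names
ohlc = pd.read_csv(io.StringIO(raw), header=None, names=cols)
ohlc.index = pd.to_datetime(
    ohlc["date"] + " " + ohlc["time"], format="%Y-%m-%d %H:%M:%S"
)

# For each calendar day, the timestamp at which close reaches its maximum.
# If several minutes tie, idxmax reports the first one.
peak_times = ohlc["close"].groupby(ohlc.index.date).idxmax()

# One row per day: time of the peak and the close value at that time.
peaks = pd.DataFrame({
    "time": peak_times.dt.time,
    "close": ohlc.loc[peak_times, "close"].to_numpy(),
})
print(peaks)
```

Grouping by `ohlc.index.date` only creates groups for days that actually appear in the data.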

And if you want a list of the prices per day, just group by day and return the list of all the prices in each group by calling `apply` on the grouped object:

>>> df.groupby(lambda dt: dt.date()).apply(lambda group: list(group['close']))
2011-11-01    [248.7, 248.85, 249.15, 250.15, 250.6, 250.55]
2011-11-02                    [245.8, 246.35, 245.8, 245.35]
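
To get the exact text layout asked for in the question (one line per day, closes separated by commas), the grouped values can be joined into strings. A minimal sketch, again on rows from the question and with assumed column names; `lines` is an illustrative variable.

```python
import io
import pandas as pd

# Rows copied from the question: date, time, open, high, low, close.
raw = """2011-11-01,9:00:00,248.50,248.95,248.20,248.70
2011-11-01,9:01:00,248.70,249.00,248.65,248.85
2011-11-01,9:02:00,248.90,249.25,248.70,249.15
2011-11-01,15:03:00,250.25,250.30,250.05,250.15
2011-11-01,15:04:00,250.15,250.60,250.10,250.60
2011-11-01,15:15:00,250.55,250.55,250.55,250.55
2011-11-02,9:00:00,245.55,246.25,245.40,245.80
2011-11-02,9:01:00,245.85,246.40,245.75,246.35
"""

cols = ["date", "time", "open", "high", "low", "close"]  # assumed names
ohlc = pd.read_csv(io.StringIO(raw), header=None, names=cols)
ohlc.index = pd.to_datetime(
    ohlc["date"] + " " + ohlc["time"], format="%Y-%m-%d %H:%M:%S"
)

# Iterate over the close prices one calendar day at a time and format
# each day as "date, close, close, ...".
lines = []
for day, closes in ohlc["close"].groupby(ohlc.index.date):
    lines.append(f"{day}, " + ", ".join(f"{c:.2f}" for c in closes))

print("\n".join(lines))
```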

For more information, take a look at the docs: Time Series

Update for the concrete data set:

The problem with your data set is that some days have no data at all, so the function applied to each day has to handle empty groups:

def func(group):
    if len(group) == 0:
        return None
    return group.iloc[group['close'].argmax()]
df.groupby(pd.Grouper(freq='D')).apply(func).dropna()
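
The effect of a missing day can be seen on a small example. The timestamps and prices below are made up purely for illustration (they are not from the question's file): daily resampling creates a bin for the empty day, while grouping on the dates that actually occur never produces an empty group.

```python
import pandas as pd

# Hypothetical data with no rows on 2011-11-02 (values invented for
# illustration only).
idx = pd.to_datetime([
    "2011-11-01 09:00:00",
    "2011-11-01 09:01:00",
    "2011-11-03 09:00:00",
    "2011-11-03 09:01:00",
])
close = pd.Series([100.0, 101.5, 99.0, 98.5], index=idx, name="close")

# Daily resampling covers every calendar day in the range, so the
# empty day shows up as NaN and has to be dropped afterwards.
daily_max = close.resample("D").max()
daily_max_clean = daily_max.dropna()

# Grouping by the dates present in the index skips the empty day entirely.
peak_times = close.groupby(close.index.date).idxmax()

print(daily_max)
print(peak_times)
```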
