pandas: computing the EWMA of a DataFrame by time

Question

Asked by Drew

I have this dataframe:

    avg                date    high  low      qty
0 16.92 2013-05-27 00:00:00   19.00 1.22 71151.00
1 14.84 2013-05-30 00:00:00   19.00 1.22 42939.00
2  9.19 2013-06-02 00:00:00   17.20 1.23  5607.00
3 23.63 2013-06-05 00:00:00 5000.00 1.22  5850.00
4 13.82 2013-06-10 00:00:00   19.36 1.22  5644.00
5 17.76 2013-06-15 00:00:00   24.00 2.02 16969.00

Each row is an observation of avg, high, low, and qty that was created on the specified date.

I'm trying to compute an exponential moving weighted average with a span of 60 days:

df["emwa"] = pandas.ewma(df["avg"],span=60,freq="D")

But I get

TypeError: Only valid with DatetimeIndex or PeriodIndex

Okay, so maybe I need to add a DatetimeIndex to my DataFrame when it's constructed. Let me change my constructor call from

df = pandas.DataFrame(records) #records is just a list of dictionaries

to

rng = pandas.date_range(firstDate,lastDate, freq='D')
df = pandas.DataFrame(records,index=rng)

But now I get

ValueError: Shape of passed values is (5,), indices imply (5, 1641601)

Any suggestions for how to compute my EWMA?

Answer 1

Answered by Andy Hayden

You need two things: make sure the date column holds dates (rather than strings), and set the index to these dates.
You can do both in one go using to_datetime:

In [11]: df.index = pd.to_datetime(df.pop('date'))

In [12]: df
Out[12]:
              avg     high   low    qty
date
2013-05-27  16.92    19.00  1.22  71151
2013-05-30  14.84    19.00  1.22  42939
2013-06-02   9.19    17.20  1.23   5607
2013-06-05  23.63  5000.00  1.22   5850
2013-06-10  13.82    19.36  1.22   5644
2013-06-15  17.76    24.00  2.02  16969

Then you can call ewma as expected:

In [13]: pd.ewma(df["avg"], span=60, freq="D")
Out[13]:
date
2013-05-27    16.920000
2013-05-28    16.920000
2013-05-29    16.920000
2013-05-30    15.862667
2013-05-31    15.862667
2013-06-01    15.862667
2013-06-02    13.563899
2013-06-03    13.563899
2013-06-04    13.563899
2013-06-05    16.207625
2013-06-06    16.207625
2013-06-07    16.207625
2013-06-08    16.207625
2013-06-09    16.207625
2013-06-10    15.697743
2013-06-11    15.697743
2013-06-12    15.697743
2013-06-13    15.697743
2013-06-14    15.697743
2013-06-15    16.070721
Freq: D, dtype: float64

and if you set this as a column:

In [14]: df['ewma'] = pd.ewma(df["avg"], span=60, freq="D")

In [15]: df
Out[15]:
              avg     high   low    qty       ewma
date
2013-05-27  16.92    19.00  1.22  71151  16.920000
2013-05-30  14.84    19.00  1.22  42939  15.862667
2013-06-02   9.19    17.20  1.23   5607  13.563899
2013-06-05  23.63  5000.00  1.22   5850  16.207625
2013-06-10  13.82    19.36  1.22   5644  15.697743
2013-06-15  17.76    24.00  2.02  16969  16.070721
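
Note that pd.ewma no longer exists in current pandas, so the calls above only run on old versions. Here is a minimal sketch of what seems to be the equivalent on a recent pandas. Two things are my assumptions rather than part of the original answer: asfreq("D") stands in for freq="D", and ignore_na=True is chosen because it reproduces the numbers shown above (the empty days between observations add no extra decay). The records list is rebuilt by hand from the question's table.

```python
import pandas as pd

# Rebuilt from the question's table; the dates are strings on purpose.
records = [
    {"avg": 16.92, "date": "2013-05-27", "high": 19.00, "low": 1.22, "qty": 71151.0},
    {"avg": 14.84, "date": "2013-05-30", "high": 19.00, "low": 1.22, "qty": 42939.0},
    {"avg": 9.19, "date": "2013-06-02", "high": 17.20, "low": 1.23, "qty": 5607.0},
    {"avg": 23.63, "date": "2013-06-05", "high": 5000.00, "low": 1.22, "qty": 5850.0},
    {"avg": 13.82, "date": "2013-06-10", "high": 19.36, "low": 1.22, "qty": 5644.0},
    {"avg": 17.76, "date": "2013-06-15", "high": 24.00, "low": 2.02, "qty": 16969.0},
]

df = pd.DataFrame(records)

# Parse the strings into timestamps and use them as the index.
df.index = pd.to_datetime(df.pop("date"))

# Conform to a daily grid; days without an observation become NaN.
daily = df["avg"].asfreq("D")

# ignore_na=True (assumption) weights observations by their relative
# position, skipping the NaN days instead of letting them add decay.
ewma_daily = daily.ewm(span=60, ignore_na=True).mean()

# Assignment aligns on the index, so only the six observed dates are kept.
df["ewma"] = ewma_daily
print(df)
```

If you would rather have the gaps count (so an observation ten days back is discounted more than one three days back), drop ignore_na=True; the values will then differ from the ones in this answer.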

Answer 2

Answered by chjortlund

In pandas > 0.17, ewma has been deprecated. The same functionality can be obtained by combining ewm() and mean().

For example:

# Calculating a few means (averages) with exponential components (com = center of mass) 
# on the closing price of the Deutsche Bank stock.

import requests
import zipfile
import io # Python 2, use StringIO
import pandas as pd
import matplotlib

# Set the number of columns to be displayed when printing DataFrames
# (the full option name avoids an ambiguous-key error on newer pandas)
pd.set_option('display.max_columns', 7)

# Download file from ipfs
ipfs_file_url = "https://ipfs.io/ipfs/QmW7aSLjePW7S8uE5zbAneGAPdrzdA3MpFkTiFPrRsKS8t"
response = requests.get(ipfs_file_url, stream=True)

# The file is a zip archive, so let's open it and parse the CSV inside
zf = zipfile.ZipFile(io.BytesIO(response.content)) # Python 2, use StringIO.StringIO
df = pd.read_csv(zf.open('DB_20170627_to_20180627.csv'))

# Oookay, let's begin!
print(df)

# New DataFrame to keep it clean
output = pd.DataFrame()
output['Date'] = df['Date']
output['ewma_com10'] = df['Close'].ewm(com=10).mean()
output['ewma_com50'] = df['Close'].ewm(com=50).mean()
output['ewma_com100'] = df['Close'].ewm(com=100).mean()
print(output)

output.index = pd.to_datetime(output['Date'], format='%Y-%m-%d')
output.plot()

Jupyter Notebook can be found here: pandas_exponential_average.ipynb
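
One more note, since the question title says "by time": ewm(span=...) and ewm(com=...) weight rows by their position, not by how far apart their timestamps are. Newer pandas releases (1.1 and later, as far as I know) accept a times argument together with a timedelta halflife, which decays the weights by elapsed time instead. Below is a sketch under those assumptions, using the avg values from the question. The 30-day half-life is only an illustrative choice and is not a translation of span=60.

```python
import pandas as pd

# avg values and dates copied from the question's table.
avg = pd.Series(
    [16.92, 14.84, 9.19, 23.63, 13.82, 17.76],
    index=pd.to_datetime(
        ["2013-05-27", "2013-05-30", "2013-06-02",
         "2013-06-05", "2013-06-10", "2013-06-15"]
    ),
    name="avg",
)

# With `times`, halflife is a time span: an observation 30 days older
# than the current one gets half the weight (30 days is an assumption).
time_weighted = avg.ewm(halflife="30 days", times=avg.index).mean()
print(time_weighted)
```

No resampling is needed here, because the irregular spacing is handled by the weights themselves.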
