pandas: computing an EWMA of a DataFrame by time

Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/17181143/

Time: 2020-09-13 20:55:46  Source: igfitidea

computing an EWMA of a DataFrame by time

Tags: python, pandas

Asked by Drew

I have this dataframe:

    avg                date    high  low      qty
0 16.92 2013-05-27 00:00:00   19.00 1.22 71151.00
1 14.84 2013-05-30 00:00:00   19.00 1.22 42939.00
2  9.19 2013-06-02 00:00:00   17.20 1.23  5607.00
3 23.63 2013-06-05 00:00:00 5000.00 1.22  5850.00
4 13.82 2013-06-10 00:00:00   19.36 1.22  5644.00
5 17.76 2013-06-15 00:00:00   24.00 2.02 16969.00

Each row is an observation of avg, high, low, and qty that was created on the specified date.

I'm trying to compute an exponential moving weighted average with a span of 60 days:

df["emwa"] = pandas.ewma(df["avg"],span=60,freq="D")

But I get

TypeError: Only valid with DatetimeIndex or PeriodIndex

Okay, so maybe I need to give my DataFrame a DatetimeIndex when it's constructed. Let me change my constructor call from

df = pandas.DataFrame(records) #records is just a list of dictionaries

to

rng = pandas.date_range(firstDate,lastDate, freq='D')
df = pandas.DataFrame(records,index=rng)

But now I get

ValueError: Shape of passed values is (5,), indices imply (5, 1641601)

Any suggestions for how to compute my EWMA?

Answered by Andy Hayden

You need two things: ensure the date column holds dates (rather than strings), and set the index to these dates.
You can do both in one go using to_datetime:

In [11]: df.index = pd.to_datetime(df.pop('date'))

In [12]: df
Out[12]:
              avg     high   low    qty
date
2013-05-27  16.92    19.00  1.22  71151
2013-05-30  14.84    19.00  1.22  42939
2013-06-02   9.19    17.20  1.23   5607
2013-06-05  23.63  5000.00  1.22   5850
2013-06-10  13.82    19.36  1.22   5644
2013-06-15  17.76    24.00  2.02  16969
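
The same result can also be reached while building the frame from the list of dictionaries mentioned in the question. The following is only a sketch: the `records` list is a hypothetical stand-in that copies the first three rows shown in the question, and `set_index` is used in place of the `pop` idiom above.

```python
import pandas as pd

# Hypothetical records, copied from the first rows of the question's table
records = [
    {"avg": 16.92, "date": "2013-05-27 00:00:00", "high": 19.00, "low": 1.22, "qty": 71151.0},
    {"avg": 14.84, "date": "2013-05-30 00:00:00", "high": 19.00, "low": 1.22, "qty": 42939.0},
    {"avg": 9.19, "date": "2013-06-02 00:00:00", "high": 17.20, "low": 1.23, "qty": 5607.0},
]

df = pd.DataFrame(records)

# Convert the string column to real timestamps, then use it as the index
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")

print(type(df.index).__name__)
print(df)
```

Passing `index=rng` to the constructor, as attempted in the question, asks pandas to label the rows with every day in the range, which does not match the number of records; setting the index from the data avoids that mismatch.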

Then you can call ewma as expected:

In [13]: pd.ewma(df["avg"], span=60, freq="D")
Out[13]:
date
2013-05-27    16.920000
2013-05-28    16.920000
2013-05-29    16.920000
2013-05-30    15.862667
2013-05-31    15.862667
2013-06-01    15.862667
2013-06-02    13.563899
2013-06-03    13.563899
2013-06-04    13.563899
2013-06-05    16.207625
2013-06-06    16.207625
2013-06-07    16.207625
2013-06-08    16.207625
2013-06-09    16.207625
2013-06-10    15.697743
2013-06-11    15.697743
2013-06-12    15.697743
2013-06-13    15.697743
2013-06-14    15.697743
2013-06-15    16.070721
Freq: D, dtype: float64
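
To see where these numbers come from, here is a small arithmetic check. It assumes the adjusted weighting with alpha = 2 / (span + 1), and it assumes that the days without an observation do not add extra decay, which is what the repeated values in the output above suggest. Under those assumptions the second and third observed values can be reproduced by hand:

```python
span = 60
alpha = 2 / (span + 1)   # smoothing factor implied by the span
w = 1 - alpha            # weight decay applied per observation

# Weighted averages of the first two and first three "avg" observations,
# with the most recent observation getting weight 1
second = (14.84 + w * 16.92) / (1 + w)
third = (9.19 + w * 14.84 + w**2 * 16.92) / (1 + w + w**2)

print(round(second, 6), round(third, 6))
```

These should agree with the values shown for 2013-05-30 and 2013-06-02. The days in between simply repeat the most recent result.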

and if you set this as a column:

In [14]: df['ewma'] = pd.ewma(df["avg"], span=60, freq="D")

In [15]: df
Out[15]:
              avg     high   low    qty       ewma
date
2013-05-27  16.92    19.00  1.22  71151  16.920000
2013-05-30  14.84    19.00  1.22  42939  15.862667
2013-06-02   9.19    17.20  1.23   5607  13.563899
2013-06-05  23.63  5000.00  1.22   5850  16.207625
2013-06-10  13.82    19.36  1.22   5644  15.697743
2013-06-15  17.76    24.00  2.02  16969  16.070721

Answered by chjortlund

In pandas > 0.17, ewma has been deprecated. The same functionality can be obtained by combining ewm() and mean().

For example:

# Calculating a few means (averages) with exponential components (com = center of mass) 
# on the closing price of the Deutsche Bank stock.

import requests
import zipfile
import io # Python 2, use StringIO
import pandas as pd
import matplotlib

# Set the number of columns to be displayed when printing DataFrames
# (the full option name avoids ambiguous matches in newer pandas versions)
pd.set_option('display.max_columns', 7)

# Download file from ipfs
ipfs_file_url = "https://ipfs.io/ipfs/QmW7aSLjePW7S8uE5zbAneGAPdrzdA3MpFkTiFPrRsKS8t"
response = requests.get(ipfs_file_url, stream=True)

# The file is a zip file, so let's read it and parse the CSV inside
zf = zipfile.ZipFile(io.BytesIO(response.content)) # Python 2, use StringIO.StringIO
df = pd.read_csv(zf.open('DB_20170627_to_20180627.csv'))

# Oookay, let's begin!
print(df)

# New DataFrame to keep it clean
output = pd.DataFrame()
output['Date'] = df['Date']
output['ewma_com10'] = df['Close'].ewm(com=10).mean()
output['ewma_com50'] = df['Close'].ewm(com=50).mean()
output['ewma_com100'] = df['Close'].ewm(com=100).mean()
print(output)

output.index = pd.to_datetime(output['Date'], format='%Y-%m-%d')
output.plot()
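
The code above parameterises the smoothing with `com` (center of mass), while the first answer used `span`. As a rough guide, pandas documents alpha = 1 / (1 + com) and alpha = 2 / (span + 1), so span = 2 * com + 1 should give the same smoothing. The sketch below checks this on the `avg` values from the question; the variable names are illustrative only.

```python
import pandas as pd

# "avg" values taken from the question's DataFrame
s = pd.Series([16.92, 14.84, 9.19, 23.63, 13.82, 17.76])

com = 10
span = 2 * com + 1        # span equivalent to this center of mass
alpha = 1 / (1 + com)     # smoothing factor equivalent to both

by_com = s.ewm(com=com).mean()
by_span = s.ewm(span=span).mean()
by_alpha = s.ewm(alpha=alpha).mean()

# The three columns should be identical up to floating-point noise
print(pd.concat({"com": by_com, "span": by_span, "alpha": by_alpha}, axis=1))
```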

Jupyter Notebook can be found here: pandas_exponential_average.ipynb
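
Applied to the data from the question, the ewm()/mean() combination might look like the sketch below. It rebuilds only the `avg` column by hand, and it assumes that `asfreq("D")` plus `ignore_na=True` is an acceptable stand-in for the old `freq="D"` argument; that pairing is an assumption made here, not something stated in the answers above.

```python
import pandas as pd

# "avg" column and dates copied from the question
df = pd.DataFrame(
    {"avg": [16.92, 14.84, 9.19, 23.63, 13.82, 17.76]},
    index=pd.to_datetime(
        ["2013-05-27", "2013-05-30", "2013-06-02",
         "2013-06-05", "2013-06-10", "2013-06-15"]
    ),
)
df.index.name = "date"

# Replacement for pd.ewma(df["avg"], span=60) on the observed rows
df["ewma"] = df["avg"].ewm(span=60).mean()

# Daily version: upsample to one row per day (missing days become NaN),
# then let ewm skip the NaN gaps when computing the weights
daily = df["avg"].asfreq("D").ewm(span=60, ignore_na=True).mean()

print(df)
print(daily)
```

With `ignore_na=True` the missing days carry the last value forward without adding decay. Leaving `ignore_na` at its default of `False` makes the weights depend on the position in the daily series, so longer gaps discount older observations more.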
