pandas: computing an EWMA of a DataFrame by time
Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/17181143/
computing an EWMA of a DataFrame by time
Asked by Drew
I have this dataframe:
     avg                 date     high   low       qty
0  16.92  2013-05-27 00:00:00    19.00  1.22  71151.00
1  14.84  2013-05-30 00:00:00    19.00  1.22  42939.00
2   9.19  2013-06-02 00:00:00    17.20  1.23   5607.00
3  23.63  2013-06-05 00:00:00  5000.00  1.22   5850.00
4  13.82  2013-06-10 00:00:00    19.36  1.22   5644.00
5  17.76  2013-06-15 00:00:00    24.00  2.02  16969.00
Each row is an observation of avg, high, low, and qty that was created on the specified date.
I'm trying to compute an exponentially weighted moving average (EWMA) with a span of 60 days:
df["emwa"] = pandas.ewma(df["avg"],span=60,freq="D")
But I get
TypeError: Only valid with DatetimeIndex or PeriodIndex
Okay, so maybe I need to give my DataFrame a DatetimeIndex when it's constructed. Let me change my constructor call from
df = pandas.DataFrame(records) #records is just a list of dictionaries
to
rng = pandas.date_range(firstDate,lastDate, freq='D')
df = pandas.DataFrame(records,index=rng)
But now I get
ValueError: Shape of passed values is (5,), indices imply (5, 1641601)
Any suggestions for how to compute my EWMA?
Answered by Andy Hayden
You need two things: ensure the date column holds dates (rather than strings), and set the index to these dates.
You can do both in one go using to_datetime:
In [11]: df.index = pd.to_datetime(df.pop('date'))
In [12]: df
Out[12]:
              avg     high   low    qty
date
2013-05-27  16.92    19.00  1.22  71151
2013-05-30  14.84    19.00  1.22  42939
2013-06-02   9.19    17.20  1.23   5607
2013-06-05  23.63  5000.00  1.22   5850
2013-06-10  13.82    19.36  1.22   5644
2013-06-15  17.76    24.00  2.02  16969
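For readers starting from a list of dictionaries, as in the question, the same setup can be sketched end to end. The `records` list below is a hypothetical stand-in built from the first rows of the question's data. The point it illustrates is that the index needs exactly one label per row, taken from the parsed date column, which is presumably why passing a separately generated `date_range` as the index did not fit.

```python
import pandas as pd

# Hypothetical records, shaped like the list of dictionaries in the question
records = [
    {"avg": 16.92, "date": "2013-05-27 00:00:00", "high": 19.00, "low": 1.22, "qty": 71151.0},
    {"avg": 14.84, "date": "2013-05-30 00:00:00", "high": 19.00, "low": 1.22, "qty": 42939.0},
    {"avg": 9.19, "date": "2013-06-02 00:00:00", "high": 17.20, "low": 1.23, "qty": 5607.0},
]

df = pd.DataFrame(records)

# Parse the date strings into timestamps, then use them as the index
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")

print(type(df.index).__name__)
```

This keeps one index label per observation, so irregular gaps between dates are preserved rather than padded.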
Then you can call ewma as expected:
In [13]: pd.ewma(df["avg"], span=60, freq="D")
Out[13]:
date
2013-05-27 16.920000
2013-05-28 16.920000
2013-05-29 16.920000
2013-05-30 15.862667
2013-05-31 15.862667
2013-06-01 15.862667
2013-06-02 13.563899
2013-06-03 13.563899
2013-06-04 13.563899
2013-06-05 16.207625
2013-06-06 16.207625
2013-06-07 16.207625
2013-06-08 16.207625
2013-06-09 16.207625
2013-06-10 15.697743
2013-06-11 15.697743
2013-06-12 15.697743
2013-06-13 15.697743
2013-06-14 15.697743
2013-06-15 16.070721
Freq: D, dtype: float64
And if you set this as a column, the daily result is aligned back to the original dates:
In [14]: df['ewma'] = pd.ewma(df["avg"], span=60, freq="D")
In [15]: df
Out[15]:
              avg     high   low    qty       ewma
date
2013-05-27  16.92    19.00  1.22  71151  16.920000
2013-05-30  14.84    19.00  1.22  42939  15.862667
2013-06-02   9.19    17.20  1.23   5607  13.563899
2013-06-05  23.63  5000.00  1.22   5850  16.207625
2013-06-10  13.82    19.36  1.22   5644  15.697743
2013-06-15  17.76    24.00  2.02  16969  16.070721
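On recent pandas releases, where the top-level `pd.ewma` function is no longer available, the steps above can be approximated with `resample()` followed by `ewm().mean()`. This is only a sketch: the use of `ignore_na=True` is an assumption, chosen because it appears to match the figures printed above, and the data is re-typed from the question.

```python
import pandas as pd

# The "avg" column from the question, indexed by its observation dates
dates = pd.to_datetime(
    ["2013-05-27", "2013-05-30", "2013-06-02", "2013-06-05", "2013-06-10", "2013-06-15"]
)
avg = pd.Series([16.92, 14.84, 9.19, 23.63, 13.82, 17.76], index=dates, name="avg")

# One smoothed value per original observation
ewma_rows = avg.ewm(span=60).mean()

# Daily series: resampling inserts NaN for days without data; with
# ignore_na=True (an assumption) those gaps do not add extra decay,
# and the last smoothed value is carried across them
ewma_daily = avg.resample("D").mean().ewm(span=60, ignore_na=True).mean()

print(ewma_rows)
print(ewma_daily)
```

Note that under this setting the gaps between observations do not affect the weights, so the result is weighted by observation order rather than by elapsed time.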
Answered by chjortlund
In pandas versions after 0.17, ewma has been deprecated. The same functionality can be obtained by combining ewm() and mean().
For example:
# Calculating a few means (averages) with exponential components (com = center of mass)
# on the closing price of the Deutsche Bank stock.
import requests
import zipfile
import io # Python 2, use StringIO
import pandas as pd
import matplotlib
# Set the number of columns to be displayed when printing DataFrames
pd.set_option('display.max_columns', 7)
# Download file from ipfs
ipfs_file_url = "https://ipfs.io/ipfs/QmW7aSLjePW7S8uE5zbAneGAPdrzdA3MpFkTiFPrRsKS8t"
response = requests.get(ipfs_file_url, stream=True)
# The file is a zipfile, so let's read it and parse the csv inside
zf = zipfile.ZipFile(io.BytesIO(response.content)) # Python 2, use StringIO.StringIO
df = pd.read_csv(zf.open('DB_20170627_to_20180627.csv'))
# Oookay, let's begin!
print(df)
# New DataFrame to keep it clean
output = pd.DataFrame()
output['Date'] = df['Date']
output['ewma_com10'] = df['Close'].ewm(com=10).mean()
output['ewma_com50'] = df['Close'].ewm(com=50).mean()
output['ewma_com100'] = df['Close'].ewm(com=100).mean()
print(output)
output.index = pd.to_datetime(output['Date'], format='%Y-%m-%d')
output.plot()
Jupyter Notebook can be found here: pandas_exponential_average.ipynb
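Since the example above depends on a download, here is a small offline sketch of the same ewm() API. The prices and dates are hypothetical stand-ins for the CSV. It shows how `com` relates to the `span` used in the question (alpha = 1 / (1 + com) = 2 / (span + 1)), and, assuming pandas 1.1 or newer, how the `times` argument can make the decay depend on elapsed time, which may be closer to what the question was after.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices on irregularly spaced dates
dates = pd.to_datetime(
    ["2018-01-01", "2018-01-02", "2018-01-05", "2018-01-09", "2018-01-10", "2018-01-20"]
)
close = pd.Series([10.0, 10.5, 10.2, 11.0, 10.8, 11.4], index=dates)

# com=10 and span=21 give the same smoothing factor: 1/(1+10) == 2/(21+1)
by_com = close.ewm(com=10).mean()
by_span = close.ewm(span=21).mean()
print(np.allclose(by_com, by_span))

# Time-aware variant: an observation's weight halves for every
# 10 days between it and the current row
by_time = close.ewm(halflife="10 days", times=close.index).mean()
print(by_time)
```

With `times`, the halflife is given as a duration rather than a number of rows, so unevenly spaced observations are discounted according to their actual age.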


