Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/40954560/

Date: 2020-08-20 00:15:37  Source: igfitidea

Pandas Rolling Apply custom

Tags: python, pandas, apply

Asked by Bobe Kryant

I have been following a similar answer here, but I have some questions about using sklearn with rolling apply. I am trying to create z-scores and run PCA with rolling apply, but I keep getting the error 'only length-1 arrays can be converted to Python scalars'.

Following the previous example, I create a dataframe:

from sklearn.preprocessing import StandardScaler
import pandas as pd
import numpy as np
sc = StandardScaler()
# 2000 daily observations of two small random series
tmp = pd.DataFrame(np.random.randn(2000, 2) / 10000,
                   index=pd.date_range('2001-01-01', periods=2000),
                   columns=['A', 'B'])

If I use the rolling command:

tmp.rolling(window=5, center=False).apply(lambda x: sc.fit_transform(x))
TypeError: only length-1 arrays can be converted to Python scalars

I get the error above. However, functions that return a mean or a standard deviation work without any problem.

def test(df):
    return np.mean(df)
tmp.rolling(window=5,center=False).apply(lambda x: test(x))

I believe the error occurs when I try to subtract the mean from the current values to compute the z-score.

def test2(df):
    return df-np.mean(df)
tmp.rolling(window=5,center=False).apply(lambda x: test2(x))
TypeError: only length-1 arrays can be converted to Python scalars

How can I create custom rolling functions with sklearn to first standardize and then run PCA?

EDIT: I realize my question was not exactly clear so I shall try again. I want to standardize my values and then run PCA to get the amount of variance explained by each factor. Doing this without rolling is fairly straightforward.

from sklearn import decomposition
testing = sc.fit_transform(tmp)  # standardize each column
pca = decomposition.PCA()  # run pca
pca.fit(testing)
pca.explained_variance_ratio_
array([ 0.50967441,  0.49032559])

I cannot use the same procedure when rolling. The rolling z-score function from @piRSquared gives the z-scores, but PCA from sklearn seems to be incompatible with a custom rolling apply function. (In fact, I think this is the case with most sklearn modules.) I am just trying to get the explained variance, which is one-dimensional, but the code below returns a bunch of NaNs.

def test3(df):
    pca.fit(df)
    return pca.explained_variance_ratio_
tmp.rolling(window=5,center=False).apply(lambda x: test3(x))

I can also write my own explained variance function, but this does not work either.

def test4(df):
    cov_mat=np.cov(df.T) #need covariance of features, not observations
    eigen_vals,eigen_vecs=np.linalg.eig(cov_mat)
    tot=sum(eigen_vals)
    var_exp=[(i/tot) for i in sorted(eigen_vals,reverse=True)]
    return var_exp
tmp.rolling(window=5,center=False).apply(lambda x: test4(x))

I get this error: 0-dimensional array given. Array must be at least two-dimensional.

To recap, I would like to run rolling z-scores and then a rolling PCA that outputs the explained variance at each roll. I have the rolling z-scores working, but not the explained variance.

Answered by piRSquared

As @BrenBarn commented, the rolling function needs to reduce a vector to a single number. The following is equivalent to what you were trying to do and helps highlight the problem.

zscore = lambda x: (x - x.mean()) / x.std()
tmp.rolling(5).apply(zscore)
TypeError: only length-1 arrays can be converted to Python scalars

In the zscore function, x.mean() reduces and x.std() reduces, but x is an array. Thus the entire expression is an array.
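
The difference can be seen with two small functions. This is only a sketch, assuming a reasonably recent pandas: the names `window_range` and `demean` are hypothetical helpers made up for illustration, and the exact wording of the exception may vary between pandas and NumPy versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.random.randn(20, 2),
    index=pd.date_range("2001-01-01", periods=20),
    columns=["A", "B"],
)

def window_range(x):
    # Hypothetical helper: reduces the window to one number,
    # so rolling apply accepts it.
    return x.max() - x.min()

def demean(x):
    # Hypothetical helper: returns an array as long as the window,
    # so rolling apply cannot store it in a single cell.
    return x - x.mean()

# raw=True passes each window as a 1-D NumPy array, one column at a time.
reduced = df.rolling(5).apply(window_range, raw=True)

try:
    df.rolling(5).apply(demean, raw=True)
    failed = False
except (TypeError, ValueError) as exc:
    failed = True
    print(type(exc).__name__, exc)
```

Note that the function only ever sees one column per call, which is also why a function that needs both columns at once (such as PCA) cannot be passed to rolling apply directly.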

The way around this is to perform the roll on the parts of the z-score calculation that require it, and not on the parts that cause the problem.

(tmp - tmp.rolling(5).mean()) / tmp.rolling(5).std()
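
For the explained variance part of the question, one possible workaround is to slice the windows by hand, since rolling apply hands the function a single column at a time. The sketch below is not part of the original answer. It builds on the `test4` idea from the question using plain NumPy, the helper names `explained_variance_ratio` and `rolling_explained_variance` are hypothetical, and each window is standardized on its own, which is an assumption about what "rolling z-scores, then rolling PCA" should mean. The eigenvalue shares of the covariance matrix should correspond to what sklearn's PCA reports as `explained_variance_ratio_`.

```python
import numpy as np
import pandas as pd

def explained_variance_ratio(block):
    """Hypothetical helper: variance share per principal component."""
    # Standardize each column within the window (z-scores).
    scaled = (block - block.mean(axis=0)) / block.std(axis=0)
    # Covariance of features: observations are in rows, hence rowvar=False.
    cov_mat = np.cov(scaled, rowvar=False)
    # eigvalsh is meant for symmetric matrices and returns real
    # eigenvalues in ascending order, so reverse them.
    eigen_vals = np.linalg.eigvalsh(cov_mat)[::-1]
    return eigen_vals / eigen_vals.sum()

def rolling_explained_variance(df, window):
    """Hypothetical helper: apply the function above to every full window."""
    values = df.to_numpy()
    out = np.full(values.shape, np.nan)  # rows without a full window stay NaN
    for end in range(window, len(values) + 1):
        out[end - 1] = explained_variance_ratio(values[end - window:end])
    cols = [f"PC{i + 1}" for i in range(values.shape[1])]
    return pd.DataFrame(out, index=df.index, columns=cols)

tmp = pd.DataFrame(
    np.random.randn(200, 2) / 10000,
    index=pd.date_range("2001-01-01", periods=200),
    columns=["A", "B"],
)
explained = rolling_explained_variance(tmp, window=5)
print(explained.tail())
```

An explicit loop is slower than the built-in rolling methods, but it keeps each window two-dimensional, which is what the covariance step needs.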
