Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/38894488/

Time: 2020-09-14 01:47:53  Source: igfitidea

Dropping 'nan' with Pearson's r in scipy/pandas

Tags: pandas, scipy, nan, pearson

Asked by Lodore66

Quick question: Is there a way to use 'dropna' with the Pearson's r function in scipy? I'm using it in conjunction with pandas, and some of my data has holes in it. I know you used to be able to suppress 'nan' with Spearman's r in older versions of scipy, but that functionality is now missing.

To my mind, this seems like a disimprovement, so I wonder if I'm missing something obvious.

My code:

for i in range(len(frame3.columns)):    
    correlation.append(sp.pearsonr(frame3.iloc[ :,i], control['CONTROL']))

Answer by Ami Tavory

You can use np.isnan like this:

for i in range(len(frame3.columns)):
    x, y = frame3.iloc[:, i].values, control['CONTROL'].values
    # np.isnan is a function; NumPy arrays have no .isnan() method
    nas = np.logical_or(np.isnan(x), np.isnan(y))
    corr = sp.pearsonr(x[~nas], y[~nas])
    correlation.append(corr)
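
For anyone who wants to try the masking idea on its own, here is a minimal self-contained sketch. The helper name pearsonr_dropna and the small frame3/control tables are made up for illustration; they are not from the question. It assumes both inputs have the same length and are row-aligned.

```python
import numpy as np
import pandas as pd
from scipy import stats


def pearsonr_dropna(x, y):
    """Pearson's r and p-value using only positions where both inputs are non-NaN.

    Hypothetical helper for illustration; not part of scipy or pandas.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~(np.isnan(x) | np.isnan(y))  # True where both values are present
    return stats.pearsonr(x[keep], y[keep])


# Made-up data with holes, standing in for the asker's frames
frame3 = pd.DataFrame({"a": [1.0, 2.0, np.nan, 4.0, 5.0],
                       "b": [2.0, np.nan, 6.0, 8.0, 11.0]})
control = pd.DataFrame({"CONTROL": [1.0, 2.0, 3.0, np.nan, 5.0]})

# One (r, p-value) result per column of frame3
correlation = [pearsonr_dropna(frame3[col], control["CONTROL"])
               for col in frame3.columns]
```

Each entry of correlation can be indexed as result[0] for r and result[1] for the p-value. Note that the p-value is computed from the rows that survive the mask, so columns with many holes are tested on fewer observations.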

Answer by Daniel Gibson

You can also create a temporary DataFrame and use the pandas built-in method for computing the Pearson correlation (DataFrame.corr excludes NaN values pairwise), or use the .dropna method on the temporary DataFrame to drop null values before calling sp.pearsonr.

for col in frame3.columns:
    correlation.append(frame3[col].to_frame(name='3').join(control['CONTROL']).corr()['3']['CONTROL'])
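
The snippet above shows only the pandas-only route. A rough sketch of the other route (drop incomplete rows in a temporary frame, then call pearsonr) might look like the following. The frame3/control data are made up, and the corrwith line is an alternative I am adding for comparison, not something from the answer. It assumes the two frames share the same index.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Made-up data with holes, standing in for the asker's frames
frame3 = pd.DataFrame({"a": [1.0, 2.0, np.nan, 4.0, 5.0],
                       "b": [2.0, np.nan, 6.0, 8.0, 11.0]})
control = pd.DataFrame({"CONTROL": [1.0, 2.0, 3.0, np.nan, 5.0]})

# Route 1: temporary two-column frame, dropna, then scipy's pearsonr
results = {}
for col in frame3.columns:
    # concat aligns the two Series on their index; dropna removes any row with a hole
    pair = pd.concat([frame3[col], control["CONTROL"]], axis=1).dropna()
    r, p = stats.pearsonr(pair[col], pair["CONTROL"])
    results[col] = (r, p)

# Route 2: pandas only; gives r per column but no p-value
r_only = frame3.corrwith(control["CONTROL"])
```

Route 1 keeps the p-value that pearsonr returns; Route 2 is shorter when only r is needed. On this sample data both routes should agree on r for each column.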