ANOVA in Python using a pandas DataFrame with statsmodels or scipy?

Question

Asked by wolfsatthedoor

I want to use a pandas DataFrame to break down the variance in one variable.

For example, suppose I have a column called 'Degrees' indexed by date, city, and night vs. day. I want to find out what fraction of the variation in this series comes from cross-sectional (city) variation, how much comes from time-series variation, and how much comes from night vs. day.
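For a single factor, the breakdown is just the between-group sum of squares divided by the total sum of squares (eta squared, which equals the R^2 of a regression on that factor's dummies). A minimal sketch, assuming hypothetical columns `City` and `Degrees` filled with synthetic values, computing this with a pandas `groupby` and cross-checking the F statistic against `scipy.stats.f_oneway`:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: 30 synthetic "Degrees" readings for each of three cities
rng = np.random.default_rng(0)
df = pd.DataFrame({"City": np.repeat(["A", "B", "C"], 30)})
df["Degrees"] = df["City"].map({"A": 10.0, "B": 15.0, "C": 20.0}) + rng.normal(0, 3, len(df))

# Total sum of squares around the grand mean
grand_mean = df["Degrees"].mean()
ss_total = ((df["Degrees"] - grand_mean) ** 2).sum()

# Between-city SS: group size times squared distance of each city mean from the grand mean
grp = df.groupby("City")["Degrees"].agg(["mean", "size"])
ss_between = (grp["size"] * (grp["mean"] - grand_mean) ** 2).sum()
ss_within = ss_total - ss_between

# Share of the variation attributable to City (eta squared)
share_city = ss_between / ss_total

# Cross-check: the one-way F statistic built from these pieces should equal scipy's
k, n = len(grp), len(df)
f_manual = (ss_between / (k - 1)) / (ss_within / (n - k))
f_scipy, p_value = stats.f_oneway(*[g.to_numpy() for _, g in df.groupby("City")["Degrees"]])

print(f"share of variance from City: {share_city:.3f}")
print(f"F manual = {f_manual:.4f}, F scipy = {f_scipy:.4f}, p = {p_value:.3g}")
```

Note that `f_oneway` only returns the F statistic and p-value; the sum-of-squares shares have to be computed separately, as above.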

In Stata I would use fixed effects and look at the R^2. Hopefully my question makes sense.
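The Stata approach can be mirrored in statsmodels by entering each set of fixed effects as dummy variables with `C()` and reading `rsquared` off the fit. A minimal sketch, assuming a hypothetical balanced layout (columns `Date`, `City`, `DayNight`, `Degrees`, all synthetic) in which every date/city/period combination appears exactly once:

```python
import itertools
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical balanced data: every (Date, City, DayNight) combination observed once
rng = np.random.default_rng(1)
dates = [f"2024-01-{d:02d}" for d in range(1, 21)]
cities = ["Boston", "Denver", "Miami", "Seattle"]
df = pd.DataFrame(list(itertools.product(dates, cities, ["day", "night"])),
                  columns=["Date", "City", "DayNight"])

# Synthetic "Degrees" = city level + daily shock + night offset + noise
city_level = {"Boston": 0.0, "Denver": 3.0, "Miami": 15.0, "Seattle": 2.0}
daily_shock = dict(zip(dates, rng.normal(0, 4, len(dates))))
df["Degrees"] = (df["City"].map(city_level)
                 + df["Date"].map(daily_shock)
                 + np.where(df["DayNight"] == "night", -8.0, 0.0)
                 + rng.normal(0, 2, len(df)))

# R^2 of regressions on each set of fixed effects, entered as dummies via C()
formulas = {
    "City": "Degrees ~ C(City)",
    "Date": "Degrees ~ C(Date)",
    "DayNight": "Degrees ~ C(DayNight)",
    "all": "Degrees ~ C(City) + C(Date) + C(DayNight)",
}
r2 = {name: ols(f, data=df).fit().rsquared for name, f in formulas.items()}
for name, value in r2.items():
    print(f"{name:>8}: R^2 = {value:.3f}")
```

Because every combination appears exactly once, the three factors are orthogonal and the single-factor R^2 values add up to the R^2 of the full model. With unbalanced real data they need not, and the split then depends on how shared variance is attributed.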

Basically, what I want to do is find the ANOVA breakdown of "Degrees" by the three other columns.

Answer 1

Answered by cphlewis

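I set up a direct comparison between statsmodels and R, found that their assumptions can differ slightly, and got a hint from a statistician. Here is an example of ANOVA on a pandas DataFrame checked against R's results. R's anova() uses sequential (Type I) sums of squares while typ=2 below uses Type II, so the Diet rows agree but the Time rows differ slightly: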

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols


# R code on R's built-in ChickWeight sample dataset

#> anova(with(ChickWeight, lm(weight ~ Time + Diet)))
#Analysis of Variance Table
#
#Response: weight
#           Df  Sum Sq Mean Sq  F value    Pr(>F)
#Time        1 2042344 2042344 1576.460 < 2.2e-16 ***
#Diet        3  129876   43292   33.417 < 2.2e-16 ***
#Residuals 573  742336    1296
#write.csv(file='ChickWeight.csv', x=ChickWeight, row.names=F)

cw = pd.read_csv('ChickWeight.csv')

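# C() makes the formula treat Diet as categorical rather than numeric
cw_lm = ols('weight ~ Time + C(Diet)', data=cw).fit()
# typ=2 gives Type II sums of squares (each term adjusted for the other);
# R's anova() is sequential (Type I), so only the last term, Diet, matches exactly
print(sm.stats.anova_lm(cw_lm, typ=2))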
#                  sum_sq   df            F         PR(>F)
#C(Diet)    129876.056995    3    33.416570   6.473189e-20
#Time      2016357.148493    1  1556.400956  1.803038e-165
#Residual   742336.119560  573          NaN            NaN
