Pandas DataFrame AttributeError: 'DataFrame' object has no attribute 'design_info'

Question

Asked by Michael

I am trying to use the predict() function of the statsmodels.formula.api OLS implementation. When I pass a new data frame to the function to get predicted values for an out-of-sample dataset, result.predict(newdf) returns the following error: 'DataFrame' object has no attribute 'design_info'. What does this mean and how do I fix it? The full traceback is:

    p = result.predict(newdf)
  File "C:\Python27\lib\site-packages\statsmodels\base\model.py", line 878, in predict
    exog = dmatrix(self.model.data.orig_exog.design_info.builder,
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2088, in __getattr__
    (type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute 'design_info'

EDIT: Here is a reproducible example. The error appears to occur when I pickle and then unpickle the result object (which I need to do in my actual project):

import cPickle
import pandas as pd
import numpy as np
import statsmodels.formula.api as sm

df = pd.DataFrame({"A": [10,20,30,324,2353], "B": [20, 30, 10, 1, 2332], "C": [0, -30, 120, 11, 2]})

result = sm.ols(formula="A ~ B + C", data=df).fit()
print result.summary()

test1 = result.predict(df) #works

f_myfile = open('resultobject', "wb")
cPickle.dump(result, f_myfile, 2)
f_myfile.close()
print("Result Object Saved")


f_myfile = open('resultobject', "rb")
model = cPickle.load(f_myfile)

test2 = model.predict(df) #produces error

Answer 1

Answered by Josef

Pickling and unpickling of a pandas DataFrame doesn't save and restore attributes that have been attached by a user, as far as I know.

Since the formula information is currently stored together with the DataFrame of the original design matrix, this information is lost after unpickling a Results and Model instance.
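
The attribute loss described above can be seen with a plain DataFrame. This is a minimal sketch, assuming Python 3 and a reasonably recent pandas; the string assigned to design_info is only a placeholder for illustration, not the object patsy actually attaches.

```python
import pickle
import warnings

import pandas as pd

df = pd.DataFrame({"B": [20, 30, 10], "C": [0, -30, 120]})

# Attach an ad hoc attribute, the way formula metadata is attached to orig_exog.
# The value is a placeholder string (an assumption made for this illustration).
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df.design_info = "placeholder for formula metadata"

# Round-trip the DataFrame through pickle in memory.
restored = pickle.loads(pickle.dumps(df))

has_attr_before = hasattr(df, "design_info")
has_attr_after = hasattr(restored, "design_info")
print(has_attr_before, has_attr_after)
```

The column data survives the round trip, but the ad hoc attribute is not part of the state that the DataFrame pickles, so it is missing on the restored object.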

If you don't use categorical variables or transformations, then the correct design matrix can be built with patsy.dmatrix. I think the following should work:

import patsy

x = patsy.dmatrix("B + C", data=df)  # df is data for prediction
test2 = model.predict(x, transform=False)

Alternatively, constructing the design matrix for the prediction directly should also work. Note that we need to explicitly add the constant that the formula adds by default.

from statsmodels.api import add_constant
test2 = model.predict(add_constant(df[["B", "C"]]), transform=False)

If the formula and design matrix contain (stateful) transformations and categorical variables, then it's not possible to conveniently construct the design matrix without the original formula information. Constructing it by hand and doing all the calculations explicitly is difficult in this case, and loses all the advantages of using formulas.

The only real solution is to pickle the formula information design_info independently of the DataFrame orig_exog.
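
For the simple case without categorical variables or stateful transformations, a lighter substitute is to pickle the right-hand-side formula string together with the fitted parameters and rebuild the design matrix after loading. The sketch below does not pickle design_info itself; the bundle layout and its key names are hypothetical, and it assumes Python 3 with patsy and statsmodels installed.

```python
import pickle

import numpy as np
import pandas as pd
import patsy
import statsmodels.formula.api as smf

df = pd.DataFrame({"A": [10, 20, 30, 324, 2353],
                   "B": [20, 30, 10, 1, 2332],
                   "C": [0, -30, 120, 11, 2]})
result = smf.ols(formula="A ~ B + C", data=df).fit()

# Hypothetical bundle: keys are illustrative, not a statsmodels format.
bundle = {"rhs_formula": "B + C", "params": result.params}
payload = pickle.dumps(bundle)

# Later, possibly in another process: reload and rebuild the design matrix.
restored = pickle.loads(payload)
x_new = np.asarray(patsy.dmatrix(restored["rhs_formula"], data=df))

# OLS prediction is the design matrix times the coefficient vector.
pred = x_new @ restored["params"].to_numpy()
pred_matches = np.allclose(pred, result.predict(df).to_numpy())
print(pred_matches)
```

This only reproduces the original predictions because the formula is stateless; with categorical variables or stateful transformations, rebuilding from the string on new data can yield a different design matrix.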
