Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/33833832/

Date: 2020-08-19 14:04:40. Source: igfitidea

Building multi-regression model throws error: `Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).`

Tags: python, numpy, pandas, statsmodels

Asked by Sanoj

I have a pandas DataFrame with some categorical predictors (i.e. variables) coded as 0 and 1, and some numeric variables. When I fit it with a statsmodels model like this:

est = sm.OLS(y, X).fit()

It throws:

Pandas data cast to numpy dtype of object. Check input data with np.asarray(data). 

I converted all the dtypes of the DataFrame using df.convert_objects(convert_numeric=True)

After this all dtypes of dataframe variables appear as int32 or int64. But at the end it still shows dtype: object, like this:

4516        int32
4523        int32
4525        int32
4531        int32
4533        int32
4542        int32
4562        int32
sex         int64
race        int64
dispstd     int64
age_days    int64
dtype: object

Here 4516, 4523 are variable labels.
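
A note on the trailing `dtype: object` line in the listing above: `df.dtypes` returns a pandas Series whose values are dtype objects, so that Series itself reports `dtype: object` regardless of the column types. It does not, on its own, show that the data are object-typed. A minimal sketch of how to check what numpy actually sees, using a made-up DataFrame (only the column names are taken from the question; the values are invented):

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column names come from the question.
df = pd.DataFrame({
    "sex": np.array([0, 1, 1], dtype="int64"),
    "age_days": np.array([100, 250, 31], dtype="int64"),
})

dtypes = df.dtypes
# The Series returned by df.dtypes holds dtype objects, so it is itself object-typed.
print(dtypes.dtype)

# What matters to the model is the dtype after conversion to a numpy array.
print(np.asarray(df).dtype)

# A single column of strings makes the combined array fall back to object.
mixed = df.assign(race=["a", "b", "c"])
print(np.asarray(mixed).dtype)
```

If `np.asarray(X).dtype` is `object` for the real design matrix, at least one column (or the combination of columns) is not purely numeric, which is the condition the error message points at.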

Any ideas? I need to build a multiple regression model on hundreds of variables. To do that, I concatenated three pandas DataFrames into the final DataFrame used for model building.

Answered by Daniel Gibson

If X is your DataFrame, try using the .astype method to convert it to float when running the model:

est = sm.OLS(y, X.astype(float)).fit()
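
A minimal sketch of why the cast helps, assuming the object dtype comes from columns that hold numbers stored as Python objects (the DataFrame here is invented for illustration). The `sm.OLS` call is left as a comment so the snippet runs with only pandas and numpy installed:

```python
import numpy as np
import pandas as pd

# Hypothetical design matrix: numeric values, but both columns stored with object dtype.
X = pd.DataFrame({
    "sex": pd.Series([0, 1, 1, 0], dtype=object),
    "age_days": pd.Series([100, 250, 31, 77], dtype=object),
})
y = pd.Series([1.2, 3.4, 0.5, 2.2])

# Object dtype here is the situation the error message describes.
print(np.asarray(X).dtype)

# astype(float) raises ValueError if any value cannot be parsed as a number.
X_float = X.astype(float)
print(np.asarray(X_float).dtype)

# With statsmodels installed, the model would then be fit as in the answer:
# import statsmodels.api as sm
# est = sm.OLS(y, X_float).fit()
```

If `astype(float)` raises instead of succeeding, the offending value in the traceback usually identifies which column still holds non-numeric data.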

Answered by kratant adhaulia

If both y (the dependent variable) and X are taken from a DataFrame, then cast both:

est = sm.OLS(y.astype(float), X.astype(float)).fit()

Answered by Mário de Sá Vera

This happens because you have not generated dummy variables for all of the predictors, so the regression is being asked to run over literal (non-numeric) values. That is what the error message is telling you: it is trying to convert the data to valid numpy entries.

Just go back to your pipeline and include the dummies properly.
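
As a rough illustration of this suggestion, assuming the offending predictor is a string-valued categorical column (the column name `race` is borrowed from the question, its values are invented), `pd.get_dummies` can expand it into indicator columns before the frame is cast to float:

```python
import numpy as np
import pandas as pd

# Hypothetical predictors: one numeric column and one string-valued categorical.
X = pd.DataFrame({
    "age_days": [100, 250, 31, 77],
    "race": ["a", "b", "a", "c"],
})

# Replace the categorical column with indicator columns.
# drop_first=True drops one level, which avoids perfect collinearity
# if the model also includes an intercept.
X_dummies = pd.get_dummies(X, columns=["race"], drop_first=True)

# Indicator columns may come back as bool or uint8 depending on the pandas
# version, so cast the whole frame to float before passing it to the model.
X_model = X_dummies.astype(float)

print(X_model.columns.tolist())
print(np.asarray(X_model).dtype)
```

Whether to drop the first level is a modelling choice; it is shown here only as one common option.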
