Original question: http://stackoverflow.com/questions/33833832/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Building a multiple regression model throws error: `Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).`
Question by Sanoj
I have a pandas DataFrame with some categorical predictors (i.e. variables) coded as 0 and 1, and some numeric variables. When I fit it to a statsmodels model like:
est = sm.OLS(y, X).fit()
It throws:
Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).
I converted all the columns of the DataFrame to numeric dtypes using df.convert_objects(convert_numeric=True).
After this, all the DataFrame's columns show up as int32 or int64, but the last line of the output still says dtype: object, like this:
4516 int32
4523 int32
4525 int32
4531 int32
4533 int32
4542 int32
4562 int32
sex int64
race int64
dispstd int64
age_days int64
dtype: object
Here, 4516, 4523, etc. are variable labels.
Any ideas? I need to build a multiple regression model on hundreds of variables. To do that, I concatenated 3 pandas DataFrames into the final DataFrame used for model building.
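A minimal sketch of how one might check the concatenated frame before fitting. The three input frames are made up for illustration (column names borrowed from the listing above, values invented), and the "sex" column is assumed to have arrived as strings. The sketch shows that the trailing dtype: object line is the dtype of the df.dtypes Series itself, that the array the error message asks about is built from all columns together, and it uses pd.to_numeric in place of convert_objects, which has since been removed from pandas.

```python
import numpy as np
import pandas as pd

# Three hypothetical input frames; "sex" is assumed to have arrived as strings
a = pd.DataFrame({"4516": [0, 1, 1]}, dtype="int32")
b = pd.DataFrame({"sex": ["1", "0", "1"]})
c = pd.DataFrame({"age_days": [100, 250, 300]})
X = pd.concat([a, b, c], axis=1)

# X.dtypes is itself a Series whose values are dtype objects, so its own dtype
# is object: the trailing "dtype: object" line says nothing about the data
print(X.dtypes.dtype)

# The error message points at np.asarray(data): one string column is enough
# to make the array built from the whole frame an object array
print(np.asarray(X).dtype)

# List the columns that are not numeric (object, bool, ...)
non_numeric = X.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)

# pd.to_numeric converts each offending column; errors="coerce" turns
# unparseable entries into NaN instead of raising
for col in non_numeric:
    X[col] = pd.to_numeric(X[col], errors="coerce")
print(np.asarray(X).dtype)
```

With the string column converted, the int32 and int64 columns combine into an int64 array. Any entries that `errors="coerce"` turned into NaN would still have to be dropped or imputed before fitting.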
Answer by Daniel Gibson
If X is your DataFrame, try using the .astype method to convert it to float when running the model:
est = sm.OLS(y, X.astype(float)).fit()
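To illustrate why the cast helps, here is a sketch with a hypothetical frame in which one predictor is stored as bool. pandas does not merge bool and numeric columns into a numeric array, so np.asarray(X) comes back as object, which is the condition the error message describes. After .astype(float), every column is float64.

```python
import numpy as np
import pandas as pd

# Hypothetical design matrix: a bool predictor next to an integer one
X = pd.DataFrame({"sex": [True, False, True, False],
                  "age_days": [100, 250, 300, 410]})

# pandas does not combine bool with numeric dtypes, so the array is object
print(np.asarray(X).dtype)

# .astype(float) maps True/False to 1.0/0.0 and makes every column float64
X_float = X.astype(float)
print(np.asarray(X_float).dtype)
print(X_float["sex"].tolist())
```

X_float is what would then be passed as sm.OLS(y, X_float).fit(). Note that .astype(float) raises a ValueError on columns holding non-numeric strings such as category labels; those need dummy encoding first.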
Answer by kratant adhaulia
If both y (the dependent variable) and X are taken from a DataFrame, cast both:
est = sm.OLS(y.astype(float), X.astype(float)).fit()
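A sketch of the case this answer covers, assuming a hypothetical frame whose response column ended up with object dtype (the column name "outcome" is made up). X converts cleanly on its own, so casting only X would not be enough; the object array can just as well come from y.

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which the response column has object dtype
df = pd.DataFrame({"outcome": pd.Series([1, 2.5, 3, 4], dtype=object),
                   "sex": [1, 0, 1, 0],
                   "age_days": [100, 250, 300, 410]})
y = df["outcome"]
X = df[["sex", "age_days"]]

# The predictors convert to a numeric array, the response does not
print(np.asarray(X).dtype)
print(np.asarray(y).dtype)

# Cast both before fitting, as the answer suggests
y_float = y.astype(float)
X_float = X.astype(float)
print(y_float.dtype, X_float.dtypes.tolist())
```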
Answer by Mário de Sá Vera
This happens because you have NOT applied the dummy-variable step to all of your predictors, so how can the regression be run on literal values? That is what the error message is saying: it is trying to convert them to valid numpy entries.
Just go back to your pipeline and include the dummies properly.
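A minimal sketch of the dummy-encoding step this answer refers to, using pd.get_dummies on a hypothetical frame with a string-valued "race" column (the category values are invented for illustration).

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a string-valued categorical predictor
df = pd.DataFrame({"race": ["white", "black", "asian", "white"],
                   "age_days": [100, 250, 300, 410]})

# One indicator column per category. drop_first=True drops the first category
# ("asian", in sorted order), and dtype=float stores the indicators as 0.0/1.0
# instead of bool (the default in recent pandas versions)
X = pd.get_dummies(df, columns=["race"], drop_first=True, dtype=float)
print(X.columns.tolist())
print(np.asarray(X).dtype)
```

sm.OLS does not add an intercept on its own. If you add one with sm.add_constant(X), keeping every dummy column would make them perfectly collinear with the constant (the dummy-variable trap), which is what drop_first=True avoids.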