pandas sklearn: Found input variables with inconsistent numbers of samples: [1, 99]

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/45697427/

Date: 2020-09-14 04:15:00  Source: igfitidea

sklearn: Found input variables with inconsistent numbers of samples: [1, 99]

Tags: pandas, linear-regression, spyder, sklearn-pandas

Asked by sheldonzy

I'm trying to build a simple regression line with pandas in Spyder. After executing the following code, I got this error:

Found input variables with inconsistent numbers of samples: [1, 99]

The code:

import numpy as np
import pandas as pd

dataset = pd.read_csv('Phil.csv')

x = dataset.iloc[:, 0].values
y = dataset.iloc[:, 2].values

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x, y)

I think I know what the problem is, but I'm not quite sure how to deal with the syntax. In the variable explorer, the size of x (and y) is (99L,). From what I remember, x can't be a 1-D vector; it must have shape (99, 1). Same thing for y.

Saw a bunch of related topics, but none of them helped. Thanks.

Answered by Peter Mularien

According to the sklearn documentation for LinearRegression.fit (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.fit), the X array needs to have the shape [n_samples, n_features].

Since you have only a single feature with many samples, the shape should be (99, 1), i.e., a single value per "row" with a single "column".
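To make the shapes concrete, here is a minimal sketch that uses a synthetic 99-element array in place of the column read from Phil.csv (the data is made up for illustration). It also suggests where the [1, 99] in the error message plausibly comes from: if a 1-D array is promoted to 2-D the way np.atleast_2d does it, it becomes a single row of 99 values, i.e. 1 sample, which does not match the 99 values in y. That explanation is an assumption about the sklearn version in use; newer releases reject 1-D X with a different message.

```python
import numpy as np

# Synthetic stand-in for dataset.iloc[:, 0].values (assumed data, not Phil.csv)
x = np.arange(99, dtype=float)

shape_1d = x.shape                     # 1-D array, no feature axis
shape_as_row = np.atleast_2d(x).shape  # one sample with 99 features
shape_as_column = x[:, None].shape     # 99 samples with one feature each

print(shape_1d, shape_as_row, shape_as_column)
```

Only the last shape matches the [n_samples, n_features] layout for 99 samples of one feature.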

There are many ways to accomplish this (ref: "Efficient way to add a singleton dimension to a NumPy vector so that slice assignments work"). In your case, the following should work:

regressor.fit(x[:, None], y)
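For reference, a few NumPy spellings that should all produce the same (99, 1) column are sketched below. The array x is again a synthetic stand-in for the CSV column, and the dictionary name is purely illustrative.

```python
import numpy as np

# Synthetic stand-in for dataset.iloc[:, 0].values (assumed data)
x = np.arange(99, dtype=float)

# Four equivalent ways to add a trailing singleton dimension
column_variants = {
    "x[:, None]": x[:, None],
    "x[:, np.newaxis]": x[:, np.newaxis],
    "x.reshape(-1, 1)": x.reshape(-1, 1),
    "np.expand_dims(x, axis=1)": np.expand_dims(x, axis=1),
}

for label, arr in column_variants.items():
    print(label, arr.shape)
```

Any of these can be passed as the first argument to regressor.fit; which one to use is mostly a matter of taste.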

Don't forget that predict requires its input data in the same shape!
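A minimal sketch of fitting and then predicting with consistently shaped inputs might look like this. The data is synthetic (points on the line y = 2x + 1, chosen only so the result is easy to check), and x_new is a hypothetical set of new inputs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data lying exactly on y = 2x + 1 (assumed, not from Phil.csv)
x = np.arange(99, dtype=float)
y = 2.0 * x + 1.0

regressor = LinearRegression()
regressor.fit(x[:, None], y)  # X has shape (99, 1); y stays 1-D

# New inputs start out 1-D, so they get the same extra axis before predict
x_new = np.array([100.0, 101.0])
y_pred = regressor.predict(x_new[:, None])  # input shape (2, 1)

print(y_pred)
```

Note that y can remain a 1-D array; only X needs the explicit feature axis.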
