pandas sklearn: Found input variables with inconsistent numbers of samples: [1, 99]
Original question: http://stackoverflow.com/questions/45697427/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
sklearn: Found input variables with inconsistent numbers of samples: [1, 99]
Asked by sheldonzy
I'm trying to build a simple regression line with pandas in Spyder. After executing the following code, I got this error:
Found input variables with inconsistent numbers of samples: [1, 99]
The code:
import numpy as np
import pandas as pd
dataset = pd.read_csv('Phil.csv')
x = dataset.iloc[:, 0].values
y = dataset.iloc[:, 2].values
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x, y)
I think I know what the problem is, but I'm not quite sure how to deal with the syntax. In the variable explorer, the size of x (and y) is (99L,), and from what I remember it can't be a vector; it must be of size (99,1). Same thing for y.
Saw a bunch of related topics, but none of them helped. Thanks.
Answered by Peter Mularien
Referring to the sklearn documentation for LinearRegression (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.fit), the X vector needs to conform to the specification [n_samples, n_features].
Since you have only a single feature with many samples, the shape should be (99, 1), i.e., a single value per "row" with a single "column".
There are many ways to accomplish this (ref: "Efficient way to add a singleton dimension to a NumPy vector so that slice assignments work"). In your case, the following should work:
regressor.fit(x[:, None], y)
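To make the reshaping concrete, here is a minimal sketch of a few equivalent ways to turn a 1-D array into a single-column 2-D array. The array below is a made-up stand-in for the 99 values read from the CSV column, not the asker's data.

```python
import numpy as np

# Hypothetical stand-in for the 99 values in the first CSV column.
x = np.arange(99, dtype=float)

a = x[:, None]        # indexing with None appends a new axis
b = x[:, np.newaxis]  # np.newaxis is an alias for None
c = x.reshape(-1, 1)  # -1 lets NumPy infer the number of rows

print(x.shape)                    # (99,)
print(a.shape, b.shape, c.shape)  # (99, 1) (99, 1) (99, 1)
```

All three give the same (99, 1) array, so any of them can be passed as X. On the pandas side, selecting the column with a list, as in dataset.iloc[:, [0]].values, also yields a 2-D array directly.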
Don't forget that predict requires the data to have the same shape!
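As a rough illustration of that point, the sketch below fits on a single-column X and then reshapes new inputs the same way before calling predict. The data is synthetic (an exact line y = 2x + 1) and the name x_new is hypothetical; neither comes from the original question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data standing in for the two CSV columns (hypothetical values).
x = np.arange(99, dtype=float)
y = 2.0 * x + 1.0

regressor = LinearRegression()
regressor.fit(x[:, None], y)  # X has shape (99, 1), y has shape (99,)

# New inputs start out 1-D, so give them the same single-column layout.
x_new = np.array([100.0, 101.0])                 # shape (2,)
predictions = regressor.predict(x_new[:, None])  # X has shape (2, 1)

print(predictions.shape)  # (2,)
```

Note that only X needs the extra axis; y can stay 1-D, and the predictions come back 1-D as well.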