Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/32097392/
sklearn issue: Found arrays with inconsistent numbers of samples when doing regression
Asked by pyman
This question seems to have been asked before, but I can't comment to ask for clarification on the accepted answer, and I couldn't figure out the solution it provided.
I am trying to learn how to use sklearn with my own data. I essentially just got the annual % change in GDP for 2 different countries over the past 100 years. I am just trying to learn using a single variable for now. What I am essentially trying to do is use sklearn to predict what the GDP % change for country A will be given the percentage change in country B's GDP.
The problem is that I receive an error saying:
ValueError: Found arrays with inconsistent numbers of samples: [ 1 107]
Here is my code:
import sklearn.linear_model as lm
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
def bytespdate2num(fmt, encoding='utf-8'):  # function to convert bytes to string for the dates.
    strconverter = mdates.strpdate2num(fmt)
    def bytesconverter(b):
        s = b.decode(encoding)
        return strconverter(s)
    return bytesconverter

dataCSV = open('combined_data.csv')
comb_data = []
for line in dataCSV:
    comb_data.append(line)
date, chngdpchange, ausgdpchange = np.loadtxt(comb_data, delimiter=',', unpack=True, converters={0: bytespdate2num('%d/%m/%Y')})
chntrain = chngdpchange[:-1]
chntest = chngdpchange[-1:]
austrain = ausgdpchange[:-1]
austest = ausgdpchange[-1:]
regr = lm.LinearRegression()
regr.fit(chntrain, austrain)
print('Coefficients: \n', regr.coef_)
print("Residual sum of squares: %.2f"
      % np.mean((regr.predict(chntest) - austest) ** 2))
print('Variance score: %.2f' % regr.score(chntest, austest))
plt.scatter(chntest, austest, color='black')
plt.plot(chntest, regr.predict(chntest), color='blue')
plt.xticks(())
plt.yticks(())
plt.show()
What am I doing wrong? I essentially tried to apply the sklearn tutorial (They used some diabetes data set) to my own simple data. My data just contains the date, country A's % change in GDP for that specific year, and country B's % change in GDP for that same year.
I tried the solutions here and here (basically trying to find out more about the solution in the first link), but I just receive the exact same error.
Here is the full traceback in case you want to see it:
Traceback (most recent call last):
  File "D:\My Stuff\Dropbox\Python\Python projects\test regression\tester.py", line 34, in <module>
    regr.fit(chntrain, austrain)
  File "D:\Programs\Installed\Python34\lib\site-packages\sklearn\linear_model\base.py", line 376, in fit
    y_numeric=True, multi_output=True)
  File "D:\Programs\Installed\Python34\lib\site-packages\sklearn\utils\validation.py", line 454, in check_X_y
    check_consistent_length(X, y)
  File "D:\Programs\Installed\Python34\lib\site-packages\sklearn\utils\validation.py", line 174, in check_consistent_length
    "%s" % str(uniques))
ValueError: Found arrays with inconsistent numbers of samples: [ 1 107]
Answered by IVlad
regr.fit(chntrain, austrain)
This doesn't look right. The first parameter to fit should be X, which holds the feature vectors (one per sample). The second parameter should be y, the vector of correct answers (targets) associated with X.
For example, if you have GDP, you might have:
X[0] = [43, 23, 52] -> y[0] = 5
# meaning the first year had the features [43, 23, 52] (I just made them up)
# and the change that year was 5
Judging by your names, both chntrain and austrain are feature vectors. Judging by how you load your data, maybe the last column is the target?
Maybe you need to do something like:
chntrain_X, chntrain_y = chntrain[:, :-1], chntrain[:, -1]
# you can do the same with austrain and concatenate them or test on them if this part works
regr.fit(chntrain_X, chntrain_y)
But we can't tell without knowing the exact storage format of your data.
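To make the X and y layout concrete for the single-feature case in the question, here is a minimal sketch. The numbers are made up for illustration and are not the asker's data; the names chn and aus are hypothetical stand-ins for the two GDP columns.

```python
import numpy as np

# Made-up values standing in for the two GDP columns (not the asker's data).
chn = np.array([7.8, 9.1, 10.2, 9.6, 7.7])  # feature: one country's % change
aus = np.array([2.1, 3.0, 3.8, 2.6, 2.4])   # target: the other country's % change

# X needs one row per sample and one column per feature,
# so a single feature becomes a column of shape (n_samples, 1).
X = chn.reshape(-1, 1)
y = aus  # the targets can stay 1-D, one value per sample

print(X.shape)  # (5, 1)
print(y.shape)  # (5,)
```

With this layout, X[0] is the feature vector for the first year and y[0] is its target, which matches the X[0] -> y[0] picture above.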
Answered by qg_jinn
Try changing chntrain to a 2-D array instead of 1-D, i.e. reshape it to (len(chntrain), 1).
For prediction, also change chntest to a 2-D array.
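A minimal sketch of this fix, reusing the variable names from the question. The data here is synthetic (an exact line, y = 2x + 1) and only stands in for the values the asker loads from combined_data.csv.

```python
import numpy as np
import sklearn.linear_model as lm

# Synthetic stand-in for the two loaded columns: an exact line y = 2x + 1.
chngdpchange = np.arange(10, dtype=float)
ausgdpchange = 2.0 * chngdpchange + 1.0

# Reshape the feature to 2-D: (n_samples, 1). The targets stay 1-D.
chntrain = chngdpchange[:-1].reshape(-1, 1)
chntest = chngdpchange[-1:].reshape(-1, 1)
austrain = ausgdpchange[:-1]
austest = ausgdpchange[-1:]

regr = lm.LinearRegression()
regr.fit(chntrain, austrain)

# Predict the held-out last sample.
prediction = regr.predict(chntest)
print(prediction)
```

Here reshape(-1, 1) is equivalent to reshape(len(chntrain), 1); the -1 lets NumPy infer the number of rows. As far as I know, recent scikit-learn versions report a 1-D X with a different message ("Expected 2D array, got 1D array instead"), but the same reshape resolves it.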
Answered by Chang Men
In fit(X, y), the input parameter X is supposed to be a 2-D array. If X in your data is only one-dimensional, you can reshape it into a 2-D array like this:
regr.fit(chntrain_X.reshape(len(chntrain_X), 1), chntrain_Y)
Answered by bobo
I have been having similar problems to you and have found a solution.
Where you have the following error:
ValueError: Found arrays with inconsistent numbers of samples: [ 1 107]
The [ 1 107] part is basically saying that your array is the wrong way around. Sklearn thinks you have 107 columns of data with 1 row.
To fix this try transposing the X data like so:
chntrain.T
Then re-run your fit:
regr.fit(chntrain, austrain)
Depending on what your "austrain" data looks like you may need to transpose this too.
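One caveat worth checking before relying on a transpose: it only helps if the array is already 2-D, and .T returns a new view rather than changing the array in place, so the result has to be assigned. The sketch below assumes chntrain is 1-D with 107 values, which is what np.loadtxt with unpack=True would typically give for a single column; the values themselves are placeholders.

```python
import numpy as np

# Hypothetical 1-D stand-in for the 107 training values.
chntrain = np.arange(107, dtype=float)

print(chntrain.shape)    # (107,)
print(chntrain.T.shape)  # (107,)  transposing a 1-D array changes nothing

# If the data is a 2-D row of shape (1, 107), the transpose does help,
# but the result must be assigned to be used.
row = chntrain.reshape(1, -1)
column = row.T
print(row.shape, column.shape)  # (1, 107) (107, 1)
```

If the array turns out to be 1-D, reshaping it to (107, 1) gives the column layout that fit expects.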
Answered by Cloud Cho
You may use np.newaxis as well. For example: X = X[:, np.newaxis]. I found the method at Logistic function
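As a quick check, indexing with np.newaxis produces the same column layout as the reshape suggested in the other answers. The array values below are arbitrary placeholders.

```python
import numpy as np

X = np.array([1.5, 2.0, 3.5])  # arbitrary 1-D feature values

# Add a trailing axis so each value becomes its own row.
X_col = X[:, np.newaxis]

print(X_col.shape)                              # (3, 1)
print(np.array_equal(X_col, X.reshape(-1, 1)))  # True
```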