使用 python 和 numpy 的梯度下降
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/17784587/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverFlow
gradient descent using python and numpy
提问by Madan Ram
def gradient(X_norm,y,theta,alpha,m,n,num_it):
    temp=np.array(np.zeros_like(theta,float))
    for i in range(0,num_it):
        h=np.dot(X_norm,theta)
        #temp[j]=theta[j]-(alpha/m)*( np.sum( (h-y)*X_norm[:,j][np.newaxis,:] ) )
        temp[0]=theta[0]-(alpha/m)*(np.sum(h-y))
        temp[1]=theta[1]-(alpha/m)*(np.sum((h-y)*X_norm[:,1]))
        theta=temp
    return theta
X_norm,mean,std=featureScale(X)
#length of X (number of rows)
m=len(X)
X_norm=np.array([np.ones(m),X_norm])
n,m=np.shape(X_norm)
num_it=1500
alpha=0.01
theta=np.zeros(n,float)[:,np.newaxis]
X_norm=X_norm.transpose()
theta=gradient(X_norm,y,theta,alpha,m,n,num_it)
print theta
My theta from the above code is 100.2 100.2, but it should be 100.2 61.09 as in matlab, which is correct.
我上面代码得到的 theta 是 100.2 100.2,但它应该是 100.2 61.09,与 matlab 的结果一致,那才是正确的。
采纳答案by Thomas Jungblut
I think your code is a bit too complicated and it needs more structure, because otherwise you'll be lost in all equations and operations. In the end this regression boils down to four operations:
我认为你的代码有点太复杂了,它需要更多的结构,否则你会迷失在所有的方程和操作中。最后,这个回归归结为四个操作:
- Calculate the hypothesis h = X * theta
- Calculate the loss = h - y and maybe the squared cost (loss^2)/2m
- Calculate the gradient = X' * loss / m
- Update the parameters theta = theta - alpha * gradient
- 计算假设 h = X * theta
- 计算损失 loss = h - y,或许再算一下平方成本 (loss^2)/2m
- 计算梯度 = X' * loss / m
- 更新参数 theta = theta - alpha * gradient
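As a concrete bridge between the list above and the full program below, here is a minimal single-step sketch of those four operations in NumPy (the toy X, y, theta and alpha values are mine, purely for illustration):

import numpy as np

# toy data: m = 5 examples, n = 2 features (bias column of ones + one feature)
X = np.column_stack([np.ones(5), np.arange(5.0)])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
theta = np.zeros(2)
alpha = 0.01
m = len(y)

hypothesis = np.dot(X, theta)        # 1. h = X * theta
loss = hypothesis - y                # 2. loss = h - y
cost = np.sum(loss ** 2) / (2 * m)   #    squared cost, only for monitoring
gradient = np.dot(X.T, loss) / m     # 3. gradient = X' * loss / m
theta = theta - alpha * gradient     # 4. theta = theta - alpha * gradient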
In your case, I guess you have confused m with n. Here m denotes the number of examples in your training set, not the number of features.
在你的情况下,我猜你把 m 和 n 弄混了。这里 m 表示训练集中的样本数量,而不是特征数量。
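To make the m-versus-n distinction concrete, here is a quick shape check on a hypothetical 100-example, 2-feature design matrix (not the asker's data):

import numpy as np

X = np.ones((100, 2))   # 100 training examples (rows), 2 features (columns), bias included
m, n = X.shape
print(m)                # 100 -> number of examples, the value the gradient is averaged over
print(n)                # 2   -> number of features, i.e. the length of theta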
Let's have a look at my variation of your code:
让我们来看看我的代码变体:
import numpy as np
import random

# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here.
        # But to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta

def genData(numPoints, bias, variance):
    x = np.zeros(shape=(numPoints, 2))
    y = np.zeros(shape=numPoints)
    # basically a straight line
    for i in range(0, numPoints):
        # bias feature
        x[i][0] = 1
        x[i][1] = i
        # our target variable
        y[i] = (i + bias) + random.uniform(0, 1) * variance
    return x, y

# gen 100 points with a bias of 25 and 10 variance as a bit of noise
x, y = genData(100, 25, 10)
m, n = np.shape(x)
numIterations = 100000
alpha = 0.0005
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)
At first I create a small random dataset which should look like this:
首先我创建一个小的随机数据集,它应该是这样的:
As you can see I also added the generated regression line and formula that was calculated by Excel.
如您所见,我还添加了生成的回归线以及由 Excel 计算出的公式。
You need to take care about the intuition of the regression using gradient descent. As you do a complete batch pass over your data X, you need to reduce the m losses of every example to a single weight update. In this case, this is the average of the sum over the gradients, thus the division by m.
您需要理解使用梯度下降做回归的直觉:当您对数据 X 做一次完整的批量遍历时,需要把 m 个样本各自的损失归并为一次权重更新。在这里,就是对各个梯度求和后取平均,因此要除以 m。
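Put differently, the vectorized np.dot(xTrans, loss) / m used above is just the mean of the per-example gradients. A small sketch checking that equivalence (the arrays are illustrative, not the answer's data):

import numpy as np

x = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])   # 3 examples, 2 features
y = np.array([1.0, 2.0, 3.0])
theta = np.zeros(2)

loss = x.dot(theta) - y
per_example = loss[:, np.newaxis] * x        # gradient contribution of each example
batch_gradient = x.T.dot(loss) / len(y)      # vectorized form used in the answer

print(np.allclose(per_example.mean(axis=0), batch_gradient))   # True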
The next thing you need to take care about is to track the convergence and adjust the learning rate. For that matter you should always track your cost every iteration, maybe even plot it.
接下来需要注意的是跟踪收敛并调整学习率。就此而言,您应该始终跟踪每次迭代的成本,甚至可以绘制它。
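For instance, the cost that gradientDescent prints could be collected into a list and plotted afterwards. A minimal sketch along those lines, assuming matplotlib is installed (the history list and the function name are my additions, not part of the answer's code):

import numpy as np
from matplotlib import pyplot as plt

def gradientDescentWithHistory(x, y, theta, alpha, m, numIterations):
    costs = []                                    # cost per iteration, for plotting
    for _ in range(numIterations):
        loss = np.dot(x, theta) - y
        costs.append(np.sum(loss ** 2) / (2 * m))
        theta = theta - alpha * np.dot(x.T, loss) / m
    return theta, costs

x = np.column_stack([np.ones(100), np.arange(100.0)])
y = np.arange(100.0) + 25.0
theta, costs = gradientDescentWithHistory(x, y, np.ones(2), 0.0005, len(y), 1000)

plt.plot(costs)                                   # should drop quickly, then flatten out
plt.xlabel("iteration")
plt.ylabel("cost")
plt.show()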
If you run my example, the theta returned will look like this:
如果您运行我的示例,返回的 theta 将如下所示:
Iteration 99997 | Cost: 47883.706462
Iteration 99998 | Cost: 47883.706462
Iteration 99999 | Cost: 47883.706462
[ 29.25567368 1.01108458]
Which is actually quite close to the equation that was calculated by Excel (y = x + 30). Note that as we passed the bias into the first column, the first theta value denotes the bias weight.
这实际上非常接近 Excel 计算出的方程 (y = x + 30)。请注意,由于我们把偏置放在了第一列,第一个 theta 值表示的就是偏置的权重。
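To use the learned theta for a prediction, a new input then needs the same leading 1 for the bias column; roughly (with the theta values from the run above):

import numpy as np

theta = np.array([29.25567368, 1.01108458])   # [bias weight, slope] from the output above
x_new = np.array([1.0, 42.0])                 # the leading 1 is the bias feature
prediction = np.dot(x_new, theta)             # about 29.26 + 1.01 * 42
print(prediction)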
回答by Muatik
Below you can find my implementation of gradient descent for linear regression problem.
您可以在下面找到我对线性回归问题的梯度下降实现。
At first, you calculate the gradient like X.T * (X * w - y) / N and update your current theta with this gradient simultaneously.
首先,按 X.T * (X * w - y) / N 计算梯度,并同时用这个梯度更新当前的 theta。
- X: feature matrix
- y: target values
- w: weights/values
- N: size of training set
- X:特征矩阵
- y:目标值
- w:权重/值
- N:训练集的大小
Here is the python code:
这是python代码:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import random

def generateSample(N, variance=100):
    X = np.matrix(range(N)).T + 1
    Y = np.matrix([random.random() * variance + i * 10 + 900 for i in range(len(X))]).T
    return X, Y

def fitModel_gradient(x, y):
    N = len(x)
    w = np.zeros((x.shape[1], 1))
    eta = 0.0001
    maxIteration = 100000
    for i in range(maxIteration):
        error = x * w - y
        gradient = x.T * error / N
        w = w - eta * gradient
    return w

def plotModel(x, y, w):
    plt.plot(x[:,1], y, "x")
    plt.plot(x[:,1], x * w, "r-")
    plt.show()

def test(N, variance, modelFunction):
    X, Y = generateSample(N, variance)
    X = np.hstack([np.matrix(np.ones(len(X))).T, X])
    w = modelFunction(X, Y)
    plotModel(X, Y, w)

test(50, 600, fitModel_gradient)
test(50, 1000, fitModel_gradient)
test(100, 200, fitModel_gradient)
回答by Nico Coallier
I know this question has already been answered but I have made some updates to the GD function:
我知道这个问题已经有人回答了,但我对 GD 函数做了一些更新:
### COST FUNCTION
def cost(theta,X,y):
    ### Evaluate half MSE (Mean square error)
    m = len(y)
    error = np.dot(X,theta) - y
    J = np.sum(error ** 2)/(2*m)
    return J

# example call (theta, X and y must already be defined):
# cost(theta,X,y)

def GD(X,y,theta,alpha):
    cost_histo = []
    theta_histo = []
    # an arbitrary gradient, to pass the initial while() check
    delta = [np.repeat(1,len(X))]
    # Initial cost
    old_cost = cost(theta,X,y)
    while (np.max(np.abs(delta)) > 1e-6):
        error = np.dot(X,theta) - y
        delta = np.dot(np.transpose(X),error)/len(y)
        trial_theta = theta - alpha * delta
        trial_cost = cost(trial_theta,X,y)
        # halve the step as long as it would increase the cost
        while (trial_cost >= old_cost):
            trial_theta = (theta + trial_theta)/2
            trial_cost = cost(trial_theta,X,y)
        # keep a history of costs and thetas across iterations
        cost_histo.append(trial_cost)
        theta_histo.append(trial_theta)
        old_cost = trial_cost
        theta = trial_theta
    Intercept = theta[0]
    Slope = theta[1]
    return [Intercept,Slope]

res = GD(X,y,theta,alpha)
This function reduces the alpha over the iterations, making the function converge faster; see Estimating linear regression with Gradient Descent (Steepest Descent) for an example in R. I apply the same logic, but in Python.
此函数在迭代过程中减小 alpha,使函数收敛得更快;R 中的示例请参阅《使用梯度下降(最陡下降)估计线性回归》。我应用了相同的逻辑,只是用 Python 实现。
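Note that the snippet above assumes X, y, theta and alpha already exist in scope (presumably taken from the question's setup). A hypothetical minimal setup to run it could look like this (GD and cost as defined above):

import numpy as np

# toy single-feature dataset with a bias column of ones
X = np.column_stack([np.ones(50), np.arange(50.0)])
y = 3.0 * np.arange(50.0) + 10.0
theta = np.zeros(2)
alpha = 0.01

res = GD(X, y, theta, alpha)   # the step is halved whenever the cost would increase
print(res)                     # [intercept, slope], roughly [10, 3]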
回答by Fernando Gabrieli
Following @thomas-jungblut's implementation in python, I did the same for Octave. If you find something wrong please let me know and I will fix and update.
参照 @thomas-jungblut 在 python 中的实现,我为 Octave 做了同样的事情。如果您发现有问题,请告诉我,我会修复并更新。
Data comes from a txt file with the following rows:
数据来自具有以下行的 txt 文件:
1 10 1000
2 20 2500
3 25 3500
4 40 5500
5 60 6200
Think of it as a very rough sample with the features [number of bedrooms] [mts2], and the last column [rent price], which is what we want to predict.
可以把它看作一个非常粗糙的样本:特征是 [卧室数量] 和 [mts2],最后一列 [租金价格] 是我们要预测的目标。
Here is the Octave implementation:
这是 Octave 的实现:
%
% Linear Regression with multiple variables
%
% Alpha for learning curve
alphaNum = 0.0005;
% Number of features
n = 2;
% Number of iterations for Gradient Descent algorithm
iterations = 10000
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% No need to update after here
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
DATA = load('CHANGE_WITH_DATA_FILE_PATH');
% Initial theta values
theta = ones(n + 1, 1);
% Number of training samples
m = length(DATA(:, 1));
% X with one more column (x0 filled with '1's)
X = ones(m, 1);
for i = 1:n
  X = [X, DATA(:,i)];
endfor
% Expected data must go always in the last column
y = DATA(:, n + 1)
function gradientDescent(x, y, theta, alphaNum, iterations)
  iterations = [];
  costs = [];
  m = length(y);

  for iteration = 1:10000
    hypothesis = x * theta;
    loss = hypothesis - y;

    % J(theta)
    cost = sum(loss.^2) / (2 * m);

    % Save for the graphic to see if the algorithm did work
    iterations = [iterations, iteration];
    costs = [costs, cost];

    gradient = (x' * loss) / m; % /m is for the average

    theta = theta - (alphaNum * gradient);
  endfor

  % Show final theta values
  display(theta)

  % Show J(theta) graphic evolution to check it worked, tendency must be zero
  plot(iterations, costs);
endfunction
% Execute gradient descent
gradientDescent(X, y, theta, alphaNum, iterations);