Note: the content below is from a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/20736460/

Vectorizing a gradient descent algorithm

Tags: java, python, matlab, machine-learning, linear-algebra

Asked by bigTree

I am coding gradient descent in MATLAB. For two features, I get the following update step:

temp0 = theta(1,1) - (alpha/m)*sum((X*theta-y).*X(:,1));
temp1 = theta(2,1) - (alpha/m)*sum((X*theta-y).*X(:,2));
theta(1,1) = temp0;
theta(2,1) = temp1;
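
For reference, the per-parameter rule these four lines implement is the standard batch update (written in LaTeX for clarity, with the linear hypothesis $h_\theta(x) = \theta^T x$):

$$\theta_j := \theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$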

However, I want to vectorize this code and be able to apply it to any number of features. For the vectorization part, it turns out that what I am trying to do is a matrix multiplication:

theta = theta - (alpha/m) * (X' * (X*theta-y));
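
To see why this one line reproduces the per-feature sums above, here is a minimal check with made-up data (the values are purely for illustration):

X = [ones(5,1), (1:5)'];   % 5 examples: bias column plus one feature
y = [2; 4; 6; 8; 10];
theta = [0; 1];
err = X*theta - y;                                % m x 1 residual vector
g_loop = [sum(err.*X(:,1)); sum(err.*X(:,2))];    % per-feature sums, as in the loop version
g_vec = X' * err;                                 % the same numbers from one matrix product
% g_loop and g_vec are identical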

This looks right, but when I tried it, I realized that it doesn't work for gradient descent because the parameters are not updated simultaneously.

Then, how can I vectorize this code and make sure the parameters are updated at the same time?

Accepted answer by lennon310

Your vectorization is correct. I also tried both versions of your code, and they gave me the same theta. Just remember not to use your updated theta in your second implementation.

This also works, but it is less compact than your second implementation:

Error = X * theta - y;          % m x 1 residual vector
S = zeros(1, 2);                % one partial sum per feature column
for i = 1:2
    S(i) = sum(Error .* X(:,i));
end

theta = theta - alpha * (1/m) * S';
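
Since the question asks for any number of features, here is a small sketch of the same loop generalized over the columns of X (assuming X already includes the bias column of ones):

Error = X * theta - y;
n = size(X, 2);                 % number of columns, bias column included
S = zeros(1, n);
for i = 1:n
    S(i) = sum(Error .* X(:,i));
end
theta = theta - alpha * (1/m) * S';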

Answer by iatanasov

In order to update them simultaneously, you need to keep the values of theta(1..n) in a temporary vector, and after the operation just update the values in the original theta vector.

This is the code that I use for this purpose:

Temp update

tempChange = zeros(length(theta), 1);
tempChange = theta - (alpha/m) * (X' * (X*theta-y));

Actual update

theta = tempChange;

Answer by S.Arora

For the vectorized version, try the following (two steps, to make the simultaneous update explicit):

gradient = (alpha/m) * X' * (X*theta - y);   % computed entirely from the old theta
theta = theta - gradient;

Answer by Madhusudhan B A

theta = theta - (alpha/m) * (X') * ((X*theta)-y);

Answer by Abhishek Jain

I am very new to this topic, but still, my opinion is: if you compute X*theta beforehand, then while doing the vectorized operation to adjust theta, it need not be in a temp. In other words: if you compute X*theta while updating the theta vector, theta(1) updates before theta(2) and hence changes X*theta. But if we compute X*theta as y_pred first and then do the vectorized op on theta, it will be ok.

So my suggestion is (without using temp):

y_pred = X*theta;               % theta is [1;1] and X is an m x 2 matrix
theta = theta - (alpha/m) * (X' * (y_pred-y));

Please correct me if I am wrong.

Answer by Martin Moltke Wozniak

Here is the vectorized form of gradient descent; it works for me in Octave.
Remember that X is a matrix with ones in the first column (since theta_0 * 1 is theta_0). For each column in X you have a feature (n), and each row is a training example (m), so X is an m x (n+1) matrix. The y column vector could be, for example, the house prices. It's good to have a cost function so you can check whether you have found a minimum.
Choose a value for alpha, maybe alpha = 0.001, and try changing it each time you run the code. num_iters is the number of iterations you want it to run.

function theta = gradientDescent(X, y, theta, alpha, num_iters)

m = length(y); % number of training examples

for iter = 1:num_iters
    % the right-hand side uses only the old theta,
    % so all parameters are updated simultaneously
    theta = theta - (alpha/m) * X' * (X*theta - y);
end

end
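
As a usage sketch (the data and the value of alpha below are made up for illustration), you can call the function and then check the cost:

X = [ones(5,1), (1:5)'];        % bias column plus one feature
y = [3; 5; 7; 9; 11];           % generated from y = 1 + 2*x
theta = zeros(2, 1);
theta = gradientDescent(X, y, theta, 0.01, 5000);
J = (1/(2*length(y))) * sum((X*theta - y).^2)   % cost; should be close to 0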

See the full explanation here: https://www.coursera.org/learn/machine-learning/resources/QQx8l