Weighted Linear Regression in Java

Disclaimer: this page is an English rendering of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must likewise follow CC BY-SA, cite the original address, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/5684282/

Asked by C. Reed
Does anyone know of a scientific/mathematical library in Java that has a straightforward implementation of weighted linear regression? Something along the lines of a function that takes 3 arguments and returns the corresponding coefficients:
linearRegression(x,y,weights)
This seems fairly straightforward, so I imagine it exists somewhere.
PS) I've tried Flanagan's library (http://www.ee.ucl.ac.uk/~mflanaga/java/Regression.html); it has the right idea, but it seems to crash sporadically and complain about my degrees of freedom.
Answered by Aleadam
Not a library, but the code is posted: http://www.codeproject.com/KB/recipes/LinReg.aspx (and it includes the mathematical explanation for the code, which is a huge plus). Also, there seems to be another implementation of the same algorithm here: http://sin-memories.blogspot.com/2009/04/weighted-linear-regression-in-java-and.html
Finally, there is a library from a university in New Zealand that seems to have it implemented: http://www.cs.waikato.ac.nz/~ml/weka/ (pretty decent javadocs). The specific method is described here: http://weka.sourceforge.net/doc/weka/classifiers/functions/LinearRegression.html
Answered by L. G.
I personally used the org.apache.commons.math.stat.regression.SimpleRegression class of the Apache Commons Math library.
I also found a more lightweight class from Princeton University, but didn't test it:
http://introcs.cs.princeton.edu/java/97data/LinearRegression.java.html
Answered by Mateusz Stefek
I was also searching for this, but I couldn't find anything. The reason might be that you can reduce the problem to standard regression, as follows:

The weighted linear regression without residual can be represented as

diag(sqrt(weights)) * y = diag(sqrt(weights)) * X * b

where multiplying by diag(sqrt(weights)) basically means multiplying each row of the matrix by a different square-rooted weight. Therefore, the translation between weighted and unweighted regressions without residual is trivial.

To translate a regression with residual, y = X*b + u, into a regression without residual, y = X*b, you add an additional column to X: a new column containing only ones.
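Written out, the reduction is just a change of variables in the least-squares objective; a sketch of the identity being used:

```latex
% Weighted least squares reduces to ordinary least squares
% on sqrt(w_i)-scaled data:
\min_{b}\ \sum_{i} w_i \,\bigl(y_i - x_i^{\top} b\bigr)^{2}
  \;=\; \min_{b}\ \sum_{i} \bigl(\sqrt{w_i}\,y_i - (\sqrt{w_i}\,x_i)^{\top} b\bigr)^{2}
```

So running ordinary least squares on the scaled data returns exactly the weighted coefficients, and the column of ones added for the intercept turns into a column of sqrt(weights).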
Now that you know how to simplify the problem, you can use any library to solve the standard linear regression.
Here's an example, using Apache Commons Math:
// requires: import org.apache.commons.math.stat.regression.OLSMultipleLinearRegression;
void linearRegression(double[] xUnweighted, double[] yUnweighted, double[] weights) {
    double[] y = new double[yUnweighted.length];
    double[][] x = new double[xUnweighted.length][2];
    for (int i = 0; i < y.length; i++) {
        // Scale each observation by sqrt(weight); the second column
        // (the intercept's column of ones) becomes a column of sqrt(weight).
        y[i] = Math.sqrt(weights[i]) * yUnweighted[i];
        x[i][0] = Math.sqrt(weights[i]) * xUnweighted[i];
        x[i][1] = Math.sqrt(weights[i]);
    }
    OLSMultipleLinearRegression regression = new OLSMultipleLinearRegression();
    regression.setNoIntercept(true); // the intercept is modeled explicitly via x[i][1]
    regression.newSampleData(y, x);
    double[] regressionParameters = regression.estimateRegressionParameters();
    double slope = regressionParameters[0];
    double intercept = regressionParameters[1];
    System.out.println("y = " + slope + "*x + " + intercept);
}
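For the single-predictor case, the weighted fit also has a closed form (the weighted normal equations), which is handy for sanity-checking the transformed-OLS approach without any library. A self-contained sketch (the class and method names here are mine, not from any of the libraries above):

```java
public class WeightedFitCheck {

    // Closed-form weighted least squares for y = slope*x + intercept.
    // Returns {slope, intercept}.
    static double[] weightedFit(double[] x, double[] y, double[] w) {
        double sw = 0, swx = 0, swy = 0, swxx = 0, swxy = 0;
        for (int i = 0; i < x.length; i++) {
            sw   += w[i];
            swx  += w[i] * x[i];
            swy  += w[i] * y[i];
            swxx += w[i] * x[i] * x[i];
            swxy += w[i] * x[i] * y[i];
        }
        double slope = (sw * swxy - swx * swy) / (sw * swxx - swx * swx);
        double intercept = (swy - slope * swx) / sw;
        return new double[] { slope, intercept };
    }

    public static void main(String[] args) {
        // Points lying exactly on y = 2x + 1; any positive weights must recover it.
        double[] x = { 1, 2, 3, 4 };
        double[] y = { 3, 5, 7, 9 };
        double[] w = { 1, 2, 3, 4 };
        double[] p = weightedFit(x, y, w);
        System.out.println("y = " + p[0] + "*x + " + p[1]); // prints y = 2.0*x + 1.0
    }
}
```

Running both this and the Commons Math version on the same data is a quick way to confirm the sqrt(weights) transformation is wired up correctly.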
This can be explained intuitively by the fact that in linear regression with u = 0, if you take any point (x, y) and convert it to (x*C, y*C), the error for the new point will also get multiplied by C. In other words, linear regression already applies a higher weight to points with a higher x. Since we are minimizing the squared error, that is why we take the square roots of the weights.
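A quick numeric check of that claim: for the no-intercept model y = b*x, scaling a point by C scales its residual by the same C, and hence its squared error by C^2. A minimal sketch (names are mine, for illustration only):

```java
public class ResidualScaling {

    // Residual of point (x, y) under the no-intercept model y = b*x.
    static double residual(double x, double y, double b) {
        return y - b * x;
    }

    public static void main(String[] args) {
        double b = 1.5, x = 2.0, y = 4.0;
        double c = Math.sqrt(3.0); // c = sqrt(weight)
        double r  = residual(x, y, b);         // residual of the original point
        double rc = residual(c * x, c * y, b); // residual of the scaled point
        System.out.println(rc / r);            // prints the scale factor c
    }
}
```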
Answered by Luke Hutchison
Here's a direct Java port of the C# code for weighted linear regression from the first link in Aleadam's answer: