C#: How do I determine the standard deviation (stddev) of a set of values?
Notice: this page is a translation of a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/895929/
Asked by dead and bloated
I need to know if a number compared to a set of numbers is outside of 1 stddev from the mean, etc.
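For reference, here is a minimal sketch of the check the question describes, using a plain two-pass sample standard deviation. The class and method names (OutlierCheck, SampleStdDev, IsOutside) are made up for illustration and do not come from any answer below.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class OutlierCheck
{
    // Sample standard deviation (divides by n - 1), computed in two passes.
    public static double SampleStdDev(IReadOnlyList<double> values, out double mean)
    {
        if (values.Count < 2)
            throw new ArgumentException("Need at least two values.", nameof(values));

        double sum = 0.0;
        foreach (double v in values) sum += v;
        mean = sum / values.Count;

        double sumSqDev = 0.0;
        foreach (double v in values) sumSqDev += (v - mean) * (v - mean);

        return Math.Sqrt(sumSqDev / (values.Count - 1));
    }

    // True when x lies more than `threshold` standard deviations from the mean.
    public static bool IsOutside(double x, IReadOnlyList<double> values, double threshold = 1.0)
    {
        double sd = SampleStdDev(values, out double mean);
        return Math.Abs(x - mean) > threshold * sd;
    }
}
```

Pass threshold = 2.0 or 3.0 for wider bands. Whether to divide by n - 1 or n depends on whether the set is a sample or the whole population.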
Accepted answer by Jaime
While the sum of squares algorithm works fine most of the time, it can cause big trouble if you are dealing with very large numbers. You basically may end up with a negative variance...
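To make the problem concrete, here is a small sketch (all names are hypothetical) that compares the one-pass sum-of-squares formula with a two-pass computation on values that have a tiny spread on top of a huge offset. The true sample variance of the test data is 30. The one-pass formula subtracts two numbers around 4e18, where doubles can no longer resolve a difference of 90, so its result is far off, and with other inputs it can come out negative.

```csharp
using System;

// Hypothetical demo class, for illustration only.
public static class CancellationDemo
{
    // One-pass "sum of squares" formula: (sum(x^2) - sum(x)^2 / n) / (n - 1).
    public static double NaiveSampleVariance(double[] data)
    {
        double sum = 0.0, sumSq = 0.0;
        foreach (double x in data)
        {
            sum += x;
            sumSq += x * x;
        }
        return (sumSq - sum * sum / data.Length) / (data.Length - 1);
    }

    // Two-pass reference: subtract the mean before squaring.
    public static double TwoPassSampleVariance(double[] data)
    {
        double sum = 0.0;
        foreach (double x in data) sum += x;
        double mean = sum / data.Length;

        double sumSqDev = 0.0;
        foreach (double x in data) sumSqDev += (x - mean) * (x - mean);
        return sumSqDev / (data.Length - 1);
    }

    // Deviations from the mean are -6, -3, 3, 6, so the sample variance is 90 / 3 = 30.
    public static readonly double[] Sample = { 1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16 };
}
```

Printing both results for CancellationDemo.Sample shows the gap; the exact value the one-pass formula produces depends on rounding, so no figure is quoted here.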
Plus, never, ever compute a^2 as pow(a,2); a * a is almost certainly faster.
By far the best way of computing a standard deviation is Welford's method. My C# is very rusty, but it could look something like:
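public static double StandardDeviation(List<double> valueList)
{
    double M = 0.0; // running mean
    double S = 0.0; // running sum of squared deviations from the mean
    int k = 1;
    foreach (double value in valueList)
    {
        double tmpM = M;
        M += (value - tmpM) / k;
        S += (value - tmpM) * (value - M);
        k++;
    }
    // k is now count + 1, so k - 2 equals count - 1 (sample standard deviation)
    return Math.Sqrt(S / (k-2));
}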
If you have the whole population (as opposed to a sample of the population), then use return Math.Sqrt(S / (k-1)); instead.
EDIT: I've updated the code according to Jason's remarks...
EDIT: I've also updated the code according to Alex's remarks...
Answer by dmckee --- ex-moderator kitten
You can avoid making two passes over the data by accumulating the mean and mean-square:
cnt = 0
mean = 0
meansqr = 0
loop over array
    cnt++
    mean += value
    meansqr += value*value
mean /= cnt
meansqr /= cnt
and forming
sigma = sqrt(meansqr - mean^2)
A factor of cnt/(cnt-1) on the variance (inside the square root) is often appropriate as well, to get the sample estimate rather than the population one.
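The pseudocode above could be written in C# roughly as follows. This is a sketch, not the answerer's own code: the class name, the sample flag, and the clamp against small negative variances are my assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical C# rendering of the single-pass pseudocode.
public static class SinglePassStats
{
    // Accumulates sum(x) and sum(x^2) in one pass, then forms
    // sigma = sqrt(meansqr - mean^2), optionally scaled by cnt/(cnt-1).
    public static double StdDev(IEnumerable<double> values, bool sample = true)
    {
        long cnt = 0;
        double mean = 0.0, meansqr = 0.0;

        foreach (double value in values)
        {
            cnt++;
            mean += value;
            meansqr += value * value;
        }

        if (cnt == 0 || (sample && cnt < 2))
            throw new ArgumentException("Not enough values.", nameof(values));

        mean /= cnt;
        meansqr /= cnt;

        double variance = meansqr - mean * mean;
        if (sample) variance *= (double)cnt / (cnt - 1);

        // Assumption: clamp small negative results caused by rounding.
        return Math.Sqrt(Math.Max(variance, 0.0));
    }
}
```

Because it takes an IEnumerable<double>, the data is enumerated exactly once. The clamp only hides the symptom of cancellation; it does not make the result accurate for values with a large offset.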
BTW: the first pass over the data in Demi's and McWafflestix's answers is hidden in the calls to Average. That kind of thing is certainly trivial on a small list, but if the list exceeds the size of the cache, or even the working set, this gets to be a big deal.
Answer by Demi
Code snippet:
public static double StandardDeviation(List<double> valueList)
{
    if (valueList.Count < 2) return 0.0;
    double sumOfSquares = 0.0;
    double average = valueList.Average(); // LINQ: needs using System.Linq (.NET 3.5+)
    foreach (double value in valueList)
    {
        sumOfSquares += Math.Pow((value - average), 2);
    }
    return Math.Sqrt(sumOfSquares / (valueList.Count - 1));
}
Answer by oleksii
/// <summary>
/// Calculates standard deviation, same as MATLAB std(X,0) function
/// <seealso cref="http://www.mathworks.co.uk/help/techdoc/ref/std.html"/>
/// </summary>
/// <param name="values">enumerable data</param>
/// <returns>Standard deviation</returns>
public static double GetStandardDeviation(this IEnumerable<double> values)
{
    // validation
    if (values == null)
        throw new ArgumentNullException("values");

    // single pass: Count() plus ElementAt(i) would re-enumerate a non-list sequence
    int length = 0;
    double sum = 0.0, sum2 = 0.0;
    foreach (double item in values)
    {
        sum += item;
        sum2 += item * item;
        length++;
    }

    // guards against division by 0
    if (length < 2)
        return 0;

    return Math.Sqrt((sum2 - sum * sum / length) / (length - 1));
}
Answer by hongkonggil
I found that Rob's helpful answer didn't quite match what I was seeing using Excel. To match Excel, I passed the Average for valueList in to the StandardDeviation calculation.
Here is my two cents... and clearly you could calculate the moving average (ma) from valueList inside the function, but I happened to already have it before needing the standard deviation.
public double StandardDeviation(List<double> valueList, double ma)
{
    double xMinusMovAvg = 0.0;
    double Sigma = 0.0;
    int k = valueList.Count;
    foreach (double value in valueList)
    {
        xMinusMovAvg = value - ma;
        Sigma = Sigma + (xMinusMovAvg * xMinusMovAvg);
    }
    return Math.Sqrt(Sigma / (k - 1));
}
Answer by AlexB
The accepted answer by Jaime is great, except you need to divide by k-2 in the last line (you need to divide by "number_of_elements-1"). Better yet, start k at 0:
public static double StandardDeviation(List<double> valueList)
{
    double M = 0.0;
    double S = 0.0;
    int k = 0;
    foreach (double value in valueList)
    {
        k++;
        double tmpM = M;
        M += (value - tmpM) / k;
        S += (value - tmpM) * (value - M);
    }
    return Math.Sqrt(S / (k-1));
}
Answer by Rikin Patel
With extension methods.
using System;
using System.Collections.Generic;

namespace SampleApp
{
    internal class Program
    {
        private static void Main()
        {
            List<double> data = new List<double> {1, 2, 3, 4, 5, 6};
            double mean = data.Mean();
            double variance = data.Variance();
            double sd = data.StandardDeviation();
            Console.WriteLine("Mean: {0}, Variance: {1}, SD: {2}", mean, variance, sd);
            Console.WriteLine("Press any key to continue...");
            Console.ReadKey();
        }
    }

    public static class MyListExtensions
    {
        public static double Mean(this List<double> values)
        {
            return values.Count == 0 ? 0 : values.Mean(0, values.Count);
        }

        // Mean of values[start..end), end exclusive.
        public static double Mean(this List<double> values, int start, int end)
        {
            double s = 0;
            for (int i = start; i < end; i++)
            {
                s += values[i];
            }
            return s / (end - start);
        }

        public static double Variance(this List<double> values)
        {
            return values.Variance(values.Mean(), 0, values.Count);
        }

        public static double Variance(this List<double> values, double mean)
        {
            return values.Variance(mean, 0, values.Count);
        }

        public static double Variance(this List<double> values, double mean, int start, int end)
        {
            double variance = 0;
            for (int i = start; i < end; i++)
            {
                variance += Math.Pow((values[i] - mean), 2);
            }
            int n = end - start;
            // Note: divides by n (population variance) when start == 0,
            // but by n - 1 (sample variance) when start > 0.
            if (start > 0) n -= 1;
            return variance / (n);
        }

        public static double StandardDeviation(this List<double> values)
        {
            return values.Count == 0 ? 0 : values.StandardDeviation(0, values.Count);
        }

        public static double StandardDeviation(this List<double> values, int start, int end)
        {
            double mean = values.Mean(start, end);
            double variance = values.Variance(mean, start, end);
            return Math.Sqrt(variance);
        }
    }
}
Answer by Pedro77
A solution 10 times faster than Jaime's, but be aware that, as Jaime pointed out:
"While the sum of squares algorithm works fine most of the time, it can cause big trouble if you are dealing with very large numbers. You basically may end up with a negative variance"
If you think you are dealing with very large numbers or a very large quantity of numbers, you should calculate using both methods. If the results are equal, you know you can use "my" method for your case.
public static double StandardDeviation(double[] data)
{
    double stdDev = 0;
    double sumAll = 0;
    double sumAllQ = 0;

    //Sum of x and sum of x²
    for (int i = 0; i < data.Length; i++)
    {
        double x = data[i];
        sumAll += x;
        sumAllQ += x * x;
    }

    //Mean (not used here)
    //double mean = 0;
    //mean = sumAll / (double)data.Length;

    //Standard deviation
    stdDev = System.Math.Sqrt(
        (sumAllQ -
        (sumAll * sumAll) / data.Length) *
        (1.0d / (data.Length - 1))
        );

    return stdDev;
}
Answer by Chris Marisic
The Math.NET library provides this for you out of the box.
PM> Install-Package MathNet.Numerics
using MathNet.Numerics.Statistics; // namespace that provides the extension methods
var populationStdDev = new List<double> { 1d, 2d, 3d, 4d, 5d }.PopulationStandardDeviation();
var sampleStdDev = new List<double> { 2d, 3d, 4d }.StandardDeviation();
See PopulationStandardDeviation for more information.
Answer by MiguelMunoz
The trouble with all the other answers is that they assume you have your data in a big array. If your data is coming in on the fly, this would be a better approach. This class works regardless of how or if you store your data. It also gives you the choice of Welford's method (spelled "Waldorf" in the identifiers below) or the sum-of-squares method. Both methods work using a single pass.
public final class StatMeasure {
    private StatMeasure() {}

    public interface Stats1D {
        /** Add a value to the population */
        void addValue(double value);

        /** Get the mean of all the added values */
        double getMean();

        /** Get the standard deviation from a sample of the population. */
        double getStDevSample();

        /** Gets the standard deviation for the entire population. */
        double getStDevPopulation();
    }

    private static class WaldorfPopulation implements Stats1D {
        private double mean = 0.0;
        private double sSum = 0.0;
        private int count = 0;

        @Override
        public void addValue(double value) {
            double tmpMean = mean;
            double delta = value - tmpMean;
            mean += delta / ++count;
            sSum += delta * (value - mean);
        }

        @Override
        public double getMean() { return mean; }

        @Override
        public double getStDevSample() { return Math.sqrt(sSum / (count - 1)); }

        @Override
        public double getStDevPopulation() { return Math.sqrt(sSum / (count)); }
    }

    private static class StandardPopulation implements Stats1D {
        private double sum = 0.0;
        private double sumOfSquares = 0.0;
        private int count = 0;

        @Override
        public void addValue(double value) {
            sum += value;
            sumOfSquares += value * value;
            count++;
        }

        @Override
        public double getMean() { return sum / count; }

        // No (float) cast here: the methods return double, and narrowing would only lose precision.
        @Override
        public double getStDevSample() {
            return Math.sqrt((sumOfSquares - ((sum * sum) / count)) / (count - 1));
        }

        @Override
        public double getStDevPopulation() {
            return Math.sqrt((sumOfSquares - ((sum * sum) / count)) / count);
        }
    }

    /**
     * Returns a way to measure a population of data using Welford's method.
     * This method is better if your population or values are so large that
     * the sum of x-squared may overflow or lose precision. It's also probably
     * faster if you need to recalculate the mean and standard deviation
     * continuously, for example, if you are continually updating a graphic of
     * the data as it flows in.
     *
     * @return A Stats1D object that uses Welford's method.
     */
    public static Stats1D getWaldorfStats() { return new WaldorfPopulation(); }

    /**
     * Return a way to measure the population of data using the sum-of-squares
     * method. This is probably faster than Welford's method, but runs the
     * risk of data overflow or loss of precision.
     *
     * @return A Stats1D object that uses the sum-of-squares method
     */
    public static Stats1D getSumOfSquaresStats() { return new StandardPopulation(); }
}
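Since the question is about C# and the class above is Java, here is a rough C# sketch of the same streaming idea, using only the Welford update. The type and member names (RunningStats, Add, SampleStdDev, PopulationStdDev) are hypothetical, and returning NaN when there are too few values is my choice, not something the answer specifies.

```csharp
using System;

// Hypothetical C# counterpart of the Welford-based accumulator above.
public sealed class RunningStats
{
    private double mean;
    private double sSum;   // running sum of squared deviations from the mean
    private long count;

    public long Count => count;
    public double Mean => mean;

    // Feed values one at a time; nothing is stored except the three fields.
    public void Add(double value)
    {
        count++;
        double delta = value - mean;
        mean += delta / count;
        sSum += delta * (value - mean);
    }

    // Divides by n - 1; NaN when fewer than two values were added.
    public double SampleStdDev => count > 1 ? Math.Sqrt(sSum / (count - 1)) : double.NaN;

    // Divides by n; NaN when nothing has been added.
    public double PopulationStdDev => count > 0 ? Math.Sqrt(sSum / count) : double.NaN;
}
```

Call Add for each incoming value and read Mean or either standard deviation at any point; the properties can be queried repeatedly as more data arrives.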