Calculate mean and standard deviation from a vector of samples in C++ using Boost
Original URL: http://stackoverflow.com/questions/7616511/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by user393144
Accepted answer by David Nehme
Answer by musiphil
I don't know if Boost has more specific functions, but you can do it with the standard library.
Given std::vector<double> v, this is the naive way:
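#include <cmath>    // std::sqrt
#include <numeric>  // std::accumulate, std::inner_product
#include <vector>
// Population standard deviation via E[x^2] - mean^2
double sum = std::accumulate(v.begin(), v.end(), 0.0);
double mean = sum / v.size();
double sq_sum = std::inner_product(v.begin(), v.end(), v.begin(), 0.0);
double stdev = std::sqrt(sq_sum / v.size() - mean * mean);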
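This is susceptible to overflow or underflow for huge or tiny values. It also suffers from catastrophic cancellation when the mean is large compared with the spread of the data, because sq_sum / v.size() and mean * mean are then nearly equal. A slightly better way to calculate the standard deviation is to subtract the mean first: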
#include <algorithm>   // std::transform
#include <functional>  // std::minus, std::bind2nd (deprecated in C++11, removed in C++17)
double sum = std::accumulate(v.begin(), v.end(), 0.0);
double mean = sum / v.size();
std::vector<double> diff(v.size());
std::transform(v.begin(), v.end(), diff.begin(),
               std::bind2nd(std::minus<double>(), mean));
double sq_sum = std::inner_product(diff.begin(), diff.end(), diff.begin(), 0.0);
double stdev = std::sqrt(sq_sum / v.size());
UPDATE for C++11:
The call to std::transform can be written using a lambda function instead of std::minus and std::bind2nd (now deprecated):
std::transform(v.begin(), v.end(), diff.begin(), [mean](double x) { return x - mean; });
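The two formulas above can be wrapped into self-contained C++11 functions and compared on data whose mean is large relative to its spread. This is a sketch: the names stdev_naive and stdev_two_pass are hypothetical, and both compute the population standard deviation (dividing by n), exactly as the snippets above do.

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

// Naive formula: sqrt(E[x^2] - mean^2).
// Population standard deviation; v must be non-empty.
double stdev_naive(const std::vector<double>& v)
{
    double sum = std::accumulate(v.begin(), v.end(), 0.0);
    double mean = sum / v.size();
    double sq_sum = std::inner_product(v.begin(), v.end(), v.begin(), 0.0);
    return std::sqrt(sq_sum / v.size() - mean * mean);
}

// Two-pass formula with the C++11 lambda: subtract the mean first, then sum
// the squared deviations. Population standard deviation; v must be non-empty.
double stdev_two_pass(const std::vector<double>& v)
{
    double mean = std::accumulate(v.begin(), v.end(), 0.0) / v.size();
    std::vector<double> diff(v.size());
    std::transform(v.begin(), v.end(), diff.begin(),
                   [mean](double x) { return x - mean; });
    double sq_sum = std::inner_product(diff.begin(), diff.end(), diff.begin(), 0.0);
    return std::sqrt(sq_sum / v.size());
}
```

For example, for v = {1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16} the deviations from the mean are -6, -3, 3 and 6, so the exact result is sqrt(90 / 4), about 4.743. stdev_two_pass reproduces it. With IEEE-754 doubles, stdev_naive typically loses essentially all significant digits on this input, and it yields NaN if the difference under the square root rounds to a negative number.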
Answer by Josh Greifer
If performance is important to you and your compiler supports lambdas, the stdev calculation can be made faster and simpler. In tests with VS 2012 I've found that the following code is over 10x quicker than the Boost code given in the chosen answer. It is also 5x quicker than the safer version of the answer using the standard library given by musiphil.
Note that I'm using the sample standard deviation (dividing by n - 1 rather than n), so the code below gives slightly different results (see "Why there is a Minus One in Standard Deviations").
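#include <algorithm>  // std::for_each
#include <cmath>      // std::sqrt
#include <iterator>   // std::begin, std::end
#include <numeric>    // std::accumulate
double sum = std::accumulate(std::begin(v), std::end(v), 0.0);
double m = sum / v.size();
double accum = 0.0;
std::for_each (std::begin(v), std::end(v), [&](const double d) {
    accum += (d - m) * (d - m);
});
double stdev = std::sqrt(accum / (v.size()-1));  // sample standard deviation: divide by n - 1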
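To make the difference between the population and the sample formulas concrete, here is a hedged sketch that runs Josh Greifer's lambda accumulation with either divisor. The function names are hypothetical. The sample version applies Bessel's correction (n - 1) and therefore needs at least two values.

```cpp
#include <algorithm>
#include <cmath>
#include <iterator>
#include <numeric>
#include <stdexcept>
#include <vector>

// Sum of squared deviations from the mean, accumulated with a lambda
// in the same way as the snippet above.
static double sum_sq_dev(const std::vector<double>& v)
{
    double m = std::accumulate(std::begin(v), std::end(v), 0.0) / v.size();
    double accum = 0.0;
    std::for_each(std::begin(v), std::end(v), [&](const double d) {
        accum += (d - m) * (d - m);
    });
    return accum;
}

// Population standard deviation: divide by n.
double stdev_population(const std::vector<double>& v)
{
    if (v.empty()) throw std::invalid_argument("need at least one value");
    return std::sqrt(sum_sq_dev(v) / v.size());
}

// Sample standard deviation: divide by n - 1 (Bessel's correction).
double stdev_sample(const std::vector<double>& v)
{
    if (v.size() < 2) throw std::invalid_argument("need at least two values");
    return std::sqrt(sum_sq_dev(v) / (v.size() - 1));
}
```

For {2, 4, 4, 4, 5, 5, 7, 9} the squared deviations sum to 32, so the population value is sqrt(32 / 8) = 2, while the sample value is sqrt(32 / 7), about 2.138. The size guard matters because v.size() - 1 is unsigned. For an empty vector it wraps around instead of becoming negative.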
Answer by codeling
Improving on the answer by musiphil, you can write a standard deviation function without the temporary vector diff, just using a single inner_product call with the C++11 lambda capabilities:
#include <cmath>    // std::sqrt
#include <numeric>  // std::accumulate, std::inner_product
#include <vector>

// Sample standard deviation (divides by n - 1); needs at least two values.
double stddev(std::vector<double> const & func)
{
    double mean = std::accumulate(func.begin(), func.end(), 0.0) / func.size();
    double sq_sum = std::inner_product(func.begin(), func.end(), func.begin(), 0.0,
        [](double const & x, double const & y) { return x + y; },
        [mean](double const & x, double const & y) { return (x - mean)*(y - mean); });
    return std::sqrt(sq_sum / ( func.size() - 1 ));
}
I suspect doing the subtraction multiple times is cheaper than using up additional intermediate storage, and I think it is more readable, but I haven't tested the performance yet.
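Because the performance question is left open, one way to check it is a rough timing harness like the sketch below. It is an assumption-laden micro-benchmark: the helper names are hypothetical, results depend heavily on the compiler, flags and hardware, and a single run is noisy. It also lets you confirm that both versions return the same value. Both functions divide by n - 1 here so their results can be compared directly.

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>
#include <numeric>
#include <vector>

// Version with the temporary vector diff (musiphil's answer, lambda form).
double stddev_with_temp(const std::vector<double>& v)
{
    double mean = std::accumulate(v.begin(), v.end(), 0.0) / v.size();
    std::vector<double> diff(v.size());
    std::transform(v.begin(), v.end(), diff.begin(),
                   [mean](double x) { return x - mean; });
    double sq_sum = std::inner_product(diff.begin(), diff.end(), diff.begin(), 0.0);
    return std::sqrt(sq_sum / (v.size() - 1));
}

// Version without the temporary: a single inner_product call.
double stddev_no_temp(const std::vector<double>& v)
{
    double mean = std::accumulate(v.begin(), v.end(), 0.0) / v.size();
    double sq_sum = std::inner_product(v.begin(), v.end(), v.begin(), 0.0,
        [](double x, double y) { return x + y; },
        [mean](double x, double y) { return (x - mean) * (y - mean); });
    return std::sqrt(sq_sum / (v.size() - 1));
}

// Calls f(v) `reps` times and returns the elapsed wall-clock time in
// milliseconds. Results are added to *sink so the calls are not optimized away.
template <class F>
double time_ms(F f, const std::vector<double>& v, int reps, double* sink)
{
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) *sink += f(v);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

To compare, call time_ms(stddev_with_temp, v, 20, &sink) and time_ms(stddev_no_temp, v, 20, &sink) in an optimized build with a large input (for example several million values). Repeat the measurement a few times before drawing conclusions.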
Answer by galactica
It seems the following elegant recursive solution has not been mentioned, although it has been around for a long time (it is commonly known as Welford's method). Referring to Knuth's The Art of Computer Programming:
mean_1 = x_1, variance_1 = 0;  // initial conditions (edge case)
// for k >= 2:
mean_k = mean_{k-1} + (x_k - mean_{k-1}) / k;
variance_k = variance_{k-1} + (x_k - mean_{k-1}) * (x_k - mean_k);
// variance_k is the running sum of squared deviations from the mean, not yet divided by anything.
Then, for a list of n >= 2 values, the estimate of the standard deviation is:
stddev = std::sqrt(variance_n / (n-1)).
Hope this helps!
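Here is a minimal C++ sketch of the recurrence above. The class name RunningStats is hypothetical. It keeps only the count, the running mean and the running sum of squared deviations, so it can process values one at a time (for example from a stream) without storing them.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>

// Running mean and sample standard deviation via the recurrence above
// (Welford's method).
class RunningStats
{
public:
    void push(double x)
    {
        ++n_;
        if (n_ == 1) {              // initial conditions: mean_1 = x_1, variance_1 = 0
            mean_ = x;
            m2_ = 0.0;
            return;
        }
        double old_mean = mean_;                 // mean_{k-1}
        mean_ += (x - old_mean) / n_;            // mean_k
        m2_ += (x - old_mean) * (x - mean_);     // variance_k in the notation above
    }

    std::size_t count() const { return n_; }
    double mean() const { return mean_; }

    // stddev = sqrt(variance_n / (n - 1)), valid for n >= 2.
    double stddev() const
    {
        if (n_ < 2) throw std::logic_error("need at least two values");
        return std::sqrt(m2_ / (n_ - 1));
    }

private:
    std::size_t n_ = 0;
    double mean_ = 0.0;
    double m2_ = 0.0;   // running sum of squared deviations
};
```

Because every update works with deviations from the current mean, values such as 1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16 do not suffer from the cancellation that affects the sum-of-squares formula.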
Answer by HelloWorld
My answer is similar to Josh Greifer's but generalised to the sample covariance. The sample variance is just the sample covariance with both inputs identical. This includes Bessel's correction.
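#include <cstddef>   // std::size_t
#include <iterator>  // std::begin, std::end
#include <numeric>   // std::accumulate

// Sample covariance (divides by n - 1, Bessel's correction).
// x and y must be random-access containers of the same size, with size() >= 2.
// Returns double so that integer inputs are not truncated.
template <class Container> double cov(const Container &x, const Container &y)
{
    double sum_x = std::accumulate(std::begin(x), std::end(x), 0.0);
    double sum_y = std::accumulate(std::begin(y), std::end(y), 0.0);
    double mx = sum_x / x.size();
    double my = sum_y / y.size();
    double accum = 0.0;
    for (std::size_t i = 0; i < x.size(); i++)
    {
        accum += (x.at(i) - mx) * (y.at(i) - my);
    }
    return accum / (x.size() - 1);
}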
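The function above reads its arguments with at() and size(), so it only accepts containers such as std::vector or std::deque. A variant on iterator ranges also works with std::list or plain arrays. The sketch below is an assumption-based rewrite (the name cov_range is hypothetical). It still applies Bessel's correction and requires the second range to have at least as many elements as the first.

```cpp
#include <cstddef>
#include <stdexcept>

// Sample covariance of [first1, last1) and the range starting at first2,
// which must contain at least as many elements. Divides by n - 1.
// Both ranges are traversed twice, so forward iterators are required.
template <class It1, class It2>
double cov_range(It1 first1, It1 last1, It2 first2)
{
    std::size_t n = 0;
    double sum_x = 0.0, sum_y = 0.0;
    It2 it2 = first2;
    for (It1 it1 = first1; it1 != last1; ++it1, ++it2, ++n) {
        sum_x += *it1;
        sum_y += *it2;
    }
    if (n < 2) throw std::invalid_argument("need at least two pairs");
    const double mx = sum_x / n;
    const double my = sum_y / n;

    double accum = 0.0;
    it2 = first2;
    for (It1 it1 = first1; it1 != last1; ++it1, ++it2)
        accum += (*it1 - mx) * (*it2 - my);
    return accum / (n - 1);
}
```

Passing the same range twice, as in cov_range(v.begin(), v.end(), v.begin()), gives the sample variance.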
Answer by slyy2048
About 2x faster than the versions mentioned before, mostly because the transform() and inner_product() loops are joined. Sorry about my shortcuts/typedefs/macros: Flo = float, CR = const reference, VFlo = vector of Flo, SZ = size type. Tested in VS2010.
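#include <cmath>
#include <cstddef>
#include <vector>
typedef float Flo;
typedef std::vector<Flo> VFlo;
typedef std::size_t SZ;
#define CR const &
#define fe(EL, CONTAINER) for (auto EL : CONTAINER) // was MSVC-only "for each (auto EL in CONTAINER)"
// Sample std dev via the sum/sum-of-squares formula in float: may lose precision
// when the mean is large relative to the spread.
Flo stdDev(VFlo CR crVec) {
    SZ n = crVec.size(); if (n < 2) return 0.0f;
    Flo fSqSum = 0.0f, fSum = 0.0f;
    fe(f, crVec) fSqSum += f * f; // EDIT: was Cit(VFlo, crVec) {
    fe(f, crVec) fSum += f;
    Flo fSumSq = fSum * fSum;
    Flo fSumSqDivN = fSumSq / n;
    Flo fSubSqSum = fSqSum - fSumSqDivN;
    Flo fPreSqrt = fSubSqSum / (n - 1);
    return std::sqrt(fPreSqrt);
}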
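The function above still walks the vector twice (once for the sum of squares and once for the sum) and accumulates in float. Below is a hedged sketch of a variant that really joins the two loops into one pass and accumulates in double. The name stdDevOnePass is hypothetical. It still uses the sum/sum-of-squares formula, so it shares the cancellation problem of the naive formula when the mean is large relative to the spread.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sample standard deviation in a single pass: accumulates the sum and the sum
// of squares together in double precision, then applies
// (sum_sq - sum^2 / n) / (n - 1). Returns 0 for fewer than two values.
double stdDevOnePass(const std::vector<float>& v)
{
    const std::size_t n = v.size();
    if (n < 2) return 0.0;
    double sum = 0.0, sum_sq = 0.0;
    for (float f : v) {
        sum += f;
        sum_sq += static_cast<double>(f) * f;
    }
    double var = (sum_sq - sum * sum / n) / (n - 1);
    return std::sqrt(var > 0.0 ? var : 0.0);  // clamp small negative rounding results
}
```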
Answer by Sushant Kondguli
Create your own container:
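#include <cmath>
#include <list>
#include <numeric>

template <class T>
class statList : public std::list<T>
{
public:
    // this-> is required: members of the dependent base std::list<T>
    // are not found by unqualified lookup inside a template.
    T mean() const {
        return std::accumulate(this->begin(), this->end(), 0.0) / this->size();
    }
    // Population standard deviation (the original returned the variance,
    // i.e. it was missing the square root).
    T stddev() const {
        T diff_sum = 0;
        T m = mean();
        for (typename std::list<T>::const_iterator it = this->begin(); it != this->end(); ++it)
            diff_sum += (*it - m) * (*it - m);
        return std::sqrt(diff_sum / this->size());
    }
};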
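It does have some limitations (for example, std::list has no virtual destructor, so a statList must never be deleted through a pointer to std::list<T>), but it works well when you know what you are doing.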
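If you would rather not derive from a standard container (their destructors are not virtual), the same two statistics can be written as free function templates that accept any container of numbers. This is a sketch. The names container_mean, container_stddev and example_matches are hypothetical, and container_stddev divides by n (population standard deviation).

```cpp
#include <cmath>
#include <iterator>
#include <list>
#include <numeric>
#include <stdexcept>
#include <vector>

// Arithmetic mean of any non-empty container with begin()/end()/size().
template <class C>
double container_mean(const C& c)
{
    if (c.size() == 0) throw std::invalid_argument("empty container");
    return std::accumulate(std::begin(c), std::end(c), 0.0) / c.size();
}

// Population standard deviation (divides by n).
template <class C>
double container_stddev(const C& c)
{
    const double m = container_mean(c);
    double diff_sum = 0.0;
    for (const auto& x : c)
        diff_sum += (x - m) * (x - m);
    return std::sqrt(diff_sum / c.size());
}

// Example: the same calls work for different container and element types.
inline bool example_matches()
{
    std::list<double> l = {2, 4, 4, 4, 5, 5, 7, 9};
    std::vector<int> v = {2, 4, 4, 4, 5, 5, 7, 9};
    return container_stddev(l) == container_stddev(v);
}
```

Keeping the statistics outside the container means any std::list, std::vector or std::deque can be used as-is, with no wrapper type to pass around.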
Answer by ali
Mean deviation in C++:
A deviation that is the difference between an observed value and the true value of a quantity of interest (such as a population mean) is an error, whereas a deviation that is the difference between the observed value and an estimate of the true value (such as a sample mean) is a residual. These concepts apply to data at the interval and ratio levels of measurement. The program below computes the mean absolute deviation of the input values around their average.
#include <cmath>
#include <iostream>
#include <vector>
using namespace std;

// Reads a count and that many numbers, then prints |average - number| for each
// value and the mean (absolute) deviation around the average.
// Uses std::vector instead of new[] (the original leaked memory) and drops the
// non-standard conio.h/getch().
int main()
{
    int cnt;
    cout << "please enter count:\t";
    if (!(cin >> cnt) || cnt <= 0) return 1;
    vector<float> num(cnt);
    float sum = 0, ave, M = 0, M_D;   // M was uninitialized in the original
    for (int i = 0; i < cnt; i++)
    {
        cin >> num[i];
        sum += num[i];
    }
    ave = sum / cnt;
    for (int i = 0; i < cnt; i++)
    {
        float s = fabs(ave - num[i]);   // absolute deviation from the average
        cout << "\n|ave - number| = " << s;
        M += s;
    }
    M_D = M / cnt;
    cout << "\n\n Average: " << ave;
    cout << "\n M.D(Mean Deviation): " << M_D << "\n";
    return 0;
}
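The program above prints the mean absolute deviation around the average, which is a different statistic from the standard deviation asked about in the question. A small hedged sketch (the function names are hypothetical) computing both on the same data shows the difference.

```cpp
#include <cmath>
#include <numeric>
#include <stdexcept>
#include <vector>

// Mean absolute deviation around the mean: (1/n) * sum |x_i - mean|.
double mean_abs_deviation(const std::vector<double>& v)
{
    if (v.empty()) throw std::invalid_argument("empty input");
    double mean = std::accumulate(v.begin(), v.end(), 0.0) / v.size();
    double sum = 0.0;
    for (double x : v) sum += std::fabs(x - mean);
    return sum / v.size();
}

// Population standard deviation: sqrt((1/n) * sum (x_i - mean)^2).
double population_stddev(const std::vector<double>& v)
{
    if (v.empty()) throw std::invalid_argument("empty input");
    double mean = std::accumulate(v.begin(), v.end(), 0.0) / v.size();
    double sum = 0.0;
    for (double x : v) sum += (x - mean) * (x - mean);
    return std::sqrt(sum / v.size());
}
```

For {2, 4, 4, 4, 5, 5, 7, 9} (mean 5) the absolute deviations sum to 12, so the mean absolute deviation is 1.5, while the population standard deviation is 2. Squaring weights the larger deviations more heavily.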