Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/7749568/

Time: 2020-09-06 04:22:59  Source: igfitidea

How can I do standard deviation in Ruby?

Tags: ruby, standard-deviation

Asked by Timothy T.

I have several records with a given attribute, and I want to find the standard deviation.

How do I do that?

Answered by tolitius

module Enumerable

    def sum
      self.inject(0){|accum, i| accum + i }
    end

    def mean
      self.sum/self.length.to_f
    end

    def sample_variance
      m = self.mean
      sum = self.inject(0){|accum, i| accum +(i-m)**2 }
      sum/(self.length - 1).to_f
    end

    def standard_deviation
      Math.sqrt(self.sample_variance)
    end

end 

Testing it:

a = [ 20, 23, 23, 24, 25, 22, 12, 21, 29 ]
a.standard_deviation  
# => 4.594682917363407


Edit (01/17/2012): fixed "sample_variance", thanks to Dave Sag.

Answered by eprothro

It appears that Angela may have been wanting an existing library. After playing with statsample, array-statisics, and a few others, I'd recommend the descriptive_statistics gem if you're trying to avoid reinventing the wheel.

$ gem install descriptive_statistics
$ irb
1.9.2 :001 > require 'descriptive_statistics'
 => true 
1.9.2 :002 > samples = [1, 2, 2.2, 2.3, 4, 5]
 => [1, 2, 2.2, 2.3, 4, 5] 
1.9.2p290 :003 > samples.sum
 => 16.5 
1.9.2 :004 > samples.mean
 => 2.75 
1.9.2 :005 > samples.variance
 => 1.7924999999999998 
1.9.2 :006 > samples.standard_deviation
 => 1.3388427838995882 

I can't speak to its statistical correctness, or your comfort with monkey-patching Enumerable; but it's easy to use and easy to contribute to.
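
For what it's worth, the variance printed in the session above (1.7925) equals the sum of squared deviations divided by n, which is the population variance; dividing by n - 1 instead gives the sample variance. A minimal sketch of the two conventions in plain Ruby follows. The helper names (`squared_deviations`, `population_variance`, `sample_variance`) are made up for illustration and are not part of the gem.

```ruby
# Sum of squared deviations from the mean (always a Float here).
def squared_deviations(values)
  mean = values.inject(0) { |acc, x| acc + x } / values.length.to_f
  values.inject(0) { |acc, x| acc + (x - mean)**2 }
end

# Divide by n: treats the values as the whole population.
def population_variance(values)
  squared_deviations(values) / values.length
end

# Divide by n - 1 (Bessel's correction): treats the values as a sample.
def sample_variance(values)
  squared_deviations(values) / (values.length - 1)
end

samples = [1, 2, 2.2, 2.3, 4, 5]
puts population_variance(samples).round(4)  # 1.7925
puts sample_variance(samples).round(4)      # 2.151
```

Which divisor is appropriate depends on whether the records are the entire population or a sample drawn from it.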

Answered by Dave Sag

The answer given above is elegant but has a slight error in it. Not being a stats head myself, I sat up and read a number of websites in detail, and found that this one gave the most comprehensible explanation of how to derive a standard deviation: http://sonia.hubpages.com/hub/stddev

The error in the answer above is in the sample_variance method.

Here is my corrected version, along with a simple unit test that shows it works.

In ./lib/enumerable/standard_deviation.rb:

#!/usr/bin/ruby

module Enumerable

  def sum
    return self.inject(0){|accum, i| accum + i }
  end

  def mean
    return self.sum / self.length.to_f
  end

  def sample_variance
    m = self.mean
    sum = self.inject(0){|accum, i| accum + (i - m) ** 2 }
    return sum / (self.length - 1).to_f
  end

  def standard_deviation
    return Math.sqrt(self.sample_variance)
  end

end

In ./test, using numbers derived from a simple spreadsheet:

[Image: screen snapshot of a Numbers spreadsheet with example data]

#!/usr/bin/ruby

# Test::Unit must be loaded before the test case class is defined.
require 'test/unit'
require 'enumerable/standard_deviation'

class StandardDeviationTest < Test::Unit::TestCase

  THE_NUMBERS = [1, 2, 2.2, 2.3, 4, 5]

  def test_sum
    expected = 16.5
    result = THE_NUMBERS.sum
    assert result == expected, "expected #{expected} but got #{result}"
  end

  def test_mean
    expected = 2.75
    result = THE_NUMBERS.mean
    assert result == expected, "expected #{expected} but got #{result}"
  end

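  def test_sample_variance
    expected = 2.151
    result = THE_NUMBERS.sample_variance
    # Compare with a tolerance: the float result may differ from 2.151 in the last digits.
    assert_in_delta expected, result, 1e-10, "expected #{expected} but got #{result}"
  end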

  def test_standard_deviation
    expected = 1.4666287874
    result = THE_NUMBERS.standard_deviation
    assert result.round(10) == expected, "expected #{expected} but got #{result}"
  end

end

Answered by marcgg

I'm not a big fan of adding methods to Enumerable, since there could be unwanted side effects. It also gives methods that are really specific to an array of numbers to every class that includes Enumerable, which doesn't make sense in most cases.

While this is fine for tests, scripts or small apps, it's risky for larger applications, so here's an alternative based on @tolitius' answer, which was already perfect. This is more for reference than anything else:

module MyApp::Maths
  def self.sum(a)
    a.inject(0){ |accum, i| accum + i }
  end

  def self.mean(a)
    sum(a) / a.length.to_f
  end

  def self.sample_variance(a)
    m = mean(a)
    sum = a.inject(0){ |accum, i| accum + (i - m) ** 2 }
    sum / (a.length - 1).to_f
  end

  def self.standard_deviation(a)
    Math.sqrt(sample_variance(a))
  end
end

And then you use it as such:

2.0.0p353 > MyApp::Maths.standard_deviation([1,2,3,4,5])
=> 1.5811388300841898

2.0.0p353 :007 > a = [ 20, 23, 23, 24, 25, 22, 12, 21, 29 ]
 => [20, 23, 23, 24, 25, 22, 12, 21, 29]

2.0.0p353 :008 > MyApp::Maths.standard_deviation(a)
 => 4.594682917363407

2.0.0p353 :043 > MyApp::Maths.standard_deviation([1,2,2.2,2.3,4,5])
 => 1.466628787389638

The behavior is the same, but it avoids the overheads and risks of adding methods to Enumerable.
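
Another way to limit the side effects, sketched here on the assumption of Ruby 2.4 or later (for Array#sum and stable refinements), is a refinement: the method exists on Array only in code that opts in with `using`. The names `ArrayStats`, `sample_standard_deviation` and `Report` are hypothetical and chosen only for this example.

```ruby
# A refinement adds the method to Array only where it is activated.
module ArrayStats
  refine Array do
    # Sample standard deviation (divides by n - 1).
    def sample_standard_deviation
      m = sum / length.to_f
      squares = inject(0) { |acc, x| acc + (x - m)**2 }
      Math.sqrt(squares / (length - 1).to_f)
    end
  end
end

module Report
  using ArrayStats # active only inside this module body

  def self.spread(values)
    values.sample_standard_deviation
  end
end

puts Report.spread([1, 2, 3, 4, 5]).round(4)               # 1.5811
puts [1, 2, 3].respond_to?(:sample_standard_deviation)     # false
```

Code that never calls `using ArrayStats` sees an unmodified Array, which is the property the module-function approach was after.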

Answered by Guss

The presented computations are not very efficient because they require several passes over the array (at least two, but often three, because you usually want to present the average in addition to the standard deviation).

I know Ruby is not the place to look for efficiency, but here is my implementation that computes average and standard deviation with a single pass over the list values:

module Enumerable

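  # Returns [mean, sample standard deviation], reading the values once.
  def avg_stddev
    n = 0
    sx = sx2 = 0
    each do |x|
      n += 1          # count inside the loop so no extra pass is needed
      sx += x
      sx2 += x**2
    end
    return nil if n == 0
    return [ sx, 0 ] if n == 1   # with one value, sx is that value
    [
      sx.to_f / n,
      # Sample standard deviation; see http://wijmo.com/docs/spreadjs/STDEV.html
      Math.sqrt((sx2 - sx**2.0 / n) / (n - 1))
    ]
  end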

end
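
The sum-of-squares formula used above subtracts two potentially large, nearly equal numbers, which can lose precision for data with a large mean and a small spread. A commonly used single-pass alternative is Welford's online update, sketched below as a standalone function. This is a different technique from the one in the answer, and the name `welford_avg_stddev` is made up for this example.

```ruby
# Returns [mean, sample standard deviation] using Welford's online update.
# Returns nil for an empty collection.
def welford_avg_stddev(values)
  n = 0
  mean = 0.0
  m2 = 0.0 # running sum of squared deviations from the current mean
  values.each do |x|
    n += 1
    delta = x - mean
    mean += delta / n
    m2 += delta * (x - mean)
  end
  return nil if n.zero?
  return [mean, 0.0] if n == 1
  [mean, Math.sqrt(m2 / (n - 1))]
end

avg, sd = welford_avg_stddev([20, 23, 23, 24, 25, 22, 12, 21, 29])
puts avg.round(4) # 22.1111
puts sd.round(4)  # 4.5947
```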

Answered by tothemario

As a simple function, given a list of numbers:

def standard_deviation(list)
  mean = list.inject(:+) / list.length.to_f
  var_sum = list.map{|n| (n-mean)**2}.inject(:+).to_f
  sample_variance = var_sum / (list.length - 1)
  Math.sqrt(sample_variance)
end

Answered by Peter Kagey

If the records at hand are of type Integer or Rational, you may want to compute the variance using Rational instead of Float to avoid errors introduced by rounding.

For example:

def variance(list)
  mean = list.reduce(:+)/list.length.to_r
  sum_of_squared_differences = list.map { |i| (i - mean)**2 }.reduce(:+)
  sum_of_squared_differences/list.length
end

(It would be prudent to add special-case handling for empty lists and other edge cases.)

Then the standard deviation can be defined as the square root of the variance:

def std_dev(list)
  Math.sqrt(variance(list))
end
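
As a rough illustration of the edge-case handling mentioned above, the sketch below returns nil for an empty list and otherwise keeps the result exact. The name `exact_variance` is hypothetical; like the code above, it divides by the list length, so it computes the population variance.

```ruby
# Population variance kept exact with Rational arithmetic.
# Returns nil for an empty list instead of raising.
def exact_variance(list)
  return nil if list.empty?
  mean = list.reduce(:+) / list.length.to_r
  list.map { |i| (i - mean)**2 }.reduce(:+) / list.length
end

v = exact_variance([1, 2, 3, 4])
puts v.inspect    # (5/4)
puts Math.sqrt(v) # converted to Float only at the last step
```

Only the final square root leaves exact arithmetic, since the root of a Rational is generally irrational.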

Answered by Straff

In case people are using PostgreSQL: it provides the aggregate functions stddev_pop and stddev_samp (see the PostgreSQL aggregate functions documentation).

stddev (equivalent to stddev_samp) has been available since at least PostgreSQL 7.1; since 8.2, both stddev_samp and stddev_pop are provided.

Answered by Mads Boyd-Madsen

Or how about:

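class Stats
    # Expose the computed values; without readers they cannot be accessed.
    attr_reader :avg, :stdev

    # Computes the mean and the population standard deviation (divides by n).
    def initialize( a )
        @avg = a.count > 0 ? a.sum / a.count.to_f : 0.0
        @stdev = a.count > 0 ? ( a.reduce(0){ |sum, v| sum + (@avg - v) ** 2 } / a.count ) ** 0.5 : 0.0
    end
end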