How do I calculate the variance of a list in Python?

Question

Asked by minks

If I have a list like this:

results=[-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
          0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

I want to calculate the variance of this list in Python, i.e. the average of the squared differences from the mean.

How can I go about this? I'm confused about how to access the elements of the list to compute the squared differences.

Answer 1

Accepted answer by Cleb

You can use numpy's built-in function var:

import numpy as np

results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
          0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

print(np.var(results))

This gives you 28.822364260579157

If, for whatever reason, you cannot use numpy and/or you don't want to use a built-in function for it, you can also calculate it "by hand" using e.g. a generator expression:

# calculate mean
m = sum(results) / len(results)

# calculate variance using a generator expression
var_res = sum((xi - m) ** 2 for xi in results) / len(results)

which gives you the identical result.

If you are interested in the standard deviation, you can use numpy.std:

print(np.std(results))
5.36864640860051

@Serge Ballesta explained very well the difference between variance n and n-1. In numpy you can easily set this parameter using the option ddof; its default is 0, so for the n-1 case you can simply do:

np.var(results, ddof=1)

The "by hand" solution is given in @Serge Ballesta's answer.

Both approaches yield 32.024849178421285.
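
For reference, the effect of ddof can be sketched in plain Python. This only illustrates the formula and is not numpy's implementation; the helper name variance_ddof is made up for this example:

```python
def variance_ddof(values, ddof=0):
    """Variance with divisor len(values) - ddof.

    Hypothetical helper for illustration; it is not part of numpy.
    """
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more than ddof data points")
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - ddof)


results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
           0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

population = variance_ddof(results)       # divisor n, like ddof=0
sample = variance_ddof(results, ddof=1)   # divisor n - 1, like ddof=1
print(population, sample)
```

The two values should agree with the np.var results quoted in this answer up to floating-point rounding.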

You can also set this parameter for std:

np.std(results, ddof=1)
5.659050201086865
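
Since the standard deviation is just the square root of the variance, these numbers can also be cross-checked without numpy. The following is a small sketch using the standard library's statistics module (Python 3.4+), where pstdev corresponds to the divisor n and stdev to the divisor n-1:

```python
import math
from statistics import pstdev, pvariance, stdev, variance

results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
           0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

pop_std = pstdev(results)      # square root of the variance n
sample_std = stdev(results)    # square root of the variance n-1
print(pop_std, sample_std)

# each standard deviation is the square root of the matching variance
print(math.isclose(pop_std, math.sqrt(pvariance(results))))
print(math.isclose(sample_std, math.sqrt(variance(results))))
```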

Answer 2

Answered by Serge Ballesta

Well, there are two ways of defining the variance. There is the variance n, which you use when you have the full population, and the variance n-1, which you use when you only have a sample.

The difference between the two is whether the value m = sum(xi) / n is the real average or just an approximation of what the average should be.

Example 1: you want to know the average height of the students in a class and its variance. Here the value m = sum(xi) / n is the real average, and the formulas given by Cleb are fine (variance n).

Example 2: you want to know the average time at which a bus passes a bus stop, and its variance. You note the time every day for a month and get 30 values. Here the value m = sum(xi) / n is only an approximation of the real average, and that approximation becomes more accurate with more values. In that case the best approximation of the actual variance is the variance n-1:

varRes = sum([(xi - m)**2 for xi in results]) / (len(results) -1)
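
To see why the n-1 divisor is preferred for samples, here is a small simulation sketch. It assumes a normally distributed population with true variance 1 and repeatedly draws samples of 5 values; the seed, trial count and function names are arbitrary choices for this illustration:

```python
import random


def var_n(values):
    """Variance n: divide by the number of values."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)


def var_n_minus_1(values):
    """Variance n-1: divide by the number of values minus one."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)


rng = random.Random(0)   # fixed seed so the run is repeatable
trials = 20000
sample_size = 5

avg_n = 0.0
avg_n_minus_1 = 0.0
for _ in range(trials):
    # draw a small sample from a population whose true variance is 1
    sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
    avg_n += var_n(sample) / trials
    avg_n_minus_1 += var_n_minus_1(sample) / trials

print(avg_n, avg_n_minus_1)
```

On average the variance n underestimates the true variance by a factor of (n-1)/n, so with samples of 5 the first average should land near 0.8, while the second should land near 1.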

OK, this has nothing to do with Python, but it does have an impact on the statistical analysis, and the question is tagged statistics and variance.

Note: libraries do not all agree on the default. numpy uses the variance n (ddof=0) by default for both var and std, so pass ddof=1 when you want the variance n-1; other libraries default to n-1, so check the documentation of the function you use.

Answer 3

Answered by roadrunner66

Numpy is indeed the most elegant and fast way to do it.

I think the actual question was about how to access the individual elements of a list to do such a calculation yourself, so here is an example:

results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
           0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

import numpy as np
print('numpy variance: ', np.var(results))


# without numpy, by hand

# there are two ways of calculating the variance
#   - 1. directly, as the 2nd-order central moment (https://en.wikipedia.org/wiki/Moment_(mathematics)):
#        the mean of the squared deviations from the mean
#   - 2. "mean of square minus square of mean" (see https://en.wikipedia.org/wiki/Variance)

# calculate mean (named total rather than sum, to avoid shadowing the built-in)
n = len(results)
total = 0
for i in range(n):
    total = total + results[i]

mean = total / n
print('mean: ', mean)

# calculate the central moment
sum2 = 0
for i in range(n):
    sum2 = sum2 + (results[i] - mean) ** 2

myvar1 = sum2 / n
print('my variance1: ', myvar1)

# calculate the mean of square minus square of mean
sum3 = 0
for i in range(n):
    sum3 = sum3 + results[i] ** 2

myvar2 = sum3 / n - mean ** 2
print('my variance2: ', myvar2)

gives you (values rounded):

numpy variance:  28.8223642606
mean:  -3.731599805
my variance1:  28.8223642606
my variance2:  28.8223642606
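
One caveat about the second method: "mean of square minus square of mean" subtracts two large, nearly equal numbers, so it can lose precision when the values are large compared to their spread. The sketch below uses made-up data (4, 7, 13, 16 shifted by 1e9, exact variance n of 22.5) to show the effect; the function names are only for this illustration:

```python
def variance_two_pass(values):
    """Method 1: mean of the squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def variance_mean_of_squares(values):
    """Method 2: mean of the squares minus the square of the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(x ** 2 for x in values) / n - mean ** 2


# same spread as 4, 7, 13, 16, but shifted far away from zero
shifted = [1e9 + x for x in (4.0, 7.0, 13.0, 16.0)]

two_pass = variance_two_pass(shifted)
naive = variance_mean_of_squares(shifted)
print(two_pass, naive)
```

Here the first method still returns 22.5, while the second is visibly off because the squares are around 1e18, where double-precision floats can no longer resolve differences of a few units. For data like the list in the question both methods are fine.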

Answer 4

Answered by Xavier Guihot

Starting with Python 3.4, the standard library comes with the variance function (sample variance, or variance n-1) as part of the statistics module:

from statistics import variance

data = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439, 0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]
variance(data)
# 32.024849178421285

The population variance (or variance n) can be obtained using the pvariance function:

from statistics import pvariance

data = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439, 0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]
pvariance(data)
# 28.822364260579157

Also note that if you already know the mean of your list, the variance and pvariance functions take an optional second argument (xbar and mu respectively), which spares them from recomputing the mean of the sample (a step that is otherwise part of the variance computation).
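
A minimal sketch of passing a precomputed mean, using the data from the question. The variable names are just for illustration:

```python
from statistics import mean, pvariance, variance

data = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
        0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

m = mean(data)                        # computed once, reused below

sample_var = variance(data, m)        # second argument is xbar
population_var = pvariance(data, m)   # second argument is mu
print(sample_var, population_var)
```

Note that the functions trust the value you pass in, so supplying something other than the actual mean of the data silently gives a wrong variance.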

Answer 5

Answered by Mark Lakata

The right answer is to use a package such as NumPy, but if you want to roll your own and compute the variance incrementally, there is a good algorithm with higher accuracy. See https://www.johndcook.com/blog/standard_deviation/

I ported my Perl implementation to Python. Please point out any issues in the comments.

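# results is the list from the question
# Mk is the running mean, Sk the running sum of squared differences
Mklast = 0
Mk = 0
Sk = 0
k  = 0

for xi in results:
  k = k + 1
  Mk = Mklast + (xi - Mklast) / k
  Sk = Sk + (xi - Mklast) * (xi - Mk)
  Mklast = Mk

# sample variance (variance n-1)
var = Sk / (k - 1)
print(var)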

The result is (rounded):

>>> print(var)
32.0248491784
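
If the values arrive one at a time, the same update step can be wrapped in a small class so the variance is available at any point. This is only a sketch of that idea; the class and method names are made up for this example:

```python
class RunningVariance:
    """Incrementally tracks the mean and the sum of squared differences."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.sum_sq = 0.0   # running sum of squared differences from the mean

    def push(self, x):
        """Fold one more value into the running statistics."""
        self.count += 1
        delta = x - self.mean            # difference from the old mean
        self.mean += delta / self.count
        self.sum_sq += delta * (x - self.mean)   # uses old and new mean

    def variance(self, ddof=1):
        """Variance so far; ddof=1 is the variance n-1, ddof=0 the variance n."""
        if self.count - ddof <= 0:
            raise ValueError("not enough values")
        return self.sum_sq / (self.count - ddof)


results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
           0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

running = RunningVariance()
for value in results:
    running.push(value)

print(running.variance())         # variance n-1
print(running.variance(ddof=0))   # variance n
```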

Answer 6

Answered by sim

import numpy as np
def get_variance(xs):
    mean = np.mean(xs)
    summed = 0
    for x in xs:
        summed += (x - mean)**2
    return summed / (len(xs))
print(get_variance([1,2,3,4,5]))

Output: 2.0

a = [1,2,3,4,5]
variance = np.var(a, ddof=1)
print(variance)

Answer 7

Answered by Shushiro

Without imports, I would use the following python3 script:

#!/usr/bin/env python3

def createData():
    data1=[12,54,60,3,15,6,36]
    data2=[1,2,3,4,5]
    data3=[100,30000,1567,3467,20000,23457,400,1,15]

    dataset=[]
    dataset.append(data1)
    dataset.append(data2)
    dataset.append(data3)

    return dataset

def calculateMean(data):
    means=[]
    # one list of the nested list
    for oneDataset in data:
        # named total rather than sum, to avoid shadowing the built-in
        total=0
        # one datapoint in one inner list
        for number in oneDataset:
            # summing up
            total+=number
        # mean for one inner list
        mean=total/len(oneDataset)
        # adding a tuple of the original data and its mean to
        # a list of tuples
        item=(oneDataset, mean)
        means.append(item)

    return means

# subtract the mean from each element and square the result, then
# sum up the squared results and divide by the number of elements
def calculateVariance(meanData):
    variances=[]
    # meanData is the list of tuples
    # pair is one tuple
    for pair in meanData:
        # pair[0] is the original data
        interResult=0
        squareSum=0
        for element in pair[0]:
            interResult=(element-pair[1])**2
            squareSum+=interResult
        variance=squareSum/len(pair[0])
        variances.append((pair[0], pair[1], variance))

    return variances





def main():
    my_data=createData()
    my_means=calculateMean(my_data)
    my_variances=calculateVariance(my_means)
    print(my_variances)

if __name__ == "__main__":
    main()

This prints the original data together with their mean and variance. I know this approach handles a list of several datasets, but I think you can quickly adapt it to your purpose ;)
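
For comparison, the same computation can be written more compactly, still without imports. This is a sketch rather than a drop-in replacement for the script above; the function name mean_and_variance is made up for this example, and it uses the variance n like the script does:

```python
def mean_and_variance(data):
    """Return (mean, variance n) for one list of numbers."""
    mean = sum(data) / len(data)
    variance = sum((x - mean) ** 2 for x in data) / len(data)
    return mean, variance


datasets = [
    [12, 54, 60, 3, 15, 6, 36],
    [1, 2, 3, 4, 5],
    [100, 30000, 1567, 3467, 20000, 23457, 400, 1, 15],
]

# one (data, mean, variance) tuple per dataset
summary = [(data, *mean_and_variance(data)) for data in datasets]
for data, mean, variance in summary:
    print(data, mean, variance)
```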
