Finding median of list in Python
Original URL: http://stackoverflow.com/questions/24101524/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Question by ChucksPlace
How do you find the median of a list in Python? The list can be of any size and the numbers are not guaranteed to be in any particular order.
If the list contains an even number of elements, the function should return the average of the middle two.
Here are some examples (sorted for display purposes):
median([1]) == 1
median([1, 1]) == 1
median([1, 1, 2, 4]) == 1.5
median([0, 2, 5, 6, 8, 9, 9]) == 6
median([0, 0, 0, 0, 4, 4, 6, 8]) == 2
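A quick way to check the expected values above is to run them as assertions against the standard library's statistics.median. This is a sketch that assumes Python 3.4 or later, where the statistics module exists:

```python
from statistics import median

# (input, expected median) pairs taken from the examples above.
EXAMPLES = [
    ([1], 1),
    ([1, 1], 1),
    ([1, 1, 2, 4], 1.5),
    ([0, 2, 5, 6, 8, 9, 9], 6),
    ([0, 0, 0, 0, 4, 4, 6, 8], 2),
]

for data, expected in EXAMPLES:
    # median() sorts a copy of its input, so the original order does not matter.
    assert median(data) == expected
    assert median(data[::-1]) == expected
```

Any of the hand-written functions in the answers can be swapped in for the import to run the same checks.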
Answer by swolfe
The sorted() function is very helpful for this. Use sorted() to order the list, then simply return the middle value (or the average of the two middle values if the list contains an even number of elements).
def median(lst):
    sortedLst = sorted(lst)
    lstLen = len(lst)
    index = (lstLen - 1) // 2

    if (lstLen % 2):
        return sortedLst[index]
    else:
        return (sortedLst[index] + sortedLst[index + 1])/2.0
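As written, the function above raises IndexError for an empty list (index becomes -1 and the lookup runs on an empty list). A minimal variant of the same sorted()-based approach that reports empty input explicitly might look like this; raising ValueError is an assumption on my part, chosen because statistics.median raises StatisticsError, a ValueError subclass:

```python
def median(lst):
    """Median of an unordered list of numbers, using sorted()."""
    if not lst:
        # Assumed behaviour: fail with a clear message on empty input.
        raise ValueError("median() of an empty list")
    sorted_lst = sorted(lst)        # new list; the caller's list is not modified
    index = (len(lst) - 1) // 2     # lower-middle index
    if len(lst) % 2:
        return sorted_lst[index]
    return (sorted_lst[index] + sorted_lst[index + 1]) / 2.0
```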
Answer by Padraic Cunningham
You can use list.sort to avoid creating a new list with sorted, and sort the list in place.
Also, you should not use list as a variable name, as it shadows Python's own list.
def median(l):
    half = len(l) // 2
    l.sort()
    if not len(l) % 2:
        return (l[half - 1] + l[half]) / 2.0
    return l[half]
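Because list.sort works in place, this version reorders the caller's list as a side effect. The short demonstration below (the sample list is arbitrary) shows that, and shows that passing a shallow copy avoids it, at the cost of the extra list the answer was trying to save:

```python
def median(l):
    half = len(l) // 2
    l.sort()                      # sorts the caller's list in place
    if not len(l) % 2:
        return (l[half - 1] + l[half]) / 2.0
    return l[half]

data = [5, 1, 4, 2]
print(median(data))        # 3.0
print(data)                # [1, 2, 4, 5]  (reordered by the call)

other = [5, 1, 4, 2]
print(median(other[:]))    # 3.0, computed on a shallow copy
print(other)               # [5, 1, 4, 2]  (unchanged)
```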
Answer by A.J. Uppal
(Works with python-2.x):
def median(lst):
    n = len(lst)
    s = sorted(lst)
    return (sum(s[n//2-1:n//2+1])/2.0, s[n//2])[n % 2] if n else None
>>> median([-5, -5, -3, -4, 0, -1])
-3.5
>>> from numpy import median
>>> median([1, -4, -1, -1, 1, -3])
-1.0
For Python 3.4+, use statistics.median:
>>> from statistics import median
>>> median([5, 2, 3, 8, 9, -2])
4.0
Answer by Veedrac
Python 3.4 has statistics.median:
Return the median (middle value) of numeric data.
When the number of data points is odd, return the middle data point. When the number of data points is even, the median is interpolated by taking the average of the two middle values:
>>> median([1, 3, 5])
3
>>> median([1, 3, 5, 7])
4.0
Usage:
import statistics
items = [6, 1, 8, 2, 3]
statistics.median(items)
#>>> 3
It's pretty careful with types, too:
statistics.median(map(float, items))
#>>> 3.0
from decimal import Decimal
statistics.median(map(Decimal, items))
#>>> Decimal('3')
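The same type preservation carries over to fractions.Fraction and to the average taken for even-length data. The sketch below (Python 3.4+) also shows that an empty sequence raises statistics.StatisticsError, which is a subclass of ValueError:

```python
import statistics
from decimal import Decimal
from fractions import Fraction

items = [6, 1, 8, 2, 3]
print(statistics.median(map(Fraction, items)))  # 3  (a Fraction)

even = [1, 2, 3, 4]
print(statistics.median(even))                  # 2.5
print(statistics.median(map(Decimal, even)))    # 2.5  (a Decimal)
print(statistics.median(map(Fraction, even)))   # 5/2

try:
    statistics.median([])
except statistics.StatisticsError as exc:
    print("error:", exc)
```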
Answer by Veedrac
You can try the quickselect algorithm if faster average-case running times are needed. Quickselect has average (and best) case performance O(n), although it can degrade to O(n^2) on a bad day.
Here's an implementation with a randomly chosen pivot:
import random

def select_nth(n, items):
    pivot = random.choice(items)

    lesser = [item for item in items if item < pivot]
    if len(lesser) > n:
        return select_nth(n, lesser)
    n -= len(lesser)

    numequal = items.count(pivot)
    if numequal > n:
        return pivot
    n -= numequal

    greater = [item for item in items if item > pivot]
    return select_nth(n, greater)
You can trivially turn this into a method to find medians:
def median(items):
    if len(items) % 2:
        return select_nth(len(items)//2, items)
    else:
        left = select_nth((len(items)-1) // 2, items)
        right = select_nth((len(items)+1) // 2, items)
        return (left + right) / 2
This is very unoptimised, but it's not likely that even an optimised version will outperform Tim Sort (CPython's built-in sort) because that's really fast. I've tried before and I lost.
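For reference, one step toward a less wasteful quickselect is to partition a single working copy in place instead of building new lists on every call. The sketch below uses Lomuto partitioning with a random pivot, a swapped-in partition scheme rather than the three-way list split in the code above, and makes no claim of beating sorted(). Note that with many equal elements Lomuto partitioning drifts toward quadratic time, which the three-way split avoids.

```python
import random

def select_nth_inplace(n, items):
    """Return the n-th smallest (0-based) element of items.

    Copies items once, then partitions the copy in place
    (Lomuto scheme, random pivot), narrowing [lo, hi] until n is found.
    """
    a = list(items)
    lo, hi = 0, len(a) - 1
    while True:
        if lo == hi:
            return a[lo]
        # Move a random pivot to the end, then partition a[lo..hi] around it.
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot = a[hi]
        store = lo
        for i in range(lo, hi):
            if a[i] < pivot:
                a[store], a[i] = a[i], a[store]
                store += 1
        a[store], a[hi] = a[hi], a[store]  # pivot lands at its final index
        if n == store:
            return a[store]
        if n < store:
            hi = store - 1
        else:
            lo = store + 1

def median(items):
    k = len(items)
    if k % 2:
        return select_nth_inplace(k // 2, items)
    return (select_nth_inplace(k // 2 - 1, items) + select_nth_inplace(k // 2, items)) / 2
```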
Answer by Fred Beck
I defined a median function for a list of numbers as
def median(numbers):
    s = sorted(numbers)
    n = len(numbers)
    # Average of the lower-middle and upper-middle elements; when n is odd,
    # both indices point at the same (middle) element.
    return (s[(n - 1) // 2] + s[n // 2]) / 2.0
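Averaging the elements at the lower-middle index (n - 1) // 2 and the upper-middle index n // 2 covers both parities, because the two indices coincide when n is odd. Computing the upper index with round() instead is fragile on Python 3, whose round() rounds halves to the nearest even integer; the check below, using a hypothetical helper middle_indices, reports a mismatch for n = 2 and n = 6:

```python
def middle_indices(n):
    """Lower- and upper-middle indices of a sorted list of length n >= 1."""
    return (n - 1) // 2, n // 2

for n in range(1, 7):
    lower, upper = middle_indices(n)
    # Python 3's round() rounds halves to even: round(0.5) == 0, round(2.5) == 2.
    rounded = int(round((n - 1) / 2.0))
    print(n, (lower, upper), rounded, "ok" if rounded == upper else "mismatch")
```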
Answer by Batuhan Ulug
Here's a cleaner solution:
def median(lst):
    quotient, remainder = divmod(len(lst), 2)
    if remainder:
        return sorted(lst)[quotient]
    return sum(sorted(lst)[quotient - 1:quotient + 1]) / 2.
Note: Answer changed to incorporate suggestion in comments.
Answer by Юрий Мойдом Киев
median Function
def median(midlist):
    midlist.sort()
    lens = len(midlist)
    # Use // so the indices stay ints (in Python 3, / always returns a float).
    if lens % 2 != 0:
        midl = (lens // 2)
        res = midlist[midl]
    else:
        odd = (lens // 2) - 1
        ev = (lens // 2)
        res = float(midlist[odd] + midlist[ev]) / float(2)
    return res
Answer by user5818263
I posted my solution at Python implementation of "median of medians" algorithm, which is a little bit faster than using sort(). My solution uses 15 numbers per column, for a speed ~5N which is faster than the speed ~10N of using 5 numbers per column. The optimal speed is ~4N, but I could be wrong about it.
Per Tom's request in his comment, I added my code here, for reference. I believe the critical part for speed is using 15 numbers per column, instead of 5.
#!/bin/pypy
#
# TH @stackoverflow, 2016-01-20, linear time "median of medians" algorithm
#
import sys, random

items_per_column = 15

def find_i_th_smallest( A, i ):
    t = len(A)
    if(t <= items_per_column):
        # if A is a small list with at most items_per_column items, then:
        #
        # 1. do sort on A
        # 2. find i-th smallest item of A
        #
        return sorted(A)[i]
    else:
        # 1. partition A into columns of k items each. k is odd, say 5.
        # 2. find the median of every column
        # 3. put all medians in a new list, say, B
        #
        B = [ find_i_th_smallest(k, (len(k) - 1)//2) for k in [A[j:(j + items_per_column)] for j in range(0,len(A),items_per_column)]]

        # 4. find M, the median of B
        #
        M = find_i_th_smallest(B, (len(B) - 1)//2)

        # 5. split A into 3 parts by M, { < M }, { == M }, and { > M }
        # 6. find which above set has A's i-th smallest, recursively.
        #
        P1 = [ j for j in A if j < M ]
        if(i < len(P1)):
            return find_i_th_smallest( P1, i)
        P3 = [ j for j in A if j > M ]
        L3 = len(P3)
        if(i < (t - L3)):
            return M
        return find_i_th_smallest( P3, i - (t - L3))

# How many numbers should be randomly generated for testing?
#
number_of_numbers = int(sys.argv[1])

# create a list of random non-negative integers
#
L = [ random.randint(0, number_of_numbers) for i in range(0, number_of_numbers) ]

# Show the original list
#
# print(L)

# This is for validation
#
# print(sorted(L)[(len(L) - 1) // 2])

# This is the result of the "median of medians" function.
# Its result should be the same as the above.
#
print(find_i_th_smallest( L, (len(L) - 1) // 2))
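The script above selects only the lower median, at index (len(L) - 1) // 2. For the even-length average the question asks for, a selection routine can be called for both middle positions. The following self-contained Python 3 sketch restates the same median-of-medians steps under hypothetical names (select, median), with group_size=15 mirroring items_per_column; calling the selection twice roughly doubles the work for even lengths, which a tuned version could avoid.

```python
def select(items, k, group_size=15):
    """Return the k-th smallest (0-based) element of the list items.

    Median-of-medians pivot selection; group_size plays the role of
    items_per_column in the answer above.
    """
    if len(items) <= group_size:
        return sorted(items)[k]
    # Split into groups, take each group's (lower) median, then the median of those.
    groups = [items[j:j + group_size] for j in range(0, len(items), group_size)]
    medians = [select(g, (len(g) - 1) // 2, group_size) for g in groups]
    pivot = select(medians, (len(medians) - 1) // 2, group_size)
    # Three-way split around the pivot: { < pivot }, { == pivot }, { > pivot }.
    lesser = [x for x in items if x < pivot]
    if k < len(lesser):
        return select(lesser, k, group_size)
    greater = [x for x in items if x > pivot]
    num_not_greater = len(items) - len(greater)
    if k < num_not_greater:
        return pivot
    return select(greater, k - num_not_greater, group_size)


def median(items):
    """Median of a list, averaging the two middle values for even lengths."""
    n = len(items)
    if n == 0:
        raise ValueError("median() of an empty list")  # assumed error behaviour
    if n % 2:
        return select(items, n // 2)
    return (select(items, n // 2 - 1) + select(items, n // 2)) / 2
```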
Answer by warvariuc
def median(array):
    """Calculate median of the given list.
    """
    # TODO: use statistics.median in Python 3
    array = sorted(array)
    half, odd = divmod(len(array), 2)
    if odd:
        return array[half]
    return (array[half - 1] + array[half]) / 2.0
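One way to act on the TODO is to prefer statistics.median when it can be imported (Python 3.4+) and fall back to the local definition otherwise. A minimal sketch of that pattern:

```python
try:
    # Python 3.4+: use the standard library implementation.
    from statistics import median
except ImportError:
    # Older interpreters: fall back to the sort-based version from this answer.
    def median(array):
        """Calculate median of the given list."""
        array = sorted(array)
        half, odd = divmod(len(array), 2)
        if odd:
            return array[half]
        return (array[half - 1] + array[half]) / 2.0
```

The two branches differ on empty input: statistics.median raises StatisticsError, while the fallback raises IndexError.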