Finding the median of a list in Python

Question

Asked by ChucksPlace

How do you find the median of a list in Python? The list can be of any size and the numbers are not guaranteed to be in any particular order.

If the list contains an even number of elements, the function should return the average of the middle two.

Here are some examples (sorted for display purposes):

median([1]) == 1
median([1, 1]) == 1
median([1, 1, 2, 4]) == 1.5
median([0, 2, 5, 6, 8, 9, 9]) == 6
median([0, 0, 0, 0, 4, 4, 6, 8]) == 2

Answer 1

Answered by swolfe

The sorted() function is very helpful for this. Use it to order the list, then return the middle value (or the average of the two middle values if the list contains an even number of elements).

def median(lst):
    sortedLst = sorted(lst)
    lstLen = len(lst)
    index = (lstLen - 1) // 2

    if (lstLen % 2):
        return sortedLst[index]
    else:
        return (sortedLst[index] + sortedLst[index + 1])/2.0
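
As a quick sanity check, the examples from the question can be run against this sorted()-based approach. The sketch below restates the function with renamed variables so that it runs on its own, and shuffles each input first, since the question does not guarantee any particular order.

```python
import random

def median(lst):
    """Median via a sorted copy; the input list is left untouched."""
    ordered = sorted(lst)
    n = len(ordered)
    index = (n - 1) // 2
    if n % 2:
        return ordered[index]
    return (ordered[index] + ordered[index + 1]) / 2.0

# (input, expected median) pairs taken from the question
cases = [
    ([1], 1),
    ([1, 1], 1),
    ([1, 1, 2, 4], 1.5),
    ([0, 2, 5, 6, 8, 9, 9], 6),
    ([0, 0, 0, 0, 4, 4, 6, 8], 2),
]

for values, expected in cases:
    shuffled = values[:]
    random.shuffle(shuffled)  # the order of the input should not matter
    assert median(shuffled) == expected
```

Note that an empty list is not handled here: indexing the empty sorted copy raises IndexError.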

Answer 2

Answered by Padraic Cunningham

You can use list.sort to sort the list in place and avoid creating a new list with sorted.

Also, you should not use list as a variable name, as it shadows Python's own list.

def median(l):
    half = len(l) // 2
    l.sort()
    if not len(l) % 2:
        return (l[half - 1] + l[half]) / 2.0
    return l[half]
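
One consequence of sorting in place is that the caller's list is reordered. The following sketch contrasts the two behaviours; the names median_in_place and median_copy are hypothetical and only serve this illustration.

```python
def median_in_place(values):
    """Hypothetical name: same logic as above, sorts the argument itself."""
    values.sort()
    half = len(values) // 2
    if not len(values) % 2:
        return (values[half - 1] + values[half]) / 2.0
    return values[half]

def median_copy(values):
    """Hypothetical name: sorts a copy, leaving the argument as it was."""
    return median_in_place(sorted(values))

data = [9, 1, 5]
copy_result = median_copy(data)
order_after_copy = list(data)      # still [9, 1, 5]
in_place_result = median_in_place(data)
order_after_in_place = list(data)  # now [1, 5, 9]
```

Which variant fits depends on whether the caller still needs the original order; the in-place version saves the extra list at the cost of that side effect.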

Answer 3

Answered by A.J. Uppal

(Works with python-2.x):

def median(lst):
    n = len(lst)
    s = sorted(lst)
    return (sum(s[n//2-1:n//2+1])/2.0, s[n//2])[n % 2] if n else None

>>> median([-5, -5, -3, -4, 0, -1])
-3.5
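
The one-liner packs both cases into a tuple and picks one with n % 2 as the index, which means both tuple elements are always computed. An expanded sketch of the same logic, written out with ordinary branches, might look like this:

```python
def median(lst):
    n = len(lst)
    if n == 0:
        return None  # mirrors the one-liner's "if n else None" guard
    s = sorted(lst)
    mid = n // 2
    if n % 2:
        # odd length: this was index 1 of the tuple
        return s[mid]
    # even length: this was index 0 of the tuple, the mean of the middle pair
    return sum(s[mid - 1:mid + 1]) / 2.0
```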

numpy.median():

>>> from numpy import median
>>> median([1, -4, -1, -1, 1, -3])
-1.0

For python-3.x, use statistics.median:

>>> from statistics import median
>>> median([5, 2, 3, 8, 9, -2])
4.0

Answer 4

Answered by Veedrac

Python 3.4 has statistics.median:

Return the median (middle value) of numeric data.
When the number of data points is odd, return the middle data point. When the number of data points is even, the median is interpolated by taking the average of the two middle values:
>>> median([1, 3, 5])
3
>>> median([1, 3, 5, 7])
4.0

Usage:

import statistics

items = [6, 1, 8, 2, 3]

statistics.median(items)
#>>> 3

It's pretty careful with types, too:

statistics.median(map(float, items))
#>>> 3.0

from decimal import Decimal
statistics.median(map(Decimal, items))
#>>> Decimal('3')
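
The examples above all have an odd number of items, so no averaging takes place. A small sketch of the even-length and empty cases follows; the variable names are just for illustration.

```python
import statistics
from decimal import Decimal
from fractions import Fraction

# Even-length inputs: the two middle values are averaged.
# Fraction and Decimal inputs keep their type; ints produce a float.
float_result = statistics.median([1, 3, 5, 7])
fraction_result = statistics.median([Fraction(1, 3), Fraction(1, 2)])
decimal_result = statistics.median([Decimal("1.5"), Decimal("2.5")])

# An empty input raises statistics.StatisticsError rather than returning None.
try:
    statistics.median([])
    empty_raises = False
except statistics.StatisticsError:
    empty_raises = True
```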

Answer 5

Answered by Veedrac

You can try the quickselect algorithm if faster average-case running times are needed. Quickselect has average (and best) case performance O(n), although it can end up O(n^2) on a bad day.

Here's an implementation with a randomly chosen pivot:

import random

def select_nth(n, items):
    pivot = random.choice(items)

    lesser = [item for item in items if item < pivot]
    if len(lesser) > n:
        return select_nth(n, lesser)
    n -= len(lesser)

    numequal = items.count(pivot)
    if numequal > n:
        return pivot
    n -= numequal

    greater = [item for item in items if item > pivot]
    return select_nth(n, greater)

You can trivially turn this into a method to find medians:

def median(items):
    if len(items) % 2:
        return select_nth(len(items)//2, items)

    else:
        left  = select_nth((len(items)-1) // 2, items)
        right = select_nth((len(items)+1) // 2, items)

        return (left + right) / 2

This is very unoptimised, but it's not likely that even an optimised version will outperform Timsort (CPython's built-in sort) because that's really fast. I've tried before and I lost.
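
To see how the two approaches compare on a given machine, a rough timing harness along these lines could be used. It repeats the quickselect code so that it runs on its own; the names median_quickselect and median_sorted, the input size and the repeat count are arbitrary choices for this sketch, and the timings will vary with hardware, Python version and data.

```python
import random
import timeit

def select_nth(n, items):
    """Quickselect with a random pivot, as in the answer above."""
    pivot = random.choice(items)
    lesser = [item for item in items if item < pivot]
    if len(lesser) > n:
        return select_nth(n, lesser)
    n -= len(lesser)
    numequal = items.count(pivot)
    if numequal > n:
        return pivot
    n -= numequal
    greater = [item for item in items if item > pivot]
    return select_nth(n, greater)

def median_quickselect(items):
    if len(items) % 2:
        return select_nth(len(items) // 2, items)
    left = select_nth((len(items) - 1) // 2, items)
    right = select_nth((len(items) + 1) // 2, items)
    return (left + right) / 2

def median_sorted(items):
    s = sorted(items)
    half, odd = divmod(len(s), 2)
    return s[half] if odd else (s[half - 1] + s[half]) / 2

# Arbitrary test size; both functions must agree before timing them.
data = [random.random() for _ in range(10_000)]
assert median_quickselect(data) == median_sorted(data)

for func in (median_quickselect, median_sorted):
    seconds = timeit.timeit(lambda: func(data), number=20)
    print(f"{func.__name__}: {seconds:.4f} s for 20 runs")
```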

Answer 6

Answered by Fred Beck

I defined a median function for a list of numbers as:

def median(numbers):
    s, n = sorted(numbers), len(numbers)
    # Integer indices avoid round(), which rounds halves to even in Python 3.
    return (s[(n - 1) // 2] + s[n // 2]) / 2.0
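
A side note on computing the two middle indices: Python 3's round() rounds exact halves to the nearest even integer (Python 2 rounded them away from zero), so an index derived from round((n - 1) / 2.0) is not always the upper middle one for even n. The sketch below shows the rounding behaviour next to plain floor division; middle_indices is a hypothetical helper name.

```python
# Python 3 rounds exact halves to the nearest even integer.
half_cases = [round(0.5), round(1.5), round(2.5)]  # [0, 2, 2]

def middle_indices(n):
    """Indices of the two middle elements of a sorted list of length n.

    For odd n both indices are the same, so averaging them is harmless.
    """
    return (n - 1) // 2, n // 2

pairs = [middle_indices(n) for n in (1, 2, 3, 6)]
# [(0, 0), (0, 1), (1, 1), (2, 3)]
```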

Answer 7

Answered by Batuhan Ulug

Here's a cleaner solution:

def median(lst):
    quotient, remainder = divmod(len(lst), 2)
    if remainder:
        return sorted(lst)[quotient]
    return sum(sorted(lst)[quotient - 1:quotient + 1]) / 2.

Note: Answer changed to incorporate suggestion in comments.

Answer 8

Answered by Юрий Мойдом Киев

Median function

def median(midlist):
    midlist.sort()  # sorts in place, so the caller's list is modified
    lens = len(midlist)
    if lens % 2 != 0:
        midl = lens // 2  # floor division keeps the index an int on Python 3
        res = midlist[midl]
    else:
        odd = (lens // 2) - 1
        ev = lens // 2
        res = float(midlist[odd] + midlist[ev]) / float(2)
    return res

Answer 9

Answered by user5818263

I posted my solution under "Python implementation of "median of medians" algorithm"; it is a little bit faster than using sort(). My solution uses 15 numbers per column, for a speed of ~5N, which is faster than the ~10N of using 5 numbers per column. The optimal speed is ~4N, but I could be wrong about it.

Per Tom's request in his comment, I added my code here for reference. I believe the critical part for speed is using 15 numbers per column instead of 5.

#!/bin/pypy
#
# TH @stackoverflow, 2016-01-20, linear time "median of medians" algorithm
#
import sys, random


items_per_column = 15


def find_i_th_smallest( A, i ):
    t = len(A)
    if(t <= items_per_column):
        # if A is a small list with at most items_per_column items, then:
        #
        # 1. do sort on A
        # 2. find i-th smallest item of A
        #
        return sorted(A)[i]
    else:
        # 1. partition A into columns of k items each. k is odd, say 5.
        # 2. find the median of every column
        # 3. put all medians in a new list, say, B
        #
        B = [ find_i_th_smallest(k, (len(k) - 1)//2) for k in [A[j:(j + items_per_column)] for j in range(0,len(A),items_per_column)]]

        # 4. find M, the median of B
        #
        M = find_i_th_smallest(B, (len(B) - 1)//2)


        # 5. split A into 3 parts by M, { < M }, { == M }, and { > M }
        # 6. find which above set has A's i-th smallest, recursively.
        #
        P1 = [ j for j in A if j < M ]
        if(i < len(P1)):
            return find_i_th_smallest( P1, i)
        P3 = [ j for j in A if j > M ]
        L3 = len(P3)
        if(i < (t - L3)):
            return M
        return find_i_th_smallest( P3, i - (t - L3))


# How many numbers should be randomly generated for testing?
#
number_of_numbers = int(sys.argv[1])


# create a list of random positive integers
#
L = [ random.randint(0, number_of_numbers) for i in range(0, number_of_numbers) ]


# Show the original list
#
# print(L)


# This is for validation
#
# print(sorted(L)[(len(L) - 1) // 2])


# This is the result of the "median of medians" function.
# Its result should be the same as the above.
#
print(find_i_th_smallest( L, (len(L) - 1) // 2))

Answer 10

Answered by warvariuc

def median(array):
    """Calculate median of the given list.
    """
    # TODO: use statistics.median in Python 3
    array = sorted(array)
    half, odd = divmod(len(array), 2)
    if odd:
        return array[half]
    return (array[half - 1] + array[half]) / 2.0
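
The TODO in that function hints at preferring the standard library where it exists. One possible way to arrange this, sketched here as an assumption rather than as the answerer's intent, is to try the import and fall back to the hand-written version only when it fails:

```python
try:
    # Available in the standard library from Python 3.4 onwards.
    from statistics import median
except ImportError:
    def median(array):
        """Fallback with the same sorted-and-pick logic as above."""
        array = sorted(array)
        half, odd = divmod(len(array), 2)
        if odd:
            return array[half]
        return (array[half - 1] + array[half]) / 2.0
```

The two paths differ on empty input: statistics.median raises StatisticsError, while the fallback raises IndexError.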
