Get the second largest number in a list in linear time

Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/16225677/

Time: 2020-08-18 22:03:11. Source: igfitidea

Tags: python, performance

Asked by boisvert

I'm learning Python, and its simple way of handling lists is presented as an advantage. Sometimes it is, but look at this:

>>> numbers = [20,67,3,2.6,7,74,2.8,90.8,52.8,4,3,2,5,7]
>>> numbers.remove(max(numbers))
>>> max(numbers)
74

This is a very easy, quick way of obtaining the second largest number from a list. Except that the easy list processing encourages a program that runs through the list twice, once to find the largest and once to find the second largest. It's also destructive: I need two copies of the data if I want to keep the original. What we need is:

>>> numbers = [20,67,3,2.6,7,74,2.8,90.8,52.8,4,3,2,5,7]
>>> if numbers[0] > numbers[1]:
...    m, m2 = numbers[0], numbers[1]
... else:
...    m, m2 = numbers[1], numbers[0]
...
>>> for x in numbers[2:]:
...    if x>m2:
...       if x>m:
...          m2, m = m, x
...       else:
...          m2 = x
...
>>> m2
74

This runs through the list just once, but it isn't as terse and clear as the previous solution.

So: is there a way, in cases like this, to have both the clarity of the first version and the single pass of the second?

Accepted answer by Thijs van Dien

Since @OscarLopez and I have different opinions on what the second largest means, I'll post the code according to my interpretation and in line with the first algorithm provided by the questioner.

def second_largest(numbers):
    count = 0
    m1 = m2 = float('-inf')
    for x in numbers:
        count += 1
        if x > m2:
            if x >= m1:
                m1, m2 = x, m1            
            else:
                m2 = x
    return m2 if count >= 2 else None

(Note: Negative infinity is used here instead of None, since None has different sorting behavior in Python 2 and 3; see "Python - Find second smallest number". A check on the number of elements in numbers makes sure that negative infinity won't be returned when the actual answer is undefined.)

If the maximum occurs multiple times, it may be the second largest as well. This approach also behaves correctly when there are fewer than two elements: in that case there is no second largest, and the function returns None.

Running the same tests:

second_largest([20,67,3,2.6,7,74,2.8,90.8,52.8,4,3,2,5,7])
=> 74
second_largest([1,1,1,1,1,2])
=> 1
second_largest([2,2,2,2,2,1])
=> 2
second_largest([10,7,10])
=> 10
second_largest([1,1,1,1,1,1])
=> 1
second_largest([1])
=> None
second_largest([])
=> None

Update

I restructured the conditionals to drastically improve performance, by almost 100% in my testing on random numbers. The reason is that in the original version, the elif was always evaluated in the likely event that the next number is not the largest in the list. In other words, for practically every number in the list, two comparisons were made, whereas one comparison mostly suffices: if the number is not larger than the second largest, it's not larger than the largest either.
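
To make the comparison counts concrete, here is a rough sketch that counts comparisons for both orderings. The earlier version is not shown in this answer, so count_elif_version is only an assumed reconstruction of it (test against the largest first, then the second largest); both function names are made up for illustration.

```python
import random

def count_elif_version(numbers):
    """Assumed shape of the earlier version: compare with m1 first, then m2."""
    comparisons = 0
    m1 = m2 = float('-inf')
    for x in numbers:
        comparisons += 1
        if x >= m1:
            m1, m2 = x, m1
        else:
            comparisons += 1  # the elif test runs for almost every element
            if x > m2:
                m2 = x
    return m2, comparisons

def count_nested_version(numbers):
    """Same nesting as second_largest above: compare with m2 first."""
    comparisons = 0
    m1 = m2 = float('-inf')
    for x in numbers:
        comparisons += 1
        if x > m2:
            comparisons += 1  # only reached when x beats the current runner-up
            if x >= m1:
                m1, m2 = x, m1
            else:
                m2 = x
    return m2, comparisons

random.seed(0)
data = [random.random() for _ in range(10000)]
print(count_elif_version(data))    # (second largest, comparisons made)
print(count_nested_version(data))  # same value, far fewer comparisons
```

On random input the nested version should need roughly one comparison per element, while the assumed elif version needs close to two, which is consistent with the speedup described above.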

Answer by Volatility

You could always use sorted:

>>> sorted(numbers)[-2]
74
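
A small sketch of how this one-liner might be wrapped. Note that sorted returns a new list, so the original is left untouched, but sorting is O(n log n) rather than linear. The helper name second_largest_sorted and the None return for short lists are assumptions for illustration, not part of the answer.

```python
def second_largest_sorted(numbers):
    """Hypothetical wrapper: second largest by position, or None if undefined."""
    if len(numbers) < 2:
        return None
    return sorted(numbers)[-2]

numbers = [20, 67, 3, 2.6, 7, 74, 2.8, 90.8, 52.8, 4, 3, 2, 5, 7]
print(second_largest_sorted(numbers))
print(numbers[0])  # the original list keeps its original order
```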

Answer by Jon Clements

You could use the heapq module:

>>> el = [20,67,3,2.6,7,74,2.8,90.8,52.8,4,3,2,5,7]
>>> import heapq
>>> heapq.nlargest(2, el)
[90.8, 74]

And go from there...
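
One way to "go from there", as a minimal sketch: take the second element of the result when it exists. heapq.nlargest returns fewer than two items when the input is shorter, so a length check covers the undefined cases. The function name second_largest_heapq is hypothetical.

```python
import heapq

def second_largest_heapq(numbers):
    """Hypothetical helper: second largest by position, or None if undefined."""
    top_two = heapq.nlargest(2, numbers)  # sorted from largest to smallest
    return top_two[1] if len(top_two) == 2 else None

el = [20, 67, 3, 2.6, 7, 74, 2.8, 90.8, 52.8, 4, 3, 2, 5, 7]
print(second_largest_heapq(el))
```

Like the accepted answer, this treats a repeated maximum as its own second largest.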

Answer by Óscar López

Try the solution below. It's O(n), and it stores and returns the second greatest number in the second variable. UPDATE: I've adjusted the code to work with Python 3, because arithmetic comparisons against None are now invalid.

Notice that if all elements in numbers are equal, or if numbers is empty or contains a single element, the variable second will end up with a value of None. This is correct, as in those cases there isn't a "second greatest" element.

Beware: this finds the "second maximum" value. If more than one value is the "first maximum", they are all treated as the same maximum. By my definition, in a list such as [10, 7, 10] the correct answer is 7.

def second_largest(numbers):
    minimum = float('-inf')
    first, second = minimum, minimum
    for n in numbers:
        if n > first:
            first, second = n, first
        elif first > n > second:
            second = n
    return second if second != minimum else None

Here are some tests:

second_largest([20, 67, 3, 2.6, 7, 74, 2.8, 90.8, 52.8, 4, 3, 2, 5, 7])
=> 74
second_largest([1, 1, 1, 1, 1, 2])
=> 1
second_largest([2, 2, 2, 2, 2, 1])
=> 1
second_largest([10, 7, 10])
=> 7
second_largest( [1, 3, 10, 16])
=> 10
second_largest([1, 1, 1, 1, 1, 1])
=> None
second_largest([1])
=> None
second_largest([])
=> None

Answer by kpie

There are some good answers here for type([]). In case someone needs the same thing for a type({}), here it is:

def secondLargest(D):
    def second_largest(L):  
        if(len(L)<2):
            raise Exception("Second_Of_One")
        KFL=None #KeyForLargest
        KFS=None #KeyForSecondLargest
        n = 0
        for k in L:
            if(KFL == None or k>=L[KFL]):
                KFS = KFL
                KFL = n
            elif(KFS == None or k>=L[KFS]):
                KFS = n
            n+=1
        return (KFS)
    KFL=None #KeyForLargest
    KFS=None #KeyForSecondLargest
    if(len(D)<2):
        raise Exception("Second_Of_One")
    if(type(D)!=type({})):
        if(type(D)==type([])):
            return(second_largest(D))
        else:
            raise Exception("TypeError")
    else:
        for k in D:
            if(KFL == None or D[k]>=D[KFL]):
                KFS = KFL               
                KFL = k
            elif(KFS == None or D[k] >= D[KFS]):
                KFS = k
    return(KFS)

a = {'one':1 , 'two': 2 , 'thirty':30}
b = [30,1,2] 
print(a[secondLargest(a)])
print(b[secondLargest(b)])

Just for fun I tried to make it user friendly xD

Answer by Edward Doolittle

The quickselect algorithm, the O(n) cousin of quicksort, will do what you want. Quickselect has average performance O(n). Worst case performance is O(n^2), just like quicksort, but that's rare, and modifications to quickselect (such as median-of-medians pivot selection) reduce the worst case performance to O(n).

The idea of quickselect is to use the same pivot, lower, higher idea of quicksort, but to then ignore the lower part and to further order just the higher part.
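
A minimal sketch of that idea, assuming a random pivot and list comprehensions for the partition step (so it favors clarity over in-place speed). The name kth_largest and its k parameter are made up for illustration; k=2 gives the second largest by position.

```python
import random

def kth_largest(numbers, k):
    """Return the k-th largest element (k=1 is the maximum) via quickselect."""
    if not 1 <= k <= len(numbers):
        raise ValueError("k out of range")
    items = list(numbers)  # work on a copy, leave the input untouched
    while True:
        pivot = random.choice(items)
        higher = [x for x in items if x > pivot]
        equal_count = items.count(pivot)
        if k <= len(higher):
            # The answer lies among the larger elements; ignore the rest.
            items = higher
        elif k <= len(higher) + equal_count:
            return pivot
        else:
            # Skip past the higher part and the pivot copies.
            k -= len(higher) + equal_count
            items = [x for x in items if x < pivot]

numbers = [20, 67, 3, 2.6, 7, 74, 2.8, 90.8, 52.8, 4, 3, 2, 5, 7]
print(kth_largest(numbers, 2))
```

Each pass keeps only one side of the pivot, which is what gives the average O(n) cost; with an unlucky sequence of pivots it can still degrade to O(n^2).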

Answer by Brent D.

>>> l = [19, 1, 2, 3, 4, 20, 20]
>>> sorted(set(l))[-2]
19

Answer by Vaisakh MC

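import ast

# Python 3: input() returns a string, so parse it as a list literal, e.g. [3, 1, 2]
n = ast.literal_eval(input("Enter a list:"))
n.sort()
l = len(n)
n.remove(n[l-1])  # drop one copy of the largest value
l = len(n)
print(n[l-1])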

Answer by Bharatwaja

O(n): The time complexity of a loop is considered O(n) if the loop variable is incremented or decremented by a constant amount. For example, the following loop has O(n) time complexity.

 // Here c is a positive integer constant   
   for (int i = 1; i <= n; i += c) {  
        // some O(1) expressions
   }

To find the second largest number, I used the method below: find the largest number first, then collect every value that differs from it and take the maximum of those.

x = [1, 2, 3]
A = list(map(int, x))
y = max(A)
k1 = list()
for value in A:
    # keep every value that is not the maximum
    if value != y:
        k1.append(value)

z = max(k1)
print(z)

Answer by ccy

This can be done with [N + log(N) - 2] comparisons, which is slightly better than the loose upper bound of 2N (both can be thought of as O(N)).

The trick is to use binary recursive calls and the "tennis tournament" algorithm. The winner (the largest number) emerges after all the 'matches' (which takes N-1 comparisons). If we record the 'players' of every match and collect all the players that the winner has beaten, the second largest number is the largest number in this 'losers' group.

The size of this 'losers' group is log(N), and we can invoke the binary recursive calls again to find the largest among the losers, which takes [log(N) - 1] comparisons. Alternatively, we can just scan the losers group linearly to get the answer; the cost is the same.

Below is some sample Python code:

def largest(L):
    global pairs
    if len(L) == 1:
        return L[0]
    else:
        left = largest(L[:len(L)//2])
        right = largest(L[len(L)//2:])
        pairs.append((left, right))
        return max(left, right)

def second_largest(L):
    global pairs
    biggest = largest(L)
    second_L = [min(item) for item in pairs if biggest in item]

    return biggest, largest(second_L)  



if __name__ == "__main__":
    pairs = []
    # test array
    L = [2,-2,10,5,4,3,1,2,90,-98,53,45,23,56,432]    

    if len(L) == 0:
        first, second = None, None
    elif len(L) == 1:
        first, second = L[0], None
    else:
        first, second = second_largest(L)

    print('The largest number is: ' + str(first))
    print('The 2nd largest number is: ' + str(second))
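
For comparison, the same tournament idea can be sketched iteratively, without a global list, while counting comparisons to check the bound quoted above. This is only an illustrative variant under the assumption that each player carries the list of values it has beaten; the name tournament_second_largest is hypothetical.

```python
def tournament_second_largest(values):
    """Return (largest, second_largest, comparisons) for a list of 2+ items."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    comparisons = 0
    # Each entry is (value, list of values it has beaten so far).
    current = [(v, []) for v in values]
    while len(current) > 1:
        next_round = []
        for i in range(0, len(current) - 1, 2):
            (a, a_beaten), (b, b_beaten) = current[i], current[i + 1]
            comparisons += 1
            if a >= b:
                a_beaten.append(b)
                next_round.append((a, a_beaten))
            else:
                b_beaten.append(a)
                next_round.append((b, b_beaten))
        if len(current) % 2:
            # An odd player out advances without playing.
            next_round.append(current[-1])
        current = next_round
    winner, beaten = current[0]
    # The runner-up must be among the values the winner beat directly.
    second = beaten[0]
    for v in beaten[1:]:
        comparisons += 1
        if v > second:
            second = v
    return winner, second, comparisons

L = [2, -2, 10, 5, 4, 3, 1, 2, 90, -98, 53, 45, 23, 56, 432]
print(tournament_second_largest(L))
```

The winner plays at most one match per round, so its list of beaten values has at most about log2(N) entries, which is where the small second term in the comparison count comes from.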