使用 Python 进行归并排序

Question

提问by Hans

I couldn't find any working Python 3.3 mergesort algorithm codes, so I made one myself. Is there any way to speed it up? It sorts 20,000 numbers in about 0.3-0.5 seconds

我找不到任何可用的 Python 3.3 归并排序算法代码，所以我自己做了一个。有什么办法可以加快速度吗？它在大约 0.3-0.5 秒内对 20,000 个数字进行排序

def msort(x):
    result = []
    if len(x) < 2:
        return x
    mid = int(len(x)/2)
    y = msort(x[:mid])
    z = msort(x[mid:])
    while (len(y) > 0) or (len(z) > 0):
        if len(y) > 0 and len(z) > 0:
            if y[0] > z[0]:
                result.append(z[0])
                z.pop(0)
            else:
                result.append(y[0])
                y.pop(0)
        elif len(z) > 0:
            for i in z:
                result.append(i)
                z.pop(0)
        else:
            for i in y:
                result.append(i)
                y.pop(0)
    return result

Answer 1

采纳答案by firefrorefiddle

You can initialise the whole result list in the top level call to mergesort:

您可以在对合并排序的顶级调用中初始化整个结果列表：

result = [0]*len(x)   # replace 0 with a suitable default element if necessary. 
                      # or just copy x (result = x[:])

Then for the recursive calls you can use a helper function to which you pass not sublists, but indices into x. And the bottom level calls read their values from xand write into resultdirectly.

然后对于递归调用，您可以使用一个辅助函数，您不是将子列表传递给它，而是将索引传递给x. 底层调用直接读取x和写入它们的值result。

That way you can avoid all that poping and appending which should improve performance.

这样你就可以避免所有应该提高性能的poping 和appending。

Answer 2

回答by Tamás

Loops like this can probably be speeded up:

像这样的循环可能可以加速：

for i in z:
    result.append(i)
    z.pop(0)

Instead, simply do this:

相反，只需执行以下操作：

result.extend(z)

Note that there is no need to clean the contents of zbecause you won't use it anyway.

请注意，无需清理的内容，z因为无论如何您都不会使用它。

Answer 3

回答by anumi

The first improvement would be to simplify the three cases in the main loop: Rather than iterating while some of the sequence has elements, iterate while bothsequences have elements. When leaving the loop, one of them will be empty, we don't know which, but we don't care: We append them at the end of the result.

第一个改进是简化主循环中的三种情况：不是在某些序列有元素时进行迭代，而是在两个序列都有元素时进行迭代。当离开循环时，其中一个将为空，我们不知道哪个，但我们不在乎：我们将它们附加在结果的末尾。

def msort2(x):
    if len(x) < 2:
        return x
    result = []          # moved!
    mid = int(len(x) / 2)
    y = msort2(x[:mid])
    z = msort2(x[mid:])
    while (len(y) > 0) and (len(z) > 0):
        if y[0] > z[0]:
            result.append(z[0])
            z.pop(0)
        else:
            result.append(y[0])
            y.pop(0)
    result += y
    result += z
    return result

The second optimization is to avoid popping the elements. Rather, have two indices:

第二个优化是避免popping元素。相反，有两个索引：

def msort3(x):
    if len(x) < 2:
        return x
    result = []
    mid = int(len(x) / 2)
    y = msort3(x[:mid])
    z = msort3(x[mid:])
    i = 0
    j = 0
    while i < len(y) and j < len(z):
        if y[i] > z[j]:
            result.append(z[j])
            j += 1
        else:
            result.append(y[i])
            i += 1
    result += y[i:]
    result += z[j:]
    return result

A final improvement consists in using a non recursive algorithm to sort short sequences. In this case I use the built-in sortedfunction and use it when the size of the input is less than 20:

最后的改进在于使用非递归算法对短序列进行排序。在这种情况下，我使用内置sorted函数并在输入的大小小于 20 时使用它：

def msort4(x):
    if len(x) < 20:
        return sorted(x)
    result = []
    mid = int(len(x) / 2)
    y = msort4(x[:mid])
    z = msort4(x[mid:])
    i = 0
    j = 0
    while i < len(y) and j < len(z):
        if y[i] > z[j]:
            result.append(z[j])
            j += 1
        else:
            result.append(y[i])
            i += 1
    result += y[i:]
    result += z[j:]
    return result

My measurements to sort a random list of 100000 integers are 2.46 seconds for the original version, 2.33 for msort2, 0.60 for msort3 and 0.40 for msort4. For reference, sorting all the list with sortedtakes 0.03 seconds.

对于原始版本，我对 100000 个整数的随机列表进行排序的测量结果为 2.46 秒，msort2 为 2.33，msort3 为 0.60，msort4 为 0.40。作为参考，对所有列表进行排序sorted需要 0.03 秒。

Answer 4

回答by MikeRand

A longer one that counts inversions and adheres to the sortedinterface. It's trivial to modify this to make it a method of an object that sorts in place.

一个较长的计数反转并坚持sorted界面。修改它以使其成为就地排序的对象的方法是微不足道的。

import operator

class MergeSorted:

    def __init__(self):
        self.inversions = 0

    def __call__(self, l, key=None, reverse=False):

        self.inversions = 0

        if key is None:
            self.key = lambda x: x
        else:
            self.key = key

        if reverse:
            self.compare = operator.gt
        else:
            self.compare = operator.lt

        dest = list(l)
        working = [0] * len(l)
        self.inversions = self._merge_sort(dest, working, 0, len(dest))
        return dest

    def _merge_sort(self, dest, working, low, high):
        if low < high - 1:
            mid = (low + high) // 2
            x = self._merge_sort(dest, working, low, mid)
            y = self._merge_sort(dest, working, mid, high)
            z = self._merge(dest, working, low, mid, high)
            return (x + y + z)
        else:
            return 0

    def _merge(self, dest, working, low, mid, high):
        i = 0
        j = 0
        inversions = 0

        while (low + i < mid) and (mid + j < high):
            if self.compare(self.key(dest[low + i]), self.key(dest[mid + j])):
                working[low + i + j] = dest[low + i]
                i += 1
            else:
                working[low + i + j] = dest[mid + j]
                inversions += (mid - (low + i))
                j += 1

        while low + i < mid:
            working[low + i + j] = dest[low + i]
            i += 1

        while mid + j < high:
            working[low + i + j] = dest[mid + j]
            j += 1

        for k in range(low, high):
            dest[k] = working[k]

        return inversions


msorted = MergeSorted()

Uses

用途

>>> l = [5, 2, 3, 1, 4]
>>> s = msorted(l)
>>> s
[1, 2, 3, 4, 5]
>>> msorted.inversions
6

>>> l = ['e', 'b', 'c', 'a', 'd']
>>> d = {'a': 10,
...      'b': 4,
...      'c': 2,
...      'd': 5,
...      'e': 9}
>>> key = lambda x: d[x]
>>> s = msorted(l, key=key)
>>> s
['c', 'b', 'd', 'e', 'a']
>>> msorted.inversions
5

>>> l = [5, 2, 3, 1, 4]
>>> s = msorted(l, reverse=True)
>>> s
[5, 4, 3, 2, 1]
>>> msorted.inversions
4

>>> l = ['e', 'b', 'c', 'a', 'd']
>>> d = {'a': 10,
...      'b': 4,
...      'c': 2,
...      'd': 5,
...      'e': 9}
>>> key = lambda x: d[x]
>>> s = msorted(l, key=key, reverse=True)
>>> s
['a', 'e', 'd', 'b', 'c']
>>> msorted.inversions
5

Answer 5

回答by Moj

here is another solution

这是另一个解决方案

class MergeSort(object):
    def _merge(self,left, right):
        nl = len(left)
        nr = len(right)
        result = [0]*(nl+nr)
        i=0
        j=0
        for k in range(len(result)):
            if nl>i and nr>j:
                if left[i] <= right[j]:
                    result[k]=left[i]
                    i+=1
                else:
                    result[k]=right[j]
                    j+=1
            elif nl==i:
                result[k] = right[j]
                j+=1
            else: #nr>j:
                result[k] = left[i]
                i+=1
        return result

    def sort(self,arr):
        n = len(arr)
        if n<=1:
            return arr 
        left = self.sort(arr[:n/2])
        right = self.sort(arr[n/2:] )
        return self._merge(left, right)
def main():
    import random
    a= range(100000)
    random.shuffle(a)
    mr_clss = MergeSort()
    result = mr_clss.sort(a)
    #print result

if __name__ == '__main__':
    main()

and here is run time for list with 100000 elements:

这是具有 100000 个元素的列表的运行时间：

real    0m1.073s
user    0m1.053s
sys         0m0.017s

Answer 6

回答by zack

def merge(l1, l2, out=[]):
    if l1==[]: return out+l2
    if l2==[]: return out+l1
    if l1[0]<l2[0]: return merge(l1[1:], l2, out+l1[0:1])
    return merge(l1, l2[1:], out+l2[0:1])
def merge_sort(l): return (lambda h: l if h<1 else merge(merge_sort(l[:h]), merge_sort(l[h:])))(len(l)/2)
print(merge_sort([1,4,6,3,2,5,78,4,2,1,4,6,8]))

Answer 7

回答by sarath joseph

def merge_sort(x):

    if len(x) < 2:return x

    result,mid = [],int(len(x)/2)

    y = merge_sort(x[:mid])
    z = merge_sort(x[mid:])

    while (len(y) > 0) and (len(z) > 0):
            if y[0] > z[0]:result.append(z.pop(0))   
            else:result.append(y.pop(0))

    result.extend(y+z)
    return result

Answer 8

回答by Boubakr

Take my implementation

拿我的实现

def merge_sort(sequence):
    """
    Sequence of numbers is taken as input, and is split into two halves, following which they are recursively sorted.
    """
    if len(sequence) < 2:
        return sequence

    mid = len(sequence) // 2     # note: 7//2 = 3, whereas 7/2 = 3.5

    left_sequence = merge_sort(sequence[:mid])
    right_sequence = merge_sort(sequence[mid:])

    return merge(left_sequence, right_sequence)

def merge(left, right):
    """
    Traverse both sorted sub-arrays (left and right), and populate the result array
    """
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result += left[i:]
    result += right[j:]

    return result

# Print the sorted list.
print(merge_sort([5, 2, 6, 8, 5, 8, 1]))

Answer 9

回答by mojians

Try this recursive version

试试这个递归版本

def mergeList(l1,l2):
    l3=[]
    Tlen=len(l1)+len(l2)
    inf= float("inf")
    for i in range(Tlen):
        print   "l1= ",l1[0]," l2= ",l2[0]
        if l1[0]<=l2[0]:
            l3.append(l1[0])
            del l1[0]
            l1.append(inf)
        else:
            l3.append(l2[0])
            del l2[0]
            l2.append(inf)
    return l3

def main():
    l1=[2,10,7,6,8]
    print mergeSort(breaklist(l1))

def breaklist(rawlist):
    newlist=[]
    for atom in rawlist:
        print atom
        list_atom=[atom]
        newlist.append(list_atom)
    return newlist

def mergeSort(inputList):
    listlen=len(inputList)
    if listlen ==1:
        return inputList
    else:
        newlist=[]
        if listlen % 2==0:
            for i in range(listlen/2):
                newlist.append(mergeList(inputList[2*i],inputList[2*i+1]))
        else:
            for i in range((listlen+1)/2):
                if 2*i+1<listlen:
                    newlist.append(mergeList(inputList[2*i],inputList[2*i+1]))
                else:
                    newlist.append(inputList[2*i])
        return  mergeSort(newlist)

if __name__ == '__main__':
    main()

Answer 10

回答by B. M.

As already said, l.pop(0)is a O(len(l)) operation and must be avoided, the above msort function is O(n**2). If efficiency matter, indexing is better but have cost too. The for x in lis faster but not easy to implement for mergesort : itercan be used instead here. Finally, checking i < len(l)is made twice because tested again when accessing the element : the exception mechanism (try except) is better and give a last improvement of 30% .

如前所述，l.pop(0)是一个 O(len(l)) 操作并且必须避免，上面的 msort 函数是 O(n**2)。如果效率很重要，索引会更好，但也有成本。的for x in l速度更快，但是不容易实现的归并：iter可代替这里使用。最后，检查i < len(l)两次，因为在访问元素时再次测试：异常机制（try except）更好，最后提高了 30% 。

def msort(l):
    if len(l)>1:
        t=len(l)//2
        it1=iter(msort(l[:t]));x1=next(it1)
        it2=iter(msort(l[t:]));x2=next(it2)
        l=[]
        try:
            while True:
                if x1<=x2: l.append(x1);x1=next(it1)
                else     : l.append(x2);x2=next(it2)
        except:
            if x1<=x2: l.append(x2);l.extend(it2)
            else:      l.append(x1);l.extend(it1)
    return l

使用 Python 进行归并排序

提问by Hans

采纳答案by firefrorefiddle

回答by Tamás

回答by anumi

回答by MikeRand

回答by Moj

回答by zack

回答by sarath joseph

回答by Boubakr

回答by mojians

回答by B. M.

相关推荐

最近更新

标签

使用 Python 进行归并排序

提问by Hans

采纳答案by firefrorefiddle

回答by Tamás

回答by anumi

回答by MikeRand

回答by Moj

回答by zack

回答by sarath joseph

回答by Boubakr

回答by mojians

回答by B. M.

相关推荐

Python 如何在 matplotlib 1.4 中使用 viridis

Python：如何在运行 os.system 后获取标准输出？

在Python中以YYYY-MM-DD获取今天的日期？

使用变量从python执行shell脚本

相关推荐

最近更新

标签