Java multithreaded quicksort or mergesort

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/2210185/

Time: 2020-08-13 04:46:47  Source: igfitidea

Multithreaded quicksort or mergesort

java multithreading sorting quicksort mergesort

Asked by SyntaxT3rr0r

How can I implement a concurrent quicksort or mergesort algorithm for Java?

We've had issues on a 16-(virtual)-core Mac where only one core (!) was working when using the default Java sorting algorithm, and it was, well, not good to see that very fine machine be completely underused. So we wrote our own (I wrote it), and we did indeed gain good speedups (I wrote a multithreaded quicksort, and due to its partitioning nature it parallelizes very well, but I could have written a mergesort too)... But my implementation only scales up to 4 threads, it's proprietary code, and I'd rather use one coming from a reputable source instead of using my reinvented wheel.

The only one I found on the Web is an example of how not to write a multithreaded quicksort in Java; it busy-loops (which is really terrible) using a:

while (helpRequested) { }

http://broadcast.oreilly.com/2009/06/may-column-multithreaded-algor.html

So in addition to losing one thread for no reason, it makes sure to kill performance by busy-looping in that while loop (which is mind-boggling).
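For contrast with the busy loop, here is a minimal sketch of how a thread can wait for a helper without burning a core. It is not taken from the linked article: the class and method names (`BlockingWaitSketch`, `finishHelp`, `awaitHelp`) are made up for illustration, and a `CountDownLatch` is only one of several ways to block until a condition holds.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration: the waiting thread parks on a latch instead of
// spinning in `while (helpRequested) { }`.
class BlockingWaitSketch {
    private final CountDownLatch helpFinished = new CountDownLatch(1);
    private volatile int result;

    // Called by the helper thread when its share of the work is done.
    void finishHelp(int value) {
        result = value;
        helpFinished.countDown(); // wakes up any thread blocked in awaitHelp()
    }

    // Blocks without consuming CPU until finishHelp() has been called.
    int awaitHelp() {
        try {
            helpFinished.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return result;
    }

    public static void main(String[] args) {
        BlockingWaitSketch sketch = new BlockingWaitSketch();
        new Thread(() -> sketch.finishHelp(42)).start();
        System.out.println(sketch.awaitHelp()); // prints 42
    }
}
```

The waiting thread is descheduled while it waits, so the core it would have spun on stays available for actual sorting work.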

Hence my question: do you know of any correctly multithreaded quicksort or mergesort implementation in Java that would be coming from a reputable source?

I emphasize that I know the complexity stays O(n log n), but I'd still very much enjoy seeing all these cores start working instead of idling. Note that for other tasks, on that same 16-virtual-core Mac, I saw speedups of up to x7 by parallelizing the code (and I'm by no means an expert in concurrency).

So even though the complexity stays O(n log n), I'd really appreciate a x7 or x8 or even x16 speedup.

Answered by dfa

Give the fork/join framework by Doug Lea a try:

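import java.util.Arrays;
import java.util.concurrent.RecursiveAction;

public class MergeSort extends RecursiveAction {
    // Not in the original listing: ranges smaller than this are sorted
    // sequentially. The value is an assumption; tune it for your data.
    static final int SEQUENTIAL_THRESHOLD = 10000;

    final int[] numbers;
    final int startPos, endPos;
    final int[] result;

    // Not in the original listing, which omitted the constructor.
    // Sorts numbers[startPos, endPos) into result.
    public MergeSort(int[] numbers, int startPos, int endPos) {
        this.numbers = numbers;
        this.startPos = startPos;
        this.endPos = endPos;
        this.result = new int[endPos - startPos];
    }

    // Merges the two sorted child results into this task's result array.
    private void merge(MergeSort left, MergeSort right) {
        int i=0, leftPos=0, rightPos=0, leftSize = left.size(), rightSize = right.size();
        while (leftPos < leftSize && rightPos < rightSize)
            result[i++] = (left.result[leftPos] <= right.result[rightPos])
                ? left.result[leftPos++]
                : right.result[rightPos++];
        while (leftPos < leftSize)
            result[i++] = left.result[leftPos++];
        while (rightPos < rightSize)
            result[i++] = right.result[rightPos++];
    }

    public int size() {
        return endPos-startPos;
    }

    @Override
    protected void compute() {
        if (size() < SEQUENTIAL_THRESHOLD) {
            System.arraycopy(numbers, startPos, result, 0, size());
            Arrays.sort(result, 0, size());
        } else {
            int midpoint = size() / 2;
            MergeSort left = new MergeSort(numbers, startPos, startPos+midpoint);
            MergeSort right = new MergeSort(numbers, startPos+midpoint, endPos);
            // The article's listing called coInvoke(left, right); invokeAll is
            // the call available in java.util.concurrent.
            invokeAll(left, right);
            merge(left, right);
        }
    }
}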

(source: http://www.ibm.com/developerworks/java/library/j-jtp03048.html?S_TACT=105AGX01&S_CMP=LP)

Answered by Fabian Steeg

You probably did consider this, but it might help to look at the concrete problem from a higher level. For example, if you sort more than one array or list, it might be much easier to sort the individual collections concurrently using the traditional algorithm instead of trying to concurrently sort a single collection.
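A minimal sketch of that idea, assuming the data already comes as several independent arrays: each array is sorted with the ordinary sequential `Arrays.sort`, and only the scheduling is concurrent. The class and method names (`SortManyArrays`, `sortAll`) are hypothetical, and a fixed thread pool sized to the core count is just one reasonable choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: sorts each array with the ordinary sequential
// Arrays.sort, but runs the individual sorts on separate pool threads.
class SortManyArrays {
    static void sortAll(List<int[]> arrays) {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int[] array : arrays) {
                pending.add(pool.submit(() -> Arrays.sort(array)));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for this sort; rethrows any failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sorting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a sort task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<int[]> data = Arrays.asList(new int[] {3, 1, 2}, new int[] {9, 7, 8});
        sortAll(data);
        for (int[] array : data) {
            System.out.println(Arrays.toString(array));
        }
    }
}
```

This only pays off when there are at least as many collections as cores and they are of similar size; one huge array among many tiny ones still leaves most cores idle.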

Answered by Stephan Eggermont

Why do you think a parallel sort would help? I'd think most sorting is I/O bound, not CPU bound. Unless your comparison does a lot of computation, a speedup is unlikely.

Answered by medv4380

Sorry about this, but what you are asking for isn't possible. I believe someone else mentioned that sorting is I/O bound, and they are most likely correct. The code from IBM by Doug Lea is a nice piece of work, but I believe it is intended mostly as an example of how to write code. If you look at his article, he never posted benchmarks for it and instead posted benchmarks for other working code, such as calculating averages and finding the min and max in parallel.

Here are the benchmarks for a generic merge sort, a quicksort, Doug's merge sort using a fork/join pool, and a quicksort that I wrote using a fork/join pool. You'll see that merge sort is the best for an N of 100 or less, quicksort for 1,000 to 10,000, and the quicksort using a fork/join pool beats the rest at 100,000 and higher. These tests sorted arrays of random numbers, run 30 times to create an average for each data point, on a quad core with about 2 GB of RAM. The code for the quicksort follows the numbers.

This mostly shows that unless you're trying to sort a very large array, you should back away from trying to improve your code's sort algorithm, since the parallel ones run very slowly on small N.

N        Merge Sort   Quick Sort   Merge TP     Quick TP
10       7.51E-06     5.13E-05     1.87E-04     2.28E-04
100      1.34E-04     1.60E-04     6.41E-04     4.40E-04
1000     0.003286269  7.20E-04     0.003704411  0.002716065
10000    0.023988694  9.61E-04     0.014830678  0.003115251
100000   0.022994328  0.01949271   0.019474009  0.014046681
1000000  0.329776132  0.32528383   0.19581768   0.157845389

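import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

//  The original imported these classes from the jsr166y preview package;
//  they ship in java.util.concurrent since Java 7.
//  derived from
//  http://www.cs.princeton.edu/introcs/42sort/QuickSort.java.html
//  Copyright © 2007, Robert Sedgewick and Kevin Wayne.
//  Modified for fork/join by me hastily.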
public class QuickSort {

    Comparable array[];
    static int limiter = 10000;

    public QuickSort(Comparable array[]) {
        this.array = array;
    }

    public void sort(ForkJoinPool pool) {
        RecursiveAction start = new Partition(0, array.length - 1);        
        pool.invoke(start);
    }

    class Partition extends RecursiveAction {

        int left;
        int right;

        Partition(int left, int right) {
            this.left = left;
            this.right = right;
        }

        public int size() {
            return right - left;
        }

        @SuppressWarnings("empty-statement")
        //void partitionTask(int left, int right) {
        protected void compute() {
            int i = left, j = right;
            Comparable tmp;
            Comparable pivot = array[(left + right) / 2];

            while (i <= j) {
                while (array[i].compareTo(pivot) < 0) {
                    i++;
                }
                while (array[j].compareTo(pivot) > 0) {
                    j--;
                }

                if (i <= j) {
                    tmp = array[i];
                    array[i] = array[j];
                    array[j] = tmp;
                    i++;
                    j--;
                }
            }


            Partition leftTask = null;
            Partition rightTask = null;

            if (left < i - 1) {
                leftTask = new Partition(left, i - 1);
            }
            if (i < right) {
                rightTask = new Partition(i, right);
            }

            if (size() > limiter) {
                if (leftTask != null && rightTask != null) {
                    invokeAll(leftTask, rightTask);
                } else if (leftTask != null) {
                    invokeAll(leftTask);
                } else if (rightTask != null) {
                    invokeAll(rightTask);
                }
            }else{
                if (leftTask != null) {
                    leftTask.compute();
                }
                if (rightTask != null) {
                    rightTask.compute();
                }
            }
        }
    }
}

Answered by Rob_before_edits

I've been facing the multithreaded sort problem myself the last couple of days. As explained on this Caltech slide, the best you can do by simply multithreading each step of the divide-and-conquer approach over the obvious number of threads (the number of divisions) is limited. I guess this is because, while you can run 64 divisions on 64 threads using all 64 cores of your machine, the 4 divisions can only be run on 4 threads, the 2 on 2, the 1 on 1, and so on. So for many levels of the recursion your machine is under-utilized.

A solution occurred to me last night which might be useful in my own work, so I'll post it here.

If the first criterion of your sorting function is based on an integer of maximum size s, be it an actual integer or a char in a string, such that this integer or char fully defines the highest level of your sort, then I think there's a very fast (and easy) solution. Simply use that initial integer to divide your sorting problem into s smaller sorting problems, and sort those using the standard single-threaded sort algorithm of your choice. The division into s classes can be done in a single pass, I think. There is no merging problem after doing the s independent sorts, because you already know that everything in class 1 sorts before class 2, and so on.

Example : if you wish to do a sort based on strcmp(), then use the first char in your string to break your data into 256 classes, then sort each class on the next available thread until they're all done.
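The first-character example can be sketched in Java roughly as follows. This is an illustration under stated assumptions, not a tested production sort: the class name `FirstCharBucketSort` is hypothetical, strings are compared with `String.compareTo` rather than `strcmp()`, an extra bucket holds empty strings, and all first characters of 255 and above share the last bucket (which stays correct because the bucket index never decreases as the first character grows). A parallel stream stands in for "the next available thread".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class FirstCharBucketSort {
    // 0 = empty string, 1..256 = first char 0..255 (larger chars share the last bucket).
    private static final int BUCKETS = 257;

    private static int bucketOf(String s) {
        return s.isEmpty() ? 0 : Math.min(s.charAt(0), 255) + 1;
    }

    static List<String> sort(List<String> input) {
        List<List<String>> buckets = new ArrayList<>(BUCKETS);
        for (int i = 0; i < BUCKETS; i++) {
            buckets.add(new ArrayList<>());
        }
        // Single pass: distribute the strings into their classes.
        for (String s : input) {
            buckets.get(bucketOf(s)).add(s);
        }
        // Each bucket is independent, so the sorts can run on different threads.
        buckets.parallelStream().forEach(bucket -> Collections.sort(bucket));
        // No merge step: bucket k sorts entirely before bucket k + 1.
        List<String> sorted = new ArrayList<>(input.size());
        for (List<String> bucket : buckets) {
            sorted.addAll(bucket);
        }
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sort(Arrays.asList("pear", "apple", "fig", "", "avocado", "plum")));
    }
}
```

If most strings start with the same character, nearly all the work lands in one bucket and the sort degrades to a single-threaded one, which is the entropy caveat this answer raises.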

This method fully utilizes all available cores until the problem is solved, and I think it's easy to implement. I haven't implemented it yet though, so there may be problems with it that I have yet to find. It clearly can't work for floating point sorts, and would be inefficient for large s. Its performance would also be heavily dependent on the entropy of the integer/char used to define the classes.

This may be what Fabian Steeg was suggesting in fewer words, but I'm making it explicit that you can create multiple smaller sorts from a larger sort in some circumstances.

Answered by Graham Seed

Just coded up the above MergeSort and performance was very poor.

The code block calls coInvoke(left, right), but I could not find that method, so I replaced it with invokeAll(left, right).

Test code is:

MergeSort mysort = new MyMergeSort(array,0,array.length);
ForkJoinPool threadPool = new ForkJoinPool();
threadPool.invoke(mysort);

but had to stop it due to poor performance.

I see that the article above is almost a year old and maybe things have changed now.

I have found the code in the alternative article to work: http://blog.quibb.org/2010/03/jsr-166-the-java-forkjoin-framework/

Answered by Jeffrey Bosboom

Java 8 provides java.util.Arrays.parallelSort, which sorts arrays in parallel using the fork-join framework. The documentation provides some details about the current implementation (but these are non-normative notes):

The sorting algorithm is a parallel sort-merge that breaks the array into sub-arrays that are themselves sorted and then merged. When the sub-array length reaches a minimum granularity, the sub-array is sorted using the appropriate Arrays.sort method. If the length of the specified array is less than the minimum granularity, then it is sorted using the appropriate Arrays.sort method. The algorithm requires a working space no greater than the size of the original array. The ForkJoin common pool is used to execute any parallel tasks.
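A minimal usage sketch, assuming Java 8 or later. The class and method names (`ParallelSortDemo`, `sameAsSequential`) are made up for illustration; only `Arrays.parallelSort`, `Arrays.sort`, and `Random.ints` are library calls. It checks the parallel result against the sequential one rather than timing anything, since timings depend heavily on the machine and array size.

```java
import java.util.Arrays;
import java.util.Random;

class ParallelSortDemo {
    // Sorts n pseudo-random ints both ways and reports whether the results match.
    static boolean sameAsSequential(int n) {
        int[] data = new Random(42).ints(n).toArray();
        int[] expected = data.clone();
        Arrays.sort(expected);     // single-threaded reference
        Arrays.parallelSort(data); // fork/join based, runs on the common pool
        return Arrays.equals(data, expected);
    }

    public static void main(String[] args) {
        System.out.println(sameAsSequential(1_000_000)); // prints true
    }
}
```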

There does not seem to be a corresponding parallel sort method for lists (even though RandomAccess lists should play nicely with sorting), so you'll need to use toArray, sort that array, and store the result back into the list. (I've asked a question about this here.)
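The toArray round trip might look like the sketch below. The helper name `ParallelListSort.parallelSort` is hypothetical; it takes an explicit comparator so that the array can stay an `Object[]` at runtime, and it writes the elements back through a `ListIterator`, which is the same way `Collections.sort` is documented to work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.ListIterator;

class ParallelListSort {
    // Hypothetical helper: copy out, sort the array in parallel, copy back.
    static <T> void parallelSort(List<T> list, Comparator<? super T> comparator) {
        // T is unbounded, so this cast is to Object[] after erasure and cannot fail.
        @SuppressWarnings("unchecked")
        T[] array = (T[]) list.toArray();
        Arrays.parallelSort(array, comparator);
        ListIterator<T> it = list.listIterator();
        for (T element : array) {
            it.next();
            it.set(element); // overwrite in place, position by position
        }
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(5, 3, 9, 1));
        parallelSort(numbers, Comparator.naturalOrder());
        System.out.println(numbers); // prints [1, 3, 5, 9]
    }
}
```

The copy in and out costs two extra passes over the data and one temporary array, so it is only worthwhile for lists large enough that the parallel sort itself dominates.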

Answered by Prakash Devta

import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class IQ1 {
    public static void main(String[] args) {
        // Get number of available processors
        int numberOfProcessors = Runtime.getRuntime().availableProcessors();
        System.out.println("Number of processors : " + numberOfProcessors);
        // Input data, it can be anything e.g. log records, file records etc
        long[][] input = new long[][]{
              { 5, 8, 9, 14, 20 },
              { 17, 56, 59, 80, 102 },
              { 2, 4, 7, 11, 15 },
              { 34, 37, 39, 45, 50 }
            };

        /* A special thread pool designed to work with fork-and-join task splitting
         * The pool size is going to be based on number of cores available 
         */
        ForkJoinPool pool = new ForkJoinPool(numberOfProcessors);
        long[] result = pool.invoke(new Merger(input,  0, input.length));

        System.out.println(Arrays.toString(result));
    }
    /* Recursive task which returns the result
     * An instance of this will be used by the ForkJoinPool to start working on the problem
     * Each thread from the pool will call the compute and the problem size will reduce in each call
     */
    static class Merger extends RecursiveTask<long[]>{
        long[][] input;
        int low;
        int high;

        Merger(long[][] input, int low, int high){
            this.input = input;
            this.low = low;
            this.high = high;
        }

        @Override
        protected long[] compute() {            
            long[] result = merge();
            return result;
        }

        // Merge the sorted rows input[low, high) into one sorted array
        private long[] merge(){
            // a single row is already sorted; return that row, not always row 0
            if(high - low < 2){
                return input[low];
            }
            long[] result;
            // base case: exactly two rows
            if(high - low == 2){
                long[] a = input[low];
                long[] b = input[high-1];
                result = mergeTwoSortedArrays(a, b);
            }
            else{
                // divide the problem into smaller problems
                int mid = low + (high - low) / 2;
                Merger first = new Merger(input, low, mid);
                Merger second = new Merger(input, mid, high);
                first.fork();
                long[] secondResult = second.compute();
                long[] firstResult = first.join();

                result = mergeTwoSortedArrays(firstResult, secondResult);
            }

            return result;
        }

        // method to merge two sorted arrays
        private long[] mergeTwoSortedArrays(long[] a, long[] b){
            long[] result = new long[a.length + b.length];
            int i=0;
            int j=0;
            int k=0;
            while(i<a.length && j<b.length){
                if(a[i] < b[j]){
                    result[k] = a[i];
                    i++;
                } else{
                    result[k] = b[j];
                    j++;
                }
                k++;
            }

            while(i<a.length){
                result[k] = a[i];
                i++;
                k++;
            }

            while(j<b.length){
                result[k] = b[j];
                j++;
                k++;
            }

            return result;
        }
    }
}