C语言 快速排序和霍尔分区

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/7198121/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-02 09:29:20  来源:igfitidea点击:

QuickSort and Hoare Partition

calgorithmsortingquicksortdata-partitioning

提问by Ofek Ron

I have a hard time translating QuickSort with Hoare partitioning into C code, and can't find out why. The code I'm using is shown below:

我很难将带有 Hoare 分区的 QuickSort 翻译成 C 代码,并且找不到原因。我正在使用的代码如下所示:

void QuickSort(int a[],int start,int end) {
    int q=HoarePartition(a,start,end);
    if (end<=start) return;
    QuickSort(a,q+1,end);
    QuickSort(a,start,q);
}

int HoarePartition (int a[],int p, int r) {
    int x=a[p],i=p-1,j=r;
    while (1) {
        do  j--; while (a[j] > x);
        do  i++; while (a[i] < x);

        if  (i < j)
            swap(&a[i],&a[j]);
        else
            return j;
    }
}

Also, I don't really get why HoarePartitionworks. Can someone explain why it works, or at least link me to an article that does?

另外,我真的不明白为什么HoarePartition有效。有人可以解释为什么它有效,或者至少将我链接到一篇有效的文章?

I have seen a step-by-step work-through of the partitioning algorithm, but I don't have an intuitive feel for it. In my code, it doesn't even seem to work. For example, given the array

我已经看到了分区算法的分步工作,但我对它没有直观的感觉。在我的代码中,它甚至似乎不起作用。例如,给定数组

13 19  9  5 12  8  7  4 11  2  6 21

It will use pivot 13, but end up with the array

它将使用枢轴 13,但最终会得到数组

 6  2  9  5 12  8  7  4 11 19 13 21 

And will return jwhich is a[j] = 11. I thought it was supposed to be true that the array starting at that point and going forward should have values that are all larger than the pivot, but that isn't true here because 11 < 13.

并将返回j哪个是a[j] = 11. 我认为从该点开始并向前的数组应该具有都大于枢轴的值应该是正确的,但这在这里不是真的,因为 11 < 13。

Here's pseudocode for Hoare partitioning (from CLRS, second edition), in case this is useful:

这是 Hoare 分区的伪代码(来自 CLRS,第二版),以防万一:

Hoare-Partition (A, p, r)
    x ← A[p]
    i ← p ? 1
    j ← r + 1
    while  TRUE
        repeat   j ←  j ? 1
            until     A[j] ≤ x
        repeat   i ←  i + 1
            until     A[i] ≥ x
        if  i < j
            exchange  A[i] ? A[j]
        else  return   j 

Thanks!

谢谢!

EDIT:

编辑:

The right C code for this problem will end up being:

这个问题的正确 C 代码最终将是:

void QuickSort(int a[],int start,int end) {
    int q;
    if (end-start<2) return;
    q=HoarePartition(a,start,end);
    QuickSort(a,start,q);
    QuickSort(a,q,end);
}

int HoarePartition (int a[],int p, int r) {
    int x=a[p],i=p-1,j=r;
    while (1) {
        do  j--; while (a[j] > x);
        do  i++; while (a[i] < x);
        if  (i < j) 
            swap(&a[i],&a[j]);
        else 
            return j+1;
    }
}

回答by afeldspar

To answer the question of "Why does Hoare partitioning work?":

要回答“为什么 Hoare 分区有效?”的问题:

Let's simplify the values in the array to just three kinds: Lvalues (those less than the pivot value), Evalues (those equal to the pivot value), and Gvalue (those larger than the pivot value).

让我们将数组中的值简化为三种: L值(小于枢轴值的值)、E值(等于枢轴值的值)和G值(大于枢轴值的值)。

We'll also give a special name to one location in the array; we'll call this location s, and it's the location where the jpointer is when the procedure finishes. Do we know ahead of time which location sis? No, but we know that somelocation will meet that description.

我们还将为数组中的一个位置指定一个特殊名称;我们将此位置称为s,它是过程完成时j指针所在的位置。我们是否提前知道s是哪个位置?不,但我们知道某些位置将符合该描述。

With these terms, we can express the goal of the partitioning procedure in slightly different terms: it is to split a single array into two smaller sub-arrays which are not mis-sortedwith respect to each other. That "not mis-sorted" requirement is satisfied if the following conditions are true:

有了这些术语,我们可以用稍微不同的术语来表达分区过程的目标:将单个数组拆分为两个较小的子数组,这些子数组彼此之间没有错误排序。如果以下条件为真,则满足“未排序错误”的要求:

  1. The "low" sub-array, that goes from the left end of the array up to and includes s, contains no Gvalues.
  2. The "high" sub-array, that starts immediately after sand continues to the right end, contains no Lvalues.
  1. “低”子数组,从数组的左端一直到并包含s,不包含G值。
  2. “高”子数组,紧接在s之后开始并一直到右端,不包含L值。

That's really all we need to do. We don't even need to worry where the Evalues wind up on any given pass. As long as each pass gets the sub-arrays right with respect to each other, later passes will take care of any disorder that exists inside any sub-array.

这就是我们真正需要做的。我们甚至不需要担心E值在任何给定的传递中结束。只要每次传递都使子数组彼此正确,以后的传递将处理任何子数组中存在的任何无序。

So now let's address the question from the other side: how does the partitioning procedure ensure that there are no Gvalues in sor to the left of it, and no Lvalues to the right of s?

所以,现在让我们来解决从对方的问题:如何划分方法确保没有小号或向左的它,也没有大号值的权小号

Well, "the set of values to the right of s" is the same as "the set of cells the jpointer moves over before it reaches s". And "the set of values to the left of and including s" is the same as "the set of values that the ipointer moves over before jreaches s".

那么,“ s右侧的值集”与“ j指针在到达s之前移动的单元格集”相同。并且“在s左侧并包括s的值集”与“在j到达s之前i指针移动的值集”相同。

That means that any values which aremisplaced will, on some iteration of the loop, be under one of our two pointers. (For convenience, let's say it's the jpointer pointing at a Lvalue, though it works exactly the same for the ipointer pointing at a Gvalue.) Where will the ipointer be, when the jpointer is on a misplaced value? We know it will be:

这意味着在循环的某些迭代中,任何错位的值将位于我们的两个指针之一之下。(为方便起见,我们说这是Ĵ指针在指向大号价值,但它的工作原理完全为同一指针在指向值。)将在哪里指针,当Ĵ指针是一个错位的价值?我们知道它将是:

  1. at a location in the "low" subarray, where the Lvalue can go with no problems;
  2. pointing at a value that's either an Eor a Gvalue, which can easily replace the Lvalue under the jpointer. (If it wasn't on an Eor a Gvalue, it wouldn't have stopped there.)
  1. 在“低”子数组中的某个位置,L值可以毫无问题地进入;
  2. 指向一个EG值的值,它可以很容易地替换j指针下的L值。(如果它不在EG值上,它就不会停在那里。)

Note that sometimes the iand jpointer will actually both stop on Evalues. When this happens, the values will be switched, even though there's no need for it. This doesn't do any harm, though; we said before that the placement of the Evalues can't cause mis-sorting between the sub-arrays.

请注意,有时ij指针实际上都会停在E值上。发生这种情况时,即使不需要,也会切换值。不过,这没有任何害处。我们之前说过,E值的放置不会导致子数组之间的错误排序。

So, to sum up, Hoare partitioning works because:

因此,总而言之,Hoare 分区之所以有效,是因为:

  1. It separates an array into smaller sub-arrays which are not mis-sorted relative to each other;
  2. If you keep doing that and recursively sorting the sub-arrays, eventually there will be nothing left of the array that's unsorted.
  1. 它将数组分成较小的子数组,这些子数组彼此之间没有错误排序;
  2. 如果你继续这样做并递归地对子数组进行排序,最终数组中将没有任何未排序的东西。

回答by templatetypedef

I believe that there are two problems with this code. For starters, in your Quicksort function, I think you want to reorder the lines

我认为这段代码有两个问题。对于初学者,在您的 Quicksort 功能中,我认为您想重新排列行

 int q=HoarePartition(a,start,end);
 if (end<=start) return;

so that you have them like this:

这样你就可以像这样拥有它们:

 if (end<=start) return;
 int q=HoarePartition(a,start,end);

However, you should do even more than this; in particular this should read

然而,你应该做的不止这些;特别是这应该读

 if (end - start < 2) return;
 int q=HoarePartition(a,start,end);

The reason for this is that the Hoare partition fails to work correctly if the range you're trying to partition has size zero or one. In my edition of CLRS this isn't mentioned anywhere; I had to go to the book's errata pageto find this. This is almost certainly the cause of the problem you encountered with the "access out of range" error, since with that invariant broken you might run right off the array!

这样做的原因是,如果您尝试分区的范围大小为零或一,Hoare 分区将无法正常工作。在我的 CLRS 版本中,任何地方都没有提到这一点;我不得不去这本书的勘误页才能找到这个。这几乎肯定是您遇到“访问超出范围”错误的原因,因为如果不变量被破坏,您可能会立即离开数组!

As for an analysis of Hoare partitioning, I would suggest starting off by just tracing through it by hand. There's also a more detailed analysis here. Intuitively, it works by growing two ranges from the ends of the range toward one another - one on the left-hand side containing elements smaller than the pivot and one on the right-hand side containing elements larger than the pivot. This can be slightly modified to produce the Bentley-McIlroy partitioning algorithm (referenced in the link) that scales nicely to handle equal keys.

至于对 Hoare 分区的分析,我建议从手工跟踪开始。还有一个更详细的分析在这里。直观地说,它的工作原理是从范围的末端向彼此增加两个范围 - 一个在左侧包含小于枢轴的元素,另一个在右侧包含大于枢轴的元素。这可以稍微修改以生成 Bentley-McIlroy 分区算法(在链接中引用),该算法可以很好地扩展以处理相等的键。

Hope this helps!

希望这可以帮助!

回答by VMatrix1900

Your final code is wrong, since the initial value of jshould be r + 1instead of r. Otherwise your partition function always ignore the last value.

您的最终代码是错误的,因为 的初始值j应该是r + 1而不是r。否则你的分区函数总是忽略最后一个值。

Actually, HoarePartition works because for any array A[p...r]which contains at least 2 elements(i.e. p < r), every element of A[p...j]is <=every element of A[j+1...r]when it terminates. So the next two segments that the main algorithm recurs on are [start...q]and [q+1...end]

实际上,HoarePartition 之所以有效是因为对于任何A[p...r]包含至少 2 个元素(即p < r)的数组,每个元素A[p...j]都是它终止时的<=每个元素A[j+1...r]。因此,主要算法重复出现的接下来的两个部分是[start...q][q+1...end]

So the right C code is as follows:

所以正确的C代码如下:

void QuickSort(int a[],int start,int end) {
    if (end <= start) return;
    int q=HoarePartition(a,start,end);
    QuickSort(a,start,q);
    QuickSort(a,q + 1,end);
}

int HoarePartition (int a[],int p, int r) {
    int x=a[p],i=p-1,j=r+1;
    while (1) {
        do  j--; while (a[j] > x);
        do  i++; while (a[i] < x);
        if  (i < j) 
            swap(&a[i],&a[j]);
        else 
            return j;
    }
}

More clarifications:

更多说明:

  1. partition part is just the translation of the pseudocode. (Note the return value is j)

  2. for the recursive part, note that the base case checking (end <= startinstead of end <= start + 1otherwise you will skip the [2 1]subarray )

  1. 分区部分只是伪代码的翻译。(注意返回值为j

  2. 对于递归部分,注意基本情况检查(end <= start而不是end <= start + 1否则你将跳过[2 1]子阵列)

回答by aman khunt

First of all u misunderstood the Hoare's partition algorithm,which can be see from the translated code in c, Since u considered pivot as leftmost element of subarray.

首先你误解了Hoare的分区算法,这可以从c中的翻译代码看出,因为你认为pivot是子数组的最左边的元素。

Ill explain u considering the leftmost element as pivot.

将最左边的元素作为枢轴来解释你。

int HoarePartition (int a[],int p, int r) 

Here p and r represents the lower and upper bound of array which can be part of a larger array also(subarray) to be partitioned.

这里 p 和 r 表示数组的下限和上限,数组也可以是要分区的更大数组(子数组)的一部分。

so we start with the pointers(marker) initially pointing to before and after end points of array(simply bcoz using do while loop).Therefore,

所以我们从最初指向数组端点前后的指针(标记)开始(简单地 bcoz 使用 do while 循环)。因此,

i=p-1,

j=r+1;    //here u made mistake

Now as per partitioning we want every element to the left of pivot to be less than or equal to pivot and greater than on right side of pivot.

现在根据分区,我们希望枢轴左侧的每个元素都小于或等于枢轴并大于枢轴右侧的元素。

So we will move 'i' marker untill we get element which is greaterthan or equal to pivot. And similarly 'j' marker untill we find element less than or equal to pivot.

所以我们将移动 'i' 标记,直到我们得到大于或等于枢轴的元素。和类似的 'j' 标记,直到我们找到小于或等于枢轴的元素。

Now if i < j we swap the elements bcoz both the elements are in wrong part of array. So code will be

现在如果 i < j 我们交换元素 bcoz 两个元素都在数组的错误部分。所以代码将是

do  j--; while (a[j] <= x);                 //look at inequality sign
do  i++; while (a[i] >= x);
if  (i < j) 
    swap(&a[i],&a[j]);

Now if 'i' is not less than 'j',that means now there is no element in between to swap so we return 'j' position.

现在如果 'i' 不小于 'j',这意味着现在中间没有元素可以交换,所以我们返回 'j' 位置。

So now the array after partitioned lower half is from 'start to j'

所以现在分区下半部分后的数组是从'开始到j'

upper half is from 'j+1 to end'

上半部分是从 'j+1 到 end'

so code will look like

所以代码看起来像

void QuickSort(int a[],int start,int end) {
    int q=HoarePartition(a,start,end);
    if (end<=start) return;
    QuickSort(a,start,q);
    QuickSort(a,q+1,end);
}

回答by Hrishikesh Mishra

Straightforward implementation in java.

在java中的直接实现。

public class QuickSortWithHoarePartition {

    public static void sort(int[] array) {
        sortHelper(array, 0, array.length - 1);
    }

    private static void sortHelper(int[] array, int p, int r) {
        if (p < r) {
            int q = doHoarePartitioning(array, p, r);
            sortHelper(array, p, q);
            sortHelper(array, q + 1, r);
        }
    }

    private static int doHoarePartitioning(int[] array, int p, int r) {
        int pivot = array[p];
        int i = p - 1;
        int j = r + 1;

        while (true) {

            do {
                i++;
            }
            while (array[i] < pivot);

            do {
                j--;
            }
            while (array[j] > pivot);

            if (i < j) {
                swap(array, i, j);
            } else {
                return j;
            }
        }

    }

    private static void swap(int[] array, int i, int j) {
        int temp = array[i];
        array[i] = array[j];
        array[j] = temp;
    }
}

回答by qeatzy

  1. move pivot to first. (eg, use median of three. switch to insertion sort for small input size.)
  2. partition,
    • repetitively swap currently leftmost 1 with currently rightmost 0.
      0 -- cmp(val, pivot) == true, 1 -- cmp(val, pivot) == false.
      stop if not left < right.
    • after that, swap pivot with rightmost 0.
  1. 将枢轴移至第一个。(例如,使用 3 的中位数。对于小输入大小,切换到插入排序。)
  2. 划分,
    • 将当前最左边的 1 与当前最右边的 0 重复交换。
      0 -- cmp(val, pivot) == true, 1 -- cmp(val, pivot) == false。
      如果不是左 < 右,则停止。
    • 之后,将枢轴与最右边的 0 交换。

回答by xxx7xxxx

You last C code works. But it's not intuitive. And now I'm studying CLRS luckily. In my opinion, The pseudocode of CLRS is wrong.(At 2e) At last, I find that it would be right if changing a place.

你最后的 C 代码有效。但这并不直观。现在我很幸运地正在学习 CLRS。在我看来,CLRS的伪代码是错误的。(2e)最后我发现换个地方就对了。

 Hoare-Partition (A, p, r)
 x ← A[p]
     i ← p ? 1
     j ← r + 1
 while  TRUE
        repeat   j ←  j ? 1
            until     A[j] ≤ x
    repeat   i ←  i + 1
            until     A[i] ≥ x
    if  i < j
              exchange  A[i] ? A[j]
    else  
              exchnage  A[r] ? A[i]  
              return   i

Yes, Add a exchange A[r] ? A[i] can make it works. Why? Because A[i] is now bigger than A[r] OR i == r. So We must exchange to guarantee the feature of a partition.

是的,添加一个交换 A[r] ?A[i] 可以让它工作。为什么?因为 A[i] 现在大于 A[r] OR i == r。所以我们必须交换来保证一个分区的特性。