Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1790360/

Time: 2020-08-12 22:40:05  Source: igfitidea

Median of Medians in Java

java, algorithm, sorting, median

Asked by

I am trying to implement Median of Medians in Java for a method like this:

Select(Comparable[] list, int pos, int colSize, int colMed)
  • list is a list of values in which to find a specified position
  • pos is the specified position
  • colSize is the size of the columns that I create in the first stage
  • colMed is the position in those columns that I use as the medX

I am not sure which sorting algorithm would be best to use, or how exactly to implement this.

Answered by Chip Uni

I don't know if you still need this problem solved, but http://www.ics.uci.edu/~eppstein/161/960130.html has an algorithm:

select(L,k)
{
    if (L has 10 or fewer elements)
    {
        sort L
        return the element in the kth position
    }

    partition L into subsets S[i] of five elements each
        (there will be n/5 subsets total).

    for (i = 1 to n/5) do
        x[i] = select(S[i],3)

    M = select({x[i]}, n/10)

    partition L into L1<M, L2=M, L3>M
    if (k <= length(L1))
        return select(L1,k)
    else if (k > length(L1)+length(L2))
        return select(L3,k-length(L1)-length(L2))
    else return M
}
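
The pseudocode above could be sketched in Java roughly as follows. This is only one possible reading, not the linked page's code: the class name SelectSketch, the use of List<Integer>, and the handling of a short last group (taking its own median rather than always position 3) are assumptions made for this example. As in the pseudocode, k is counted from 1.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SelectSketch {
    /** Returns the k-th smallest element of values, with k counted from 1. */
    static int select(List<Integer> values, int k) {
        if (values.size() <= 10) {
            // Small input: sort a copy and read off the answer.
            List<Integer> sorted = new ArrayList<>(values);
            Collections.sort(sorted);
            return sorted.get(k - 1);
        }

        // Median of each group of up to five elements (the last group may be shorter).
        List<Integer> medians = new ArrayList<>();
        for (int start = 0; start < values.size(); start += 5) {
            int end = Math.min(start + 5, values.size());
            List<Integer> group = values.subList(start, end);
            medians.add(select(group, (group.size() + 1) / 2));
        }

        // Pivot M: the median of the group medians.
        int pivot = select(medians, (medians.size() + 1) / 2);

        // Partition into L1 < M, L2 = M (only counted), L3 > M.
        List<Integer> less = new ArrayList<>();
        List<Integer> greater = new ArrayList<>();
        int equal = 0;
        for (int v : values) {
            if (v < pivot) less.add(v);
            else if (v > pivot) greater.add(v);
            else equal++;
        }

        if (k <= less.size()) return select(less, k);
        if (k > less.size() + equal) return select(greater, k - less.size() - equal);
        return pivot;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 101; i++) data.add((i * 37) % 101); // a permutation of 0..100
        System.out.println(select(data, 51)); // prints 50
    }
}
```

This version allocates new lists at every level, which keeps it close to the pseudocode but is not how a tuned implementation would do it.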

Good luck!

Answered by eold

I agree with the answer/solution from Chip Uni. I will just comment on the sorting part and provide some further explanation:

You do not need any sorting algorithm. The algorithm is similar to quicksort, with the difference that only one partition (left or right) is processed further. We just need to find a good pivot so that the left and right parts are as equal as possible, which would mean N + N/2 + N/4 + N/8 + ... steps, at most 2N in total, and thus a time complexity of O(N). The above algorithm, called median of medians, uses the median of the medians of groups of 5 as the pivot, which turns out to yield linear time complexity.
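
As a rough worked version of that argument (a sketch that ignores rounding; the running time T and the constant c are symbols introduced here for illustration): with an ideal pivot that halves the range each time, the work forms a geometric series. With the median-of-medians pivot, the two recursive calls are on about n/5 elements and on at most about 7n/10 elements, and since 1/5 + 7/10 < 1 the total stays linear.

```latex
N + \frac{N}{2} + \frac{N}{4} + \cdots = N \sum_{i \ge 0} 2^{-i} = 2N
\qquad\text{and}\qquad
T(n) \le T\!\left(\tfrac{n}{5}\right) + T\!\left(\tfrac{7n}{10}\right) + cn
\;\Longrightarrow\; T(n) \le 10\,cn
```

The bound on the right follows by induction: 10c(n/5) + 10c(7n/10) + cn = 2cn + 7cn + cn = 10cn.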

However, a sorting algorithm is used once the range being searched for the nth smallest/greatest element (which I suppose is what you are implementing with this algorithm) becomes small, in order to speed up the algorithm. Insertion sort is particularly fast on small arrays of up to 7 to 10 elements.
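
A minimal sketch of that small-range shortcut, assuming an int[] and a half-open range [lo, hi); the class and method names are invented for this example:

```java
public class SmallRangeSort {
    /** Sorts a[lo..hi) in place by insertion sort. */
    static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i < hi; i++) {
            int v = a[i];
            int j = i - 1;
            // Shift larger elements one slot to the right to make room for v.
            while (j >= lo && a[j] > v) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = v;
        }
    }

    /** Returns the k-th smallest (0-based) element of a[lo..hi); meant for short ranges. */
    static int selectSmall(int[] a, int lo, int hi, int k) {
        insertionSort(a, lo, hi);
        return a[lo + k];
    }

    public static void main(String[] args) {
        int[] a = {9, 4, 7, 1, 8, 2, 6};
        System.out.println(selectSmall(a, 0, a.length, 3)); // prints 6
    }
}
```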

Implementation note:

M = select({x[i]}, n/10)

actually means taking the median of all those medians of the 5-element groups. You can accomplish that by creating another array of size (n - 1)/5 + 1 and calling the same algorithm recursively to find the n/10-th element (which is the median of the newly created array).
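
The array-building step could look like the sketch below. It assumes an int[] input, sorts a copy of each group to get its median, and takes the lower median of a short last group; the recursive call on the resulting array is left out. GroupMedians and groupMedians are names made up for this example.

```java
import java.util.Arrays;

public class GroupMedians {
    /** Returns the median of each group of up to five consecutive elements of a. */
    static int[] groupMedians(int[] a) {
        int n = a.length;
        if (n == 0) return new int[0];
        int[] medians = new int[(n - 1) / 5 + 1]; // ceil(n / 5) groups
        for (int g = 0; g < medians.length; g++) {
            int from = g * 5;
            int to = Math.min(from + 5, n);
            int[] group = Arrays.copyOfRange(a, from, to);
            Arrays.sort(group);
            medians[g] = group[(group.length - 1) / 2]; // lower median for short groups
        }
        return medians;
    }

    public static void main(String[] args) {
        int[] a = {5, 1, 4, 2, 3, 10, 8, 9, 7, 6, 11};
        System.out.println(Arrays.toString(groupMedians(a))); // prints [3, 8, 11]
    }
}
```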

Answered by chepukha

I know this is a very old post and you might not remember it any more, but I wonder: did you measure the running time of your implementation when you wrote it?

I tried this algorithm and compared it with the simple approach of sorting with Java's Arrays.sort() and then picking the kth element from the sorted array. The result was that this algorithm only outperforms the Java sort when the array has about a hundred thousand elements or more. And it is only about 2 or 3 times faster, which is obviously not log(n) times faster.

Do you have any comment on that?
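
For reference, the sort-based baseline described in this answer could look like the sketch below. The class and method names are made up here, and the answer's actual benchmark code is not shown. Sorting costs O(n log n), but Arrays.sort has small constant factors, which may be why the linear-time selection only pulls ahead on large arrays.

```java
import java.util.Arrays;

public class SortBaseline {
    /** Returns the k-th smallest (0-based) element by sorting a copy of a. */
    static int kthBySorting(int[] a, int k) {
        int[] copy = a.clone(); // leave the caller's array untouched
        Arrays.sort(copy);
        return copy[k];
    }

    public static void main(String[] args) {
        int[] a = {42, 7, 19, 3, 25};
        System.out.println(kthBySorting(a, 2)); // prints 19
    }
}
```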

Answered by Adam Gawne-Cain

The question asked for Java, so here it is:

import java.util.*;

public class MedianOfMedians {
    private MedianOfMedians() {

    }

    /**
     * Returns median of list in linear time.
     * 
     * @param list list to search, which may be reordered on return
     * @return median of array in linear time.
     */
    public static Comparable getMedian(ArrayList<Comparable> list) {
        int s = list.size();
        if (s < 1)
            throw new IllegalArgumentException();
        int pos = select(list, 0, s, s / 2);
        return list.get(pos);
    }

    /**
     * Returns position of the k'th smallest element (k counted from 0) of sub-list.
     * 
     * @param list list to search, whose sub-list may be shuffled before
     *            returning
     * @param lo first element of sub-list in list
     * @param hi just after last element of sub-list in list
     * @param k zero-based rank, within the sub-list, of the element to find
     * @return position of the k'th smallest element of (possibly shuffled) sub-list.
     */
    public static int select(ArrayList<Comparable> list, int lo, int hi, int k) {
        if (lo >= hi || k < 0 || lo + k >= hi)
            throw new IllegalArgumentException();
        if (hi - lo < 10) {
            Collections.sort(list.subList(lo, hi));
            return lo + k;
        }
        int s = hi - lo;
        int np = s / 5; // Number of partitions
        for (int i = 0; i < np; i++) {
            // For each partition, move its median to front of our sublist
            int lo2 = lo + i * 5;
            int hi2 = (i + 1 == np) ? hi : (lo2 + 5);
            int pos = select(list, lo2, hi2, 2);
            Collections.swap(list, pos, lo + i);
        }

        // Partition medians were moved to front, so we can recurse without making another list.
        int pos = select(list, lo, lo + np, np / 2);

        // Re-partition list to [<=pivot][pivot][>pivot]
        int m = triage(list, lo, hi, pos);
        int cmp = lo + k - m;
        if (cmp > 0)
            return select(list, m + 1, hi, k - (m - lo) - 1);
        else if (cmp < 0)
            return select(list, lo, m, k);
        return lo + k;
    }

    /**
     * Partition sub-list into 3 parts [<=pivot][pivot][>pivot].
     * 
     * @param list
     * @param lo
     * @param hi
     * @param pos input position of pivot value
     * @return output position of pivot value
     */
    private static int triage(ArrayList<Comparable> list, int lo, int hi,
            int pos) {
        Comparable pivot = list.get(pos);
        int lo3 = lo;
        int hi3 = hi;
        while (lo3 < hi3) {
            Comparable e = list.get(lo3);
            int cmp = e.compareTo(pivot);
            if (cmp < 0)
                lo3++;
            else if (cmp > 0)
                Collections.swap(list, lo3, --hi3);
            else {
                while (hi3 > lo3 + 1) {
                    assert (list.get(lo3).compareTo(pivot) == 0);
                    e = list.get(--hi3);
                    cmp = e.compareTo(pivot);
                    if (cmp <= 0) {
                        if (lo3 + 1 == hi3) {
                            Collections.swap(list, lo3, lo3 + 1);
                            lo3++;
                            break;
                        }
                        Collections.swap(list, lo3, lo3 + 1);
                        assert (list.get(lo3 + 1).compareTo(pivot) == 0);
                        Collections.swap(list, lo3, hi3);
                        lo3++;
                        hi3++;
                    }
                }
                break;
            }
        }
        assert (list.get(lo3).compareTo(pivot) == 0);
        return lo3;
    }

}

Here is a unit test to check that it works:

import java.util.*;

import junit.framework.TestCase;

public class MedianOfMedianTest extends TestCase {
    public void testMedianOfMedianTest() {
        Random r = new Random(1);
        int n = 87;
        for (int trial = 0; trial < 1000; trial++) {
            ArrayList list = new ArrayList();
            int[] a = new int[n];
            for (int i = 0; i < n; i++) {
                int v = r.nextInt(256);
                a[i] = v;
                list.add(v);
            }
            int m1 = (Integer)MedianOfMedians.getMedian(list);
            Arrays.sort(a);
            int m2 = a[n/2];
            assertEquals(m1, m2);
        }
    }
}

However, the above code is too slow for practical use.

Here is a simpler way to get the k'th element that does not guarantee performance, but is much faster in practice:

/**
 * Returns position of the k'th smallest element (k counted from 0) of sub-list.
 * 
 * @param list list to search, whose sub-list may be shuffled before
 *            returning
 * @param lo first element of sub-list in list
 * @param hi just after last element of sub-list in list
 * @param k zero-based rank, within the sub-list, of the element to find
 * @return position of the k'th smallest element of (possibly shuffled) sub-list.
 */
static int select(double[] list, int lo, int hi, int k) {
    int n = hi - lo;
    if (n < 2)
        return lo;

    // Pick a pseudo-random pivot; floorMod keeps the index non-negative even if k * 7919 overflows
    double pivot = list[lo + Math.floorMod(k * 7919, n)];

    // Triage list to [<pivot][=pivot][>pivot]
    int nLess = 0, nSame = 0, nMore = 0;
    int lo3 = lo;
    int hi3 = hi;
    while (lo3 < hi3) {
        double e = list[lo3];
        int cmp = compare(e, pivot);
        if (cmp < 0) {
            nLess++;
            lo3++;
        } else if (cmp > 0) {
            swap(list, lo3, --hi3);
            if (nSame > 0)
                swap(list, hi3, hi3 + nSame);
            nMore++;
        } else {
            nSame++;
            swap(list, lo3, --hi3);
        }
    }
    assert (nSame > 0);
    assert (nLess + nSame + nMore == n);
    assert (list[lo + nLess] == pivot);
    assert (list[hi - nMore - 1] == pivot);
    if (k >= n - nMore)
        return select(list, hi - nMore, hi, k - nLess - nSame);
    else if (k < nLess)
        return select(list, lo, lo + nLess, k);
    return lo + k;
}
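
The snippet above calls compare and swap, which the answer does not show. A minimal sketch of what they might look like, assuming plain doubles without NaN values (Double.compare treats NaN as equal to itself, which would not agree with the == assertions in select); the wrapper class SelectHelpers is only there to make the example compile on its own:

```java
public class SelectHelpers {
    /** Three-way comparison of two doubles: negative, zero, or positive. */
    static int compare(double a, double b) {
        return Double.compare(a, b);
    }

    /** Exchanges list[i] and list[j]. */
    static void swap(double[] list, int i, int j) {
        double t = list[i];
        list[i] = list[j];
        list[j] = t;
    }
}
```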

Answered by Droid Teahouse

@android developer:

for (i = 1 to n/5) do
    x[i] = select(S[i],3)

is really

for (i = 1 to ceiling(n/5)) do
    x[i] = select(S[i],3)

with a ceiling function appropriate for your data (e.g. in Java, one that works on two doubles). This also affects the median when simply taking n/10, but we are finding the value closest to the mean that occurs in the array, not the true mean. Another note is that S[i] may have fewer than 3 elements, so we want to find the median with respect to its length; passing it into select with k=3 won't always work (e.g. for n = 11 we have 3 subgroups: 2 with 5 elements and 1 with 1 element).
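
The group arithmetic for that n = 11 case could be sketched as follows. It uses integer arithmetic in place of a floating-point ceiling (Math.ceil(n / 5.0) would give the same count), and the names GroupBounds, groupCount and medianPosition are invented for this example:

```java
public class GroupBounds {
    /** Number of groups of up to five elements needed for n elements: ceil(n / 5). */
    static int groupCount(int n) {
        return (n + 4) / 5;
    }

    /** 1-based median position inside a group of the given length (3 for a full group of 5). */
    static int medianPosition(int groupLength) {
        return (groupLength + 1) / 2;
    }

    public static void main(String[] args) {
        int n = 11;
        // For n = 11: two groups of length 5 (median position 3), one of length 1 (position 1).
        for (int g = 0; g < groupCount(n); g++) {
            int length = Math.min(5, n - g * 5);
            System.out.println("group " + g + ": length " + length
                    + ", median position " + medianPosition(length));
        }
    }
}
```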