Getting the Top Four Maximum Values from a Java Array
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/14122526/
Asked by Jon Kartago Lamida
I am trying to find the top 4 maximum values in an integer array. For example, the input array {1232, -1221, 0, 345, 78, 99} should return {1232, 345, 99, 78} as the top 4 maximum values. I have solved the requirement with the method below, but I am still not satisfied with its time efficiency. Is there any way to optimize the method further as the input becomes larger? Any clues are really appreciated. Thank you.
public int[] findTopFourMax(int[] input) {
    // topFourList is kept in descending order: index 0 holds the largest value seen so far.
    int[] topFourList = { Integer.MIN_VALUE, Integer.MIN_VALUE, Integer.MIN_VALUE, Integer.MIN_VALUE };
    for (int current : input) {
        if (current > topFourList[0]) {
            // New maximum: shift everything one slot down.
            topFourList[3] = topFourList[2];
            topFourList[2] = topFourList[1];
            topFourList[1] = topFourList[0];
            topFourList[0] = current;
        } else if (current > topFourList[1]) {
            topFourList[3] = topFourList[2];
            topFourList[2] = topFourList[1];
            topFourList[1] = current;
        } else if (current > topFourList[2]) {
            topFourList[3] = topFourList[2];
            topFourList[2] = current;
        } else if (current > topFourList[3]) {
            topFourList[3] = current;
        }
    }
    return topFourList;
}
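As an illustration only (not part of the original question), the same shift-and-insert idea can be sketched for an arbitrary small k. The class name TopKInsertion and the method name findTopK are hypothetical, and the sketch assumes 0 <= k <= input.length. It tracks how many slots are filled instead of relying on Integer.MIN_VALUE sentinels.

```java
// Hypothetical helper class; the names here are not from the original post.
final class TopKInsertion {
    // Returns the k largest values of input in descending order.
    // Assumption: 0 <= k <= input.length.
    static int[] findTopK(int[] input, int k) {
        int[] top = new int[k];
        int filled = 0; // number of valid entries currently stored in top
        for (int current : input) {
            // Once the list is full, skip values that cannot displace its smallest entry.
            if (filled == k && (k == 0 || current <= top[k - 1])) {
                continue;
            }
            // Use the first free slot, or overwrite the smallest entry when full.
            int pos = (filled < k) ? filled++ : k - 1;
            // Shift smaller entries one slot to the right to make room.
            while (pos > 0 && top[pos - 1] < current) {
                top[pos] = top[pos - 1];
                pos--;
            }
            top[pos] = current;
        }
        return top;
    }
}
```

Calling TopKInsertion.findTopK(new int[] {1232, -1221, 0, 345, 78, 99}, 4) yields {1232, 345, 99, 78}. Each element costs at most k comparisons, so this stays a single pass for small k but becomes less attractive as k grows.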
Answered by amit
The simplest (though not the most efficient) way is to sort the array and take the subarray containing the last 4 elements.
You can use Arrays.sort() to sort and Arrays.copyOfRange() to take the subarray.
int[] arr = new int[] {1232, -1221, 0, 345, 78, 99};
Arrays.sort(arr);
int[] top4 = Arrays.copyOfRange(arr, arr.length-4,arr.length);
System.out.println(Arrays.toString(top4));
For a more efficient solution, one can maintain a min-heap of the top K elements or use a selection algorithm to find the 4th largest element. The two approaches are described in this thread.
Though the selection algorithm offers an O(n) solution, the min-heap solution (which is O(n log K)) should have better constants, and especially for small k it is likely to be faster.
P.S. (EDIT):
For 4 elements, you might find that running a loop 4 times and finding a max in each pass (changing the old max to -infinity after each pass) is more efficient than the more "complex" approaches, since it requires only sequential reads and has fairly small constants. This is of course not true for larger k, and it degrades to O(n^2) as k -> n.
EDIT2: benchmarking:
For the fun of it, I ran a benchmark on the attached code. The results (total milliseconds over 200 runs) are:
[naive, sort, heap] = [9032, 214902, 7531]
We can see that naive and heap are much better than the sort-based approach, and naive is slightly slower than heap. I did a Wilcoxon test to check whether the difference between naive and heap is statistically significant, and I got a p-value of 3.4573e-17. Roughly speaking, this means that if the two approaches were "identical", a difference at least this large would show up with probability 3.4573e-17 (extremely small). From this we can conclude that the heap-based solution gives better performance than the naive and sorting solutions (and we showed it empirically on this benchmark!).
Attachment: the code I used:
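// Classes needed by the code in this attachment:
// java.util.Arrays, java.util.PriorityQueue, java.util.Random

// Naive approach: k passes, each extracting the current maximum.
// Note: overwrites the extracted entries of arr with Integer.MIN_VALUE.
// Assumes every pass finds a value greater than Integer.MIN_VALUE.
public static int[] findTopKNaive(int[] arr, int k) {
    int[] res = new int[k];
    for (int j = 0; j < k; j++) {
        int max = Integer.MIN_VALUE, maxIdx = -1;
        for (int i = 0; i < arr.length; i++) {
            if (max < arr[i]) {
                max = arr[i];
                maxIdx = i;
            }
        }
        arr[maxIdx] = Integer.MIN_VALUE;
        res[k - 1 - j] = max; // fill from the back so res ends up ascending
    }
    return res;
}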
public static int[] findTopKSort(int[] arr, int k) {
    Arrays.sort(arr);
    return Arrays.copyOfRange(arr, arr.length - k, arr.length);
}
public static int[] findTopKHeap(int[] arr, int k) {
    PriorityQueue<Integer> pq = new PriorityQueue<Integer>();
    for (int x : arr) {
        if (pq.size() < k) pq.add(x);
        else if (pq.peek() < x) {
            pq.poll();
            pq.add(x);
        }
    }
    int[] res = new int[k];
    for (int i = 0; i < k; i++) res[i] = pq.poll();
    return res;
}
public static int[] createRandomArray(int n, Random r) {
    int[] arr = new int[n];
    for (int i = 0; i < n; i++) arr[i] = r.nextInt();
    return arr;
}
public static void main(String... args) throws Exception {
    Random r = new Random(1);
    int k = 4;
    int repeats = 200;
    int n = 5000000;
    long[][] results = new long[3][repeats];
    for (int i = 0; i < repeats; i++) {
        int[] arr = createRandomArray(n, r);
        int[] myCopy;
        myCopy = Arrays.copyOf(arr, n);
        long start = System.currentTimeMillis();
        findTopKNaive(myCopy, k);
        results[0][i] = System.currentTimeMillis() - start;
        myCopy = Arrays.copyOf(arr, n);
        start = System.currentTimeMillis();
        findTopKSort(myCopy, k);
        results[1][i] = System.currentTimeMillis() - start;
        myCopy = Arrays.copyOf(arr, n);
        start = System.currentTimeMillis();
        findTopKHeap(myCopy, k);
        results[2][i] = System.currentTimeMillis() - start;
    }
    long[] sums = new long[3];
    for (int i = 0; i < repeats; i++)
        for (int j = 0; j < 3; j++)
            sums[j] += results[j][i];
    System.out.println(Arrays.toString(sums));
    System.out.println("results for statistic test:");
    for (int i = 0; i < repeats; i++) {
        System.out.println(results[0][i] + " " + results[2][i]);
    }
}
Answered by Marko Topolnik
You should check out this answer by Peter Lawrey. Basically, the idea is to run through your array, adding each element to a SortedSet and keeping the size at four by removing the least element in each iteration. This process is O(n), even in the worst case, compared with O(n log n) typical and O(n^2) worst case for fully sorting an array.
final List<Integer> input = new ArrayList<>(Arrays.asList(1232, -1221, 0, 345, 78, 99));
// Note: a set keeps only distinct values, so repeated values in the input are counted once.
final NavigableSet<Integer> topFour = new TreeSet<>();
for (int i : input) {
    topFour.add(i);
    if (topFour.size() > 4) topFour.remove(topFour.first());
}
System.out.println(topFour);
Answered by Rahul
Sort: sort the array and take the last four elements.
Min heap: the simplest solution is to maintain a min-heap of maximum size 4. This solution has O(n log k) complexity, where n is the number of elements and k is the number of elements you need.
Priority queue: you can create a PriorityQueue with a fixed size and a custom comparator, as explained in this question with implementation.
Selection algorithm: you can use a selection algorithm to find the kth largest element and then return all the elements that are not smaller than it, but this is harder to implement. Best-case complexity: O(n).
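The priority-queue option with a custom comparator might look roughly like the sketch below. It is not the implementation from the linked question: BoundedTopK and topK are hypothetical names, and java.util.PriorityQueue has no built-in size limit, so the bound is enforced by hand.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper; the class and method names are illustrative only.
final class BoundedTopK {
    // Returns the k largest elements according to cmp, largest first.
    static <T> List<T> topK(Iterable<T> items, int k, Comparator<? super T> cmp) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Min-heap under cmp: the head is the smallest of the elements kept so far.
        PriorityQueue<T> heap = new PriorityQueue<>(k, cmp);
        for (T item : items) {
            if (heap.size() < k) {
                heap.add(item);
            } else if (cmp.compare(heap.peek(), item) < 0) {
                heap.poll();   // evict the current smallest
                heap.add(item);
            }
        }
        // Polling yields ascending order, so reverse at the end.
        List<T> result = new ArrayList<>(heap.size());
        while (!heap.isEmpty()) {
            result.add(heap.poll());
        }
        Collections.reverse(result);
        return result;
    }
}
```

With Arrays.asList(1232, -1221, 0, 345, 78, 99), k = 4 and Comparator.naturalOrder(), this returns [1232, 345, 99, 78]. Passing a reversed comparator would give the four smallest values instead.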
Answered by assylias
The easiest way is to sort the array and take the first/last 4 elements.
In the end, the 4 largest entries can be anywhere, so whatever you do, you need to read the whole array, and it will be at least an O(n) operation.
Answered by pcalcao
The earlier suggestions to sort the array are indeed the easiest way, but not really the most efficient.
A variation on QuickSort (Quickselect) can be used to find the kth largest/smallest value in a collection.
http://en.wikipedia.org/wiki/Selection_algorithm
A correct implementation lets you get the kth largest in O(n) time on average.
Basically, you partition as in quicksort using a pivot and compare the pivot position after each iteration with the position you want (four in your case). If they are equal, return the position; otherwise, apply the algorithm to the correct half of the input.
When you've found the index of the kth largest value, you can simply iterate over the array again and collect the values on the appropriate side of input[k] (for the top four, those that are not smaller than it).
This might be overkill for your case, since you need exactly four, but it's the most generic way of doing this.
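A minimal Quickselect sketch along these lines follows. It is one possible reading of the description, not code from the original answer: the class name QuickselectTopK, the random pivot choice and the Lomuto-style partition are assumptions made for illustration, and it assumes 1 <= k <= input.length. It partitions in descending order so that the k largest values end up in the first k slots.

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper; names and pivot strategy are illustrative assumptions.
final class QuickselectTopK {
    // Returns the k largest values in descending order without modifying input.
    // Assumption: 1 <= k <= input.length.
    static int[] topK(int[] input, int k) {
        int[] a = Arrays.copyOf(input, input.length);
        int target = k - 1; // index of the kth largest in descending order
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p == target) break;
            if (p < target) lo = p + 1; else hi = p - 1; // continue in the half holding target
        }
        // The first k slots now hold the k largest values, in no particular order.
        int[] top = Arrays.copyOf(a, k);
        Arrays.sort(top);
        for (int i = 0, j = k - 1; i < j; i++, j--) swap(top, i, j); // ascending to descending
        return top;
    }

    // Partitions a[lo..hi] around a random pivot: larger values first.
    // Returns the pivot's final index.
    private static int partition(int[] a, int lo, int hi) {
        swap(a, ThreadLocalRandom.current().nextInt(lo, hi + 1), hi);
        int pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] > pivot) swap(a, i, store++);
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

For the question's input and k = 4 this gives {1232, 345, 99, 78}. The expected running time is O(n) plus O(k log k) for the final sort, although this simple partition can degrade to O(n^2) on inputs with many equal values.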
If you don't care too much about memory, you can also use a bounded PriorityQueue that keeps the top/bottom X values and simply insert everything into the queue. The ones that remain are the values you're interested in.
Answered by user3916211
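float a[] = {1.0f, 3.0f, 5.0f, 6.0f, 7.0f, 10.0f, 11.0f, 3.2f, 4.0f};
// Start from negative infinity so that arrays of negative values also work.
float first = Float.NEGATIVE_INFINITY;
float second = Float.NEGATIVE_INFINITY;
float third = Float.NEGATIVE_INFINITY;
// Pass 1: the largest value.
for (int i = 0; i < a.length; i++) {
    if (first < a[i]) {
        first = a[i];
    }
}
System.out.println("first largest is " + first);
// Pass 2: the largest value strictly below first (duplicates of first are skipped).
for (int j = 0; j < a.length; j++) {
    if (a[j] < first && a[j] > second) {
        second = a[j];
    }
}
System.out.println("second largest is " + second);
// Pass 3: the largest value strictly below second.
for (int k = 0; k < a.length; k++) {
    if (a[k] < second && a[k] > third) {
        third = a[k];
    }
}
System.out.println("third largest is " + third);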