Java: find common elements in two unsorted arrays

Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/18476512/

Time: 2020-08-12 01:58:16  Source: igfitidea

Find common elements in two unsorted arrays

java

Asked by user1841492

I am trying to find a solution to this problem: I have two arrays of integers, A and B (they can have different lengths), and I have to find the elements common to both. There is one more condition: the maximum distance between the common elements is k. Here is my solution, which I think is correct:

for (int i = 0; i<A.length; i++){
    for (int j=jlimit; (j<B.length) && (j <= ks); j++){
        if(A[i]==B[j]){
            System.out.println(B[j]);
            jlimit = j;
            ks = j+k;
        }//end if
    }
}

Is there a way to make a better solution? Any suggestions? Thanks in advance!

Answered by GGrec

Although this would be a cheat, since it uses HashSets, it is pretty nice for a Java implementation of this algorithm. If you need the pseudocode for the algorithm, don't read any further.

Source and author in the JavaDoc. Cheers.

import java.util.HashSet;

/**
 * @author Crunchify.com
 */
public class CrunchifyIntersection {

    public static void main(String[] args) {
        Integer[] arrayOne = { 1, 4, 5, 2, 7, 3, 9 };
        Integer[] arrayTwo = { 5, 2, 4, 9, 5 };

        // findCommon is a static method of this class, so it is called directly
        Integer[] common = findCommon(arrayOne, arrayTwo);

        System.out.print("Common Elements Between Two Arrays: ");
        for (Integer entry : common) {
            System.out.print(entry + " ");
        }
    }

    public static Integer[] findCommon(Integer[] arrayOne, Integer[] arrayTwo) {

        Integer[] arrayToHash;
        Integer[] arrayToSearch;

        // Hash the shorter array and scan the longer one
        if (arrayOne.length < arrayTwo.length) {
            arrayToHash = arrayOne;
            arrayToSearch = arrayTwo;
        } else {
            arrayToHash = arrayTwo;
            arrayToSearch = arrayOne;
        }

        HashSet<Integer> intersection = new HashSet<Integer>();

        HashSet<Integer> hashedArray = new HashSet<Integer>();
        for (Integer entry : arrayToHash) {
            hashedArray.add(entry);
        }

        // Every scanned element that is also in the hashed array is common
        for (Integer entry : arrayToSearch) {
            if (hashedArray.contains(entry)) {
                intersection.add(entry);
            }
        }

        return intersection.toArray(new Integer[0]);
    }
}

Answered by SJuan76

Given your explanation, I think the most direct approach is to read array A and put all its elements in a Set (setA), do the same with B (setB), and use the retainAll method to find the intersection of both sets (the items that belong to both of them).

You will see that the k distance is not used at all, but I see no way of using that condition that leads to code that is either faster or more maintainable. The solution I advocate works without enforcing that condition, so it also works when the condition holds (that is called "weakening the preconditions").
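
A minimal sketch of this set-based approach, assuming int arrays as in the question. The class and method names are made up for illustration, and LinkedHashSet is chosen only so that the result keeps the order of the first array; the answer itself just says "a Set".

```java
import java.util.LinkedHashSet;
import java.util.Set;

class SetIntersection {

    // Returns the distinct values that occur in both arrays.
    // The k-distance condition is deliberately ignored, as the answer describes.
    static Set<Integer> commonElements(int[] a, int[] b) {
        Set<Integer> setA = new LinkedHashSet<>();
        for (int value : a) {
            setA.add(value);
        }
        Set<Integer> setB = new LinkedHashSet<>();
        for (int value : b) {
            setB.add(value);
        }
        setA.retainAll(setB); // keep only the elements that are also in setB
        return setA;
    }

    public static void main(String[] args) {
        int[] a = {1, 4, 5, 2, 7, 3, 9};
        int[] b = {5, 2, 4, 9, 5};
        System.out.println(commonElements(a, b)); // prints [4, 5, 2, 9]
    }
}
```

Building both sets is linear in the total number of elements, and duplicates within either array collapse to a single entry.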

Answered by progrenhard

IMPLEMENT BINARY SEARCH AND QUICK SORT!

This will lead to tons of code, but the fastest result.

You can sort the elements of the larger array with something like quick sort, which costs O(n log n).

Then iterate through the smaller array and, for each value, do a binary search for that element in the sorted array. Add some logic for the distance to the binary search.

I think you can get the complexity down to O(n log n). Worst case O(n^2).

Pseudocode:

larger array equals a
other array equals b

sort a

iterate through b
       binary search a for the value of b at the iterated index
     // I would throw (last index - index) logic in binary search
     // to exit out of that even faster by returning "NOT FOUND" as soon as that is hit.
       if found && (last index - index) is less than or equal to k
          store last index
          print value

I believe this is the fastest possible way to solve your problem.
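
The pseudocode above could be sketched in Java roughly as follows. This is an illustration under stated assumptions, not the answerer's code: the class and method names are invented, the library calls Arrays.sort and Arrays.binarySearch stand in for a hand-written quick sort and binary search, and the k-distance check is left out because sorting discards the original positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortAndSearch {

    // Sorts a copy of the larger array, then binary-searches it for each
    // element of the smaller one. Assumption: no k-distance check is done.
    static List<Integer> commonElements(int[] first, int[] second) {
        int[] larger = first.length >= second.length ? first : second;
        int[] smaller = first.length >= second.length ? second : first;

        int[] sorted = larger.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);           // O(n log n)

        List<Integer> common = new ArrayList<>();
        for (int value : smaller) {    // one O(log n) lookup per element
            if (Arrays.binarySearch(sorted, value) >= 0) {
                common.add(value);
            }
        }
        return common;
    }
}
```

A value that is repeated in the smaller array is reported once per occurrence; collecting into a Set instead of a List would remove those repeats.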

Answered by Luke

Your implementation is roughly O(A.length*2k).

That seems to be about the best you're going to do if you want to maintain your "no more than k away" logic, as that rules out sorting and the use of sets. I would alter it a little to make your code more understandable.

  1. First, I would ensure that you iterate over the smaller of the two arrays. This would make the complexity O(min(A.length, B.length)*2k).

    To understand the purpose of this, consider the case where A has 1 element and B has 100. In this case, we are only going to perform one iteration in the outer loop, and k iterations in the inner loop.

    Now consider when A has 100 elements and B has 1. In this case, we will perform 100 iterations on the outer loop, and 1 iteration each on the inner loop.

    If k is less than the length of your long array, iterating over the shorter array in the outer loop will be more efficient.

  2. Then, I would change how you're calculating the k distance stuff just for readability's sake. The code I've written demonstrates this.

Here's what I would do:

//not sure what type of array we're dealing with here, so I'll assume int.
int[] toIterate;
int[] toSearch;

if (A.length > B.length)
{
    toIterate = B;
    toSearch = A;
}
else
{
    toIterate = A;
    toSearch = B;
}

for (int i = 0; i < toIterate.length; i++)
{
    // set j to k away in the negative direction
    int j = i - k;

    if (j < 0) 
        j = 0;

    // only iterate until j is k past i
    for (; (j < toSearch.length) && (j <= i + k); j++)
    {
        if(toIterate[i] == toSearch[j])
        {
            System.out.println(toSearch[j]);
        }
    }
}

Your use of jlimit and ks may work, but handling your k distance like this is more understandable for your average programmer (and it's marginally more efficient).

Answered by Prashant Bhate

Generic solution

import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.HashSet;

public static void main(String[] args) {
    String[] a = { "a", "b" };
    String[] b = { "c", "b" };
    String[] intersection = intersection(a, b, a[0].getClass());
    System.out.println(Arrays.toString(intersection));
    Integer[] aa = { 1, 3, 4, 2 };
    Integer[] bb = { 1, 19, 4, 5 };
    Integer[] intersectionaabb = intersection(aa, bb, aa[0].getClass());
    System.out.println(Arrays.toString(intersectionaabb));
}

@SuppressWarnings("unchecked")
private static <T> T[] intersection(T[] a, T[] b, Class<? extends T> c) {
    HashSet<T> s = new HashSet<>(Arrays.asList(a));
    s.retainAll(Arrays.asList(b));
    return s.toArray((T[]) Array.newInstance(c, s.size()));
}

Output

[b]
[1, 4]

Answered by le-doude

O(N) solution (BloomFilters):

Here is a solution using bloom filters (implementation is from the Guava library)

public static <T> T findCommon_BloomFilterImpl(T[] A, T[] B, Funnel<T> funnel) {
    BloomFilter<T> filter = BloomFilter.create(funnel, A.length + B.length);
    for (T t : A) {
        filter.put(t);
    }
    for (T t : B) {
        // mightContain can return false positives, so a hit means "probably in A"
        if (filter.mightContain(t)) {
            return t;
        }
    }
    return null;
}

Use it like this:

    Integer j = Masking.findCommon_BloomFilterImpl(new Integer[]{12, 2, 3, 4, 5222, 622, 71, 81, 91, 10}, new Integer[]{11, 100, 15, 18, 79, 10}, Funnels.integerFunnel());
    Assert.assertNotNull(j);
    Assert.assertEquals(10, j.intValue());

Runs in O(N), since calculating the hash of an Integer is pretty straightforward. It is still O(N) if you can reduce the hash calculation for your elements to O(1), or to a small O(K) where K is the size of each element.

O(N.LogN) solution (sorting and iterating):

Sorting and then iterating through the array will lead you to an O(N*log(N)) solution:

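public static <T extends Comparable<T>> T findCommon(T[] A, T[] B, Class<T> clazz) {
    // concatArrays is a helper (not shown) that joins A and B into one new array
    T[] array = concatArrays(A, B, clazz);
    Arrays.sort(array);
    // After sorting, equal elements are adjacent. Note that a value repeated
    // within A alone (or within B alone) is also reported as common.
    for (int i = 1; i < array.length; i++) {
        if (array[i - 1].equals(array[i])) {     //put your own equality check here
            return array[i];
        }
    }
    return null;
}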

concatArrays(~) is O(N), of course. Arrays.sort(~) has complexity O(N.logN) (for an object array such as T[] it is a merge sort; the dual-pivot QuickSort is used for primitive arrays), and iterating through the array again is O(N).

So we have O(N + N.logN + N) ~> O(N.logN).

As a general-case solution (without the "within k" condition of your problem), this is better than yours. In your precise case, it should be considered when k is "close to" N.
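
The concatArrays helper used by the sort-based findCommon is not shown in the answer. One way it might look, as a hypothetical sketch (the class name and body are assumptions; only the call signature is taken from the code above):

```java
import java.lang.reflect.Array;

class ArrayConcat {

    // Hypothetical helper: returns a new array holding a's elements followed by b's.
    @SuppressWarnings("unchecked")
    static <T> T[] concatArrays(T[] a, T[] b, Class<T> clazz) {
        // A generic array cannot be created with "new T[n]", hence the reflection call
        T[] result = (T[]) Array.newInstance(clazz, a.length + b.length);
        System.arraycopy(a, 0, result, 0, a.length);
        System.arraycopy(b, 0, result, a.length, b.length);
        return result;
    }
}
```

Both copies are linear, which matches the O(N) cost the answer attributes to this step.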

Answered by Pani Dhakshnamurthy

Simple solution if arrays are already sorted

 public static void get_common_courses(Integer[] courses1, Integer[] courses2) {
        // Sort both arrays if input is not sorted 
        //Arrays.sort(courses1);
        //Arrays.sort(courses2);
        int i=0, j=0;
        while(i<courses1.length && j<courses2.length) {
            if(courses1[i] > courses2[j]) {
                j++;
            } else if(courses1[i] < courses2[j]){
                i++;
            } else {
                System.out.println(courses1[i]);
                i++;j++;
            }
        }
}

The Apache Commons Collections API does this efficiently, without sorting:

public static Collection intersection(final Collection a, final Collection b) {
    ArrayList list = new ArrayList();
    Map mapa = getCardinalityMap(a);
    Map mapb = getCardinalityMap(b);
    Set elts = new HashSet(a);
    elts.addAll(b);
    Iterator it = elts.iterator();
    while(it.hasNext()) {
        Object obj = it.next();
        for(int i=0,m=Math.min(getFreq(obj,mapa),getFreq(obj,mapb));i<m;i++) {
            list.add(obj);
        }
    }
    return list;
}
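
The getCardinalityMap and getFreq helpers are not shown in the excerpt above. The same counting idea can be sketched with plain JDK classes; everything below (class name, method names, the restriction to int arrays) is an assumption made for illustration and is not the Commons Collections source.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MultisetIntersection {

    // Counts how many times each value occurs in the array.
    static Map<Integer, Integer> cardinalityMap(int[] values) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }

    // Each common value appears min(count in a, count in b) times in the result.
    static List<Integer> intersection(int[] a, int[] b) {
        Map<Integer, Integer> countsA = cardinalityMap(a);
        Map<Integer, Integer> countsB = cardinalityMap(b);
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : countsA.entrySet()) {
            int times = Math.min(e.getValue(), countsB.getOrDefault(e.getKey(), 0));
            for (int i = 0; i < times; i++) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Unlike the set-based answers, this keeps repeated values: intersecting {1, 2, 2, 3, 2} with {2, 2, 5} yields two 2s.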

Answered by Jakub Rozenbajger

Solution using Java 8

static <T> Collection<T> intersection(Collection<T> c1, Collection<T> c2) {
    if (c1.size() < c2.size())
        return intersection(c2, c1);
    Set<T> c2set = new HashSet<>(c2);
    return c1.stream().filter(c2set::contains).distinct().collect(Collectors.toSet());
}

Use Arrays::asList and boxed values of primitives:

Integer[] a =...    
Collection<Integer> res = intersection(Arrays.asList(a),Arrays.asList(b));
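
If the input is a primitive int[] rather than an Integer[], Arrays.asList will not box the values (it produces a List<int[]> with one element). A possible way to box first, sketched with a hypothetical box helper and a copy of the intersection method from this answer so that the example is self-contained:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class BoxedIntersection {

    // Same approach as the answer: hash the smaller collection, filter the larger one.
    static <T> Set<T> intersection(Collection<T> c1, Collection<T> c2) {
        if (c1.size() < c2.size()) {
            return intersection(c2, c1);
        }
        Set<T> c2set = new HashSet<>(c2);
        return c1.stream().filter(c2set::contains).collect(Collectors.toSet());
    }

    // Hypothetical helper: boxes an int[] into a List<Integer>.
    static List<Integer> box(int[] values) {
        return Arrays.stream(values).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 4, 2};
        int[] b = {1, 19, 4, 5};
        System.out.println(intersection(box(a), box(b)));
    }
}
```

Collecting into a Set already removes duplicates, so the distinct() call is left out of this sketch.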