Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/3341202/

Time: 2020-08-13 22:27:07  Source: igfitidea

What is the fastest way to compare two sets in Java?

Tags: java, performance, set

Asked by Shekhar

I am trying to optimize a piece of code that compares the elements of a list.

For example:

public void compare(Set<Record> firstSet, Set<Record> secondSet){
    for(Record firstRecord : firstSet){
        for(Record secondRecord : secondSet){
            // comparing logic
        }
    }
}

Please take into account that the number of records in sets will be high.

Thanks

Shekhar

Answered by Noel M

firstSet.equals(secondSet)

It really depends on what you want to do in the comparison logic, i.e. what happens if you find an element in one set that is not in the other? Your method has a void return type, so I assume you'll do the necessary work in this method.

More fine-grained control if you need it:

if (!firstSet.containsAll(secondSet)) {
  // do something if needs be
}
if (!secondSet.containsAll(firstSet)) {
  // do something if needs be
}

If you need to get the elements that are in one set and not the other, use removeAll().
EDIT: set.removeAll(otherSet) returns a boolean, not a set. To use removeAll(), you'll have to copy the set and then call it on the copy:

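Set<Record> one = new HashSet<>(firstSet);
Set<Record> two = new HashSet<>(secondSet);
one.removeAll(secondSet);  // one now holds the elements found only in firstSet
two.removeAll(firstSet);   // two now holds the elements found only in secondSet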

If one and two are both empty, then you know that the two sets were equal. If not, then you've got the elements that made the sets unequal.
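
As a minimal sketch of the copy-then-removeAll idea above, the helper below returns the elements that are in one set but not the other without mutating either input. The class name SetDifferenceSketch and the method name onlyIn are made up for illustration, and String elements stand in for the question's Record type.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class; not part of the JDK or of the original answer.
final class SetDifferenceSketch {

    /** Returns a new set holding the elements of left that are absent from right. */
    static <T> Set<T> onlyIn(Set<T> left, Set<T> right) {
        Set<T> copy = new HashSet<>(left); // copy first so the caller's set is not mutated
        copy.removeAll(right);             // removeAll returns a boolean, so keep the copy itself
        return copy;
    }

    public static void main(String[] args) {
        Set<String> firstSet = new HashSet<>(Arrays.asList("a", "b", "c"));
        Set<String> secondSet = new HashSet<>(Arrays.asList("b", "c", "d"));

        Set<String> one = onlyIn(firstSet, secondSet);
        Set<String> two = onlyIn(secondSet, firstSet);

        System.out.println(one); // [a]
        System.out.println(two); // [d]
        System.out.println(one.isEmpty() && two.isEmpty()); // false, so the sets differ
    }
}
```

Both differences being empty is equivalent to firstSet.equals(secondSet); the sketch is only worth the extra copies when you also need to know which elements differ.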

You mentioned that the number of records might be high. If the underlying implementation is a HashSet, then the lookup of each record is done in O(1) time, so you can't really get much better than that. For a TreeSet, each lookup is O(log n).

Answered by Stephen C

If you simply want to know if the sets are equal, the equals method on AbstractSet is implemented roughly as below:

    public boolean equals(Object o) {
        if (o == this)
            return true;
        if (!(o instanceof Set))
            return false;
        Collection c = (Collection) o;
        if (c.size() != size())
            return false;
        return containsAll(c);
    }

Note how it optimizes the common cases where:

  • the two objects are the same
  • the other object is not a set at all, and
  • the two sets' sizes are different.

After that, containsAll(...) will return false as soon as it finds an element in the other set that is not also in this set. But if all elements are present in both sets, it will need to test all of them.

The worst-case performance therefore occurs when the two sets are equal but not the same object. That cost is typically O(N) or O(N log N), depending on the implementation of this.containsAll(c).

And you get close-to-worst case performance if the sets are large and only differ in a tiny percentage of the elements.



UPDATE

If you are willing to invest time in a custom set implementation, there is an approach that can improve the "almost the same" case.

The idea is that you need to pre-calculate and cache a hash for the entire set so that you can get the set's current hashcode value in O(1). Then you can compare the hashcodes of the two sets as a fast first check: if they differ, the sets cannot be equal.

How could you implement a hashcode like that? Well, if the set hashcode was:

  • zero for an empty set, and
  • the XOR of all of the element hashcodes for a non-empty set,

then you could cheaply update the set's cached hashcode each time you added or removed an element. In both cases, you simply XOR the element's hashcode with the current set hashcode.

Of course, this assumes that element hashcodes are stable while the elements are members of sets. It also assumes that the element class's hashcode function gives a good spread. That is because when the two set hashcodes are the same, you still have to fall back to the O(N) comparison of all elements.
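
A rough sketch of that idea follows, under the assumptions just stated. XorHashedSet is a hypothetical wrapper around a HashSet (not a JDK class and not a full Set implementation); it only shows how the cached XOR can be maintained on add and remove and used as a cheap rejection test before the O(N) comparison.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical wrapper that keeps an XOR of the element hash codes up to date.
final class XorHashedSet<E> {
    private final Set<E> elements = new HashSet<>();
    private int cachedHash = 0; // zero for the empty set

    boolean add(E e) {
        boolean changed = elements.add(e);
        if (changed) cachedHash ^= Objects.hashCode(e); // XOR the new element in
        return changed;
    }

    boolean remove(E e) {
        boolean changed = elements.remove(e);
        if (changed) cachedHash ^= Objects.hashCode(e); // the same XOR cancels it out
        return changed;
    }

    int size() {
        return elements.size();
    }

    int cachedHash() {
        return cachedHash; // O(1): nothing is recomputed here
    }

    /** Cheap rejections first; the O(N) comparison only runs when sizes and hashes agree. */
    boolean sameElements(XorHashedSet<E> other) {
        if (other == this) return true;
        if (other == null || other.size() != size()) return false;
        if (other.cachedHash != cachedHash) return false; // definitely different
        return elements.equals(other.elements);           // hashes can collide, so verify
    }
}
```

The gain is limited to the unequal case: two equal sets always have equal cached hashes, so they still pay for the full element comparison.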



You could take this idea a bit further ... at least in theory.

WARNING - This is highly speculative. A "thought experiment" if you like.

Suppose that your set element class has a method to return a crypto checksum for the element. Now implement the set's checksum by XORing the checksums returned for the elements.

What does this buy us?

Well, if we assume that nothing underhand is going on, the probability that any two unequal set elements have the same N-bit checksum is 2^-N. And the probability that two unequal sets have the same N-bit checksum is also 2^-N. So my idea is that you can implement equals as:

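    public boolean equals(Object o) {
        if (o == this)
            return true;
        // ChecksummedSet is a hypothetical name for this custom set class;
        // only instances of it carry a checksums field to compare.
        if (!(o instanceof ChecksummedSet))
            return false;
        ChecksummedSet<?> c = (ChecksummedSet<?>) o;
        if (c.size() != size())
            return false;
        return checksums.equals(c.checksums);
    }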

Under the assumptions above, this will give you the wrong answer with probability 2^-N. If you make N large enough (e.g. 512 bits), the probability of a wrong answer becomes negligible (e.g. roughly 10^-150).

The downside is that computing the crypto checksums for elements is very expensive, especially as the number of bits increases. So you really need an effective mechanism for memoizing the checksums. And that could be problematic.

And the other downside is that a non-zero probability of error may be unacceptable no matter how small the probability is. (But if that is the case ... how do you deal with the case where a cosmic ray flips a critical bit? Or if it simultaneously flips the same bit in two instances of a redundant system?)
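
To make the thought experiment a little more concrete, here is a sketch for String elements only. ChecksummedStringSet is a made-up class, SHA-256 (N = 256) is my choice of checksum rather than anything the answer prescribes, and probablyEquals deliberately skips the element-by-element comparison, so it inherits the small error probability discussed above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class: a set of strings that maintains the XOR of its elements' SHA-256 digests.
final class ChecksummedStringSet {
    private static final int DIGEST_BYTES = 32; // SHA-256 yields 256 bits
    private final Set<String> elements = new HashSet<>();
    private final byte[] checksum = new byte[DIGEST_BYTES]; // all zeros for the empty set

    private static byte[] digest(String s) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform must provide SHA-256
        }
    }

    // XORing a digest in adds the element; XORing it in again removes it.
    private void xorIntoChecksum(byte[] d) {
        for (int i = 0; i < DIGEST_BYTES; i++) {
            checksum[i] ^= d[i];
        }
    }

    boolean add(String s) {
        if (!elements.add(s)) return false;
        xorIntoChecksum(digest(s));
        return true;
    }

    boolean remove(String s) {
        if (!elements.remove(s)) return false;
        xorIntoChecksum(digest(s));
        return true;
    }

    int size() {
        return elements.size();
    }

    /** Compares sizes and checksums only; it never walks the elements. */
    boolean probablyEquals(ChecksummedStringSet other) {
        return other != null
            && other.size() == size()
            && Arrays.equals(checksum, other.checksum);
    }
}
```

Here each digest is computed once per add or remove and never cached per element, which sidesteps the memoization problem at the cost of hashing on every mutation. The "nothing underhand" assumption matters: a plain XOR of digests is not designed to resist someone deliberately constructing colliding sets.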

Answered by Zahran

public boolean equals(Object o) {
    if (o == this)
        return true;
    if (!(o instanceof Set))
        return false;

    // Assumes the enclosing class implements Set.
    Set<?> a = (Set<?>) this;
    Set<?> b = (Set<?>) o;

    // Elements of a that are missing from b.
    Set<Object> thedifference_a_b = new HashSet<Object>(a);
    thedifference_a_b.removeAll(b);
    if (!thedifference_a_b.isEmpty()) return false;

    // Elements of b that are missing from a.
    Set<Object> thedifference_b_a = new HashSet<Object>(b);
    thedifference_b_a.removeAll(a);
    if (!thedifference_b_a.isEmpty()) return false;

    return true;
}

Answered by husayt

There is a method in Guava's Sets class which can help here:

public static <E> boolean equals(Set<? extends E> set1, Set<? extends E> set2) {
    return Sets.symmetricDifference(set1, set2).isEmpty();
}

Answered by Philip Couling

There's an O(N) solution for very specific cases where:

  • the sets are both sorted
  • both sorted in the same order

The following code assumes that both sets are ordered by the records' natural ordering, i.e. the records implement Comparable. A similar method could be based on a Comparator.

    public class SortedSetComparitor <Foo extends Comparable<Foo>> 
            implements Comparator<SortedSet<Foo>> {

        @Override
        public int compare( SortedSet<Foo> arg0, SortedSet<Foo> arg1 ) {
            Iterator<Foo> otherRecords = arg1.iterator();
            for (Foo thisRecord : arg0) {
                // Shorter sets sort first.
                if (!otherRecords.hasNext()) return 1;
                int comparison = thisRecord.compareTo(otherRecords.next());
                if (comparison != 0) return comparison;
            }
            // Shorter sets sort first
            if (otherRecords.hasNext()) return -1;
            else return 0;
        }
    }
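
The Comparator-based variant mentioned above could look roughly like the sketch below. SortedSetComparatorSketch is a hypothetical name, and the code assumes both sets were built with the same element ordering that is passed to the constructor; if they were not, the result is meaningless.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.SortedSet;

// Hypothetical variant that takes the element ordering explicitly instead of requiring Comparable.
final class SortedSetComparatorSketch<T> implements Comparator<SortedSet<T>> {
    private final Comparator<? super T> elementOrder;

    SortedSetComparatorSketch(Comparator<? super T> elementOrder) {
        this.elementOrder = elementOrder;
    }

    @Override
    public int compare(SortedSet<T> left, SortedSet<T> right) {
        Iterator<T> rightIt = right.iterator();
        for (T leftElement : left) {
            // right ran out first, so right is a strict prefix of left and sorts first
            if (!rightIt.hasNext()) return 1;
            int comparison = elementOrder.compare(leftElement, rightIt.next());
            if (comparison != 0) return comparison; // first differing position decides
        }
        // left ran out: either both are exhausted (equal) or left is the shorter prefix
        return rightIt.hasNext() ? -1 : 0;
    }
}
```

A result of 0 means the two sets contain the same elements; each set is walked at most once, which is where the O(N) bound comes from.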

Answered by Sahin Habesoglu

I would put the secondSet in a HashMap before the comparison. This way you will reduce the lookup time for each element of the second set to O(1). Like this:

HashMap<Integer, Record> hm = new HashMap<Integer, Record>(secondSet.size());
int index = 0;
for (Record secondRecord : secondSet) {
    hm.put(index, secondRecord);  // key each record by its position
    index++;
}
for (Record firstRecord : firstSet) {
    for (int i = 0; i < secondSet.size(); i++) {
        Record secondRecord = hm.get(i);
        // use firstRecord and secondRecord for comparison
    }
}
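
The O(1) lookup pays off most when the map is keyed by whatever the records are matched on, because the inner loop can then be dropped. The sketch below assumes, purely for illustration, that each record has an integer id and a payload to compare; Rec, id, payload and changed are hypothetical names, not part of the question's Record type.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: index the second collection by id, then make one pass over the first.
final class KeyedLookupSketch {

    /** Stand-in for the question's Record type; both fields are assumptions. */
    static final class Rec {
        final int id;
        final String payload;

        Rec(int id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    /** Returns the records of first whose id also occurs in second with a different payload. */
    static List<Rec> changed(Collection<Rec> first, Collection<Rec> second) {
        Map<Integer, Rec> byId = new HashMap<>();
        for (Rec r : second) {
            byId.put(r.id, r);
        }
        List<Rec> result = new ArrayList<>();
        for (Rec r : first) {
            Rec match = byId.get(r.id); // average O(1) lookup replaces the inner loop
            if (match != null && !match.payload.equals(r.payload)) {
                result.add(r);
            }
        }
        return result;
    }
}
```

With this layout the total work is roughly proportional to the combined size of the two collections rather than to their product.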

Answered by riwnodennyk

If you are using the Guava library, it's possible to do:

        SetView<Record> added = Sets.difference(secondSet, firstSet);
        SetView<Record> removed = Sets.difference(firstSet, secondSet);

Then draw your conclusion from these two views.

Answered by snr

I think a method reference to the equals method can be used. This assumes that the element type has its own equals implementation. A plain and simple example:

Set<String> set = new HashSet<>();
set.addAll(Arrays.asList("leo","bale","hanks"));

Set<String> set2 = new HashSet<>();
set2.addAll(Arrays.asList("hanks","leo","bale"));

Predicate<Set<String>> pred = set::equals;
boolean result = pred.test(set2);
System.out.println(result);   // true

Answered by ilopezluna

The following solution comes from https://www.mkyong.com/java/java-how-to-compare-two-sets/

public static boolean equals(Set<?> set1, Set<?> set2){

    if(set1 == null || set2 ==null){
        return false;
    }

    if(set1.size() != set2.size()){
        return false;
    }

    return set1.containsAll(set2);
}

Or if you prefer to use a single return statement:

public static boolean equals(Set<?> set1, Set<?> set2){

  return set1 != null 
    && set2 != null 
    && set1.size() == set2.size() 
    && set1.containsAll(set2);
}