Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/23445/

Time: 2020-08-11 07:15:27  Source: igfitidea

How Best to Compare Two Collections in Java and Act on Them?

java collections

Asked by ckpwong

I have two collections of the same object, Collection<Foo> oldSet and Collection<Foo> newSet. The required logic is as follows:

  • if foo is in(*) oldSet but not newSet, call doRemove(foo)
  • else if foo is not in oldSet but in newSet, call doAdd(foo)
  • else if foo is in both collections but modified, call doUpdate(oldFoo, newFoo)
  • else if !foo.activated && foo.startDate >= now, call doStart(foo)
  • else if foo.activated && foo.endDate <= now, call doEnd(foo)

(*) "in" means the unique identifier matches, not necessarily the content.

The current (legacy) code does many comparisons to figure out removeSet, addSet, updateSet, startSet and endSet, and then loops to act on each item.

The code is quite messy (partly because I have left out some spaghetti logic already) and I am trying to refactor it. Some more background info:

  • As far as I know, oldSet and newSet are actually backed by ArrayList
  • Each set contains fewer than 100 items, most likely maxing out at 20
  • This code is called frequently (measured in millions/day), although the sets seldom differ

My questions:

  • If I convert oldSet and newSet into HashMap<Foo> (order is not of concern here), with the IDs as keys, would it make the code easier to read and easier to compare? How much time and memory performance is lost on the conversion?
  • Would iterating over the two sets and performing the appropriate operation be more efficient and concise?

Answered by Mike Deck

For a set that small, it is generally not worth converting from an array to a HashMap/Set. In fact, you're probably best off keeping them in an array, sorting them by key, and iterating over both lists simultaneously to do the comparison.

Answered by martinatime

I have created an approximation of what I think you are looking for, using just the Collections Framework in Java. Frankly, I think it is probably overkill, as @Mike Deck points out. For such a small set of items to compare and process, I think arrays would be a better choice from a procedural standpoint, but here is my pseudo-coded (because I'm lazy) solution. I am assuming that the Foo class is comparable based on its unique id and not on all of the data in its contents:

// Requires: import java.util.*;
// Assumptions: Foo.equals()/hashCode() compare only the unique id, the date
// fields are java.util.Date, and doRemove/doAdd/doUpdate/doStart/doEnd are
// defined elsewhere in this class.
Collection<Foo> oldSet = ...;
Collection<Foo> newSet = ...;

// Elements of a that have no id match in b.
private Collection<Foo> difference(Collection<Foo> a, Collection<Foo> b) {
    // Collection has no clone(), so copy into a new list instead.
    Collection<Foo> result = new ArrayList<>(a);
    result.removeAll(b);
    return result;
}

// Elements of a that also have an id match in b.
private Collection<Foo> intersection(Collection<Foo> a, Collection<Foo> b) {
    Collection<Foo> result = new ArrayList<>(a);
    result.retainAll(b);
    return result;
}

// True when two Foos with the same id differ in content.
private boolean isModified(Foo f1, Foo f2) {
    // ... extend with the remaining fields of Foo
    return f1.activated != f2.activated
        || f1.startDate.compareTo(f2.startDate) != 0;
}

public void doWork() {
    Date now = new Date();

    // if foo is in(*) oldSet but not newSet, call doRemove(foo)
    for (Foo foo : difference(oldSet, newSet)) {
        doRemove(foo);
    }

    // else if foo is not in oldSet but in newSet, call doAdd(foo)
    for (Foo foo : difference(newSet, oldSet)) {
        doAdd(foo);
    }

    // Collection has no get(), so index the new objects by themselves;
    // the lookup works because equals()/hashCode() are id-based.
    Map<Foo, Foo> newByKey = new HashMap<>();
    for (Foo foo : newSet) {
        newByKey.put(foo, foo);
    }

    // else if foo is in both collections but modified, call doUpdate(oldFoo, newFoo)
    for (Foo oldFoo : intersection(oldSet, newSet)) {
        Foo newFoo = newByKey.get(oldFoo);
        if (isModified(oldFoo, newFoo)) {
            doUpdate(oldFoo, newFoo);
        } else if (!oldFoo.activated && oldFoo.startDate.compareTo(now) >= 0) {
            // else if !foo.activated && foo.startDate >= now, call doStart(foo)
            doStart(oldFoo);
        } else if (oldFoo.activated && oldFoo.endDate.compareTo(now) <= 0) {
            // else if foo.activated && foo.endDate <= now, call doEnd(foo)
            doEnd(oldFoo);
        }
    }
}

As for your questions: "If I convert oldSet and newSet into HashMap (order is not of concern here), with the IDs as keys, would it make the code easier to read and easier to compare? How much time and memory performance is lost on the conversion?" I think you would probably make the code more readable by using a Map, but you would probably use more memory and time during the conversion.
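To make the Map idea concrete, here is a minimal sketch of the ID-keyed comparison. The Foo stand-in (an id plus a payload string), the MapDiff class, and the recorded action strings are all hypothetical; in the real code the recorded strings would be calls to doRemove, doAdd, and doUpdate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the question's Foo: an id plus some content.
final class Foo {
    final long id;
    final String payload;

    Foo(long id, String payload) {
        this.id = id;
        this.payload = payload;
    }
}

final class MapDiff {
    // Records the actions instead of calling doRemove/doAdd/doUpdate directly.
    final List<String> actions = new ArrayList<>();

    // Index a collection by id; LinkedHashMap keeps the input order.
    static Map<Long, Foo> indexById(Collection<Foo> foos) {
        Map<Long, Foo> byId = new LinkedHashMap<>();
        for (Foo foo : foos) {
            byId.put(foo.id, foo);
        }
        return byId;
    }

    void diff(Collection<Foo> oldSet, Collection<Foo> newSet) {
        Map<Long, Foo> oldById = indexById(oldSet);
        Map<Long, Foo> newById = indexById(newSet);

        // Old items: either gone (remove) or present with changed content (update).
        for (Foo oldFoo : oldById.values()) {
            Foo newFoo = newById.get(oldFoo.id);
            if (newFoo == null) {
                actions.add("remove " + oldFoo.id);
            } else if (!oldFoo.payload.equals(newFoo.payload)) {
                actions.add("update " + oldFoo.id);
            }
        }
        // New items whose id was not seen before (add).
        for (Foo newFoo : newById.values()) {
            if (!oldById.containsKey(newFoo.id)) {
                actions.add("add " + newFoo.id);
            }
        }
    }

    public static void main(String[] args) {
        MapDiff d = new MapDiff();
        d.diff(
            Arrays.asList(new Foo(1, "a"), new Foo(2, "b"), new Foo(3, "c")),
            Arrays.asList(new Foo(2, "b"), new Foo(3, "changed"), new Foo(4, "d")));
        System.out.println(d.actions); // [remove 1, update 3, add 4]
    }
}
```

With about 20 items per call, building two small maps is cheap, but it still allocates on every invocation, which is the cost this answer warns about.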

"Would iterating over the two sets and performing the appropriate operation be more efficient and concise?" Yes, this would be the best of both worlds, especially if you followed @Mike Sharek's advice of rolling your own List with the specialized methods, or used something like the Visitor design pattern to run through your collection and process each item.

Answered by Bartosz Bierkowski

I'd move to lists and solve it this way:

  1. Sort both lists by id ascending, using a custom Comparator if the objects in the lists aren't Comparable
  2. Iterate over the elements of both lists as in the merge phase of the merge sort algorithm, but instead of merging the lists, apply your logic.

The code would be more or less like this:

/* Main method */
private void execute(Collection<Foo> oldSet, Collection<Foo> newSet) {
  List<Foo> oldList = asSortedList(oldSet);
  List<Foo> newList = asSortedList(newSet);

  int oldIndex = 0;
  int newIndex = 0;
  // Iterate over both collections but not always in the same pace
  while( oldIndex < oldList.size() 
      && newIndex < newList.size())  {
    Foo oldObject = oldList.get(oldIndex);
    Foo newObject = newList.get(newIndex);

    // Your logic here
    if(oldObject.getId() < newObject.getId()) {
      doRemove(oldObject);
      oldIndex++;
    } else if( oldObject.getId() > newObject.getId() ) {
      doAdd(newObject);
      newIndex++;
    } else if( oldObject.getId() == newObject.getId() 
            && isModified(oldObject, newObject) ) {
      doUpdate(oldObject, newObject);
      oldIndex++;
      newIndex++;
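    } else {
      // Same id and not modified:
      ... 
      // advance both indexes, otherwise the loop never terminates
      oldIndex++;
      newIndex++;
    }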
  }// while

  // Check if there are any objects left in *oldList* or *newList*

  for(; oldIndex < oldList.size(); oldIndex++ ) {
    doRemove( oldList.get(oldIndex) );  
  }// for( oldIndex )

  for(; newIndex < newList.size(); newIndex++ ) {
    doAdd( newList.get(newIndex) );
  }// for( newIndex ) 
}// execute( oldSet, newSet )

/** Create sorted list from collection 
    If you actually perform any actions on the input collections, then you
    should always return a new list instance to keep the algorithm simple.
*/
private List<Foo> asSortedList(Collection<Foo> data) {
  List<Foo> resultList;
  if(data instanceof List) {
     resultList = (List<Foo>)data;
  } else {
     resultList = new ArrayList<Foo>(data);
  }
  Collections.sort(resultList);
  return resultList;
}

Answered by user143081

Apache's commons.collections library has a CollectionUtils class that provides easy-to-use methods for Collection manipulation/checking, such as intersection, difference, and union.

The org.apache.commons.collections.CollectionUtils API docs are here.

Answered by Sharan Rajendran

I think the easiest way to do that is by using the Apache Collections API: CollectionUtils.subtract(list1, list2), as long as the lists are of the same type.

Answered by Lijo Mathew

To compare a list or set we can use Arrays.equals(Object[], Object[]). It checks the values only. To get the Object[] we can use the Collection.toArray() method.
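One thing to keep in mind is that Arrays.equals compares the arrays position by position, so two collections with the same elements in a different order are reported as unequal. The sketch below shows this, along with an order-insensitive variant that sorts the arrays first; the class and method names are hypothetical, and the variant assumes the elements are Comparable.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

final class ArrayCompare {
    // Order-sensitive: true only if both hold equal elements in the same positions.
    static boolean sameSequence(Collection<?> a, Collection<?> b) {
        return Arrays.equals(a.toArray(), b.toArray());
    }

    // Order-insensitive variant: toArray() returns fresh arrays, so sorting
    // them does not disturb the original collections.
    static <T extends Comparable<T>> boolean sameElements(Collection<T> a, Collection<T> b) {
        Object[] left = a.toArray();
        Object[] right = b.toArray();
        Arrays.sort(left);
        Arrays.sort(right);
        return Arrays.equals(left, right);
    }

    public static void main(String[] args) {
        List<Integer> x = Arrays.asList(1, 2, 3);
        List<Integer> y = Arrays.asList(3, 2, 1);
        System.out.println(sameSequence(x, y)); // false
        System.out.println(sameElements(x, y)); // true
    }
}
```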

Answered by Vitalii Fedorenko

You can use Java 8 streams, for example:

set1.stream().filter(s -> set2.contains(s)).collect(Collectors.toSet());

or the Sets class from Guava:

Set<String> intersection = Sets.intersection(set1, set2);
Set<String> difference = Sets.difference(set1, set2);
Set<String> symmetricDifference = Sets.symmetricDifference(set1, set2);
Set<String> union = Sets.union(set1, set2);
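As a rough sketch of how the stream approach could map onto the original question, the helpers below compute the intersection and difference of two sets of IDs using only the JDK. The class name, method names, and sample IDs are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

final class StreamSetOps {
    // Elements present in both a and b.
    static <T> Set<T> intersection(Set<T> a, Set<T> b) {
        return a.stream().filter(b::contains).collect(Collectors.toSet());
    }

    // Elements present in a but not in b.
    static <T> Set<T> difference(Set<T> a, Set<T> b) {
        return a.stream().filter(e -> !b.contains(e)).collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<String> oldIds = new HashSet<>(Arrays.asList("1", "2", "3"));
        Set<String> newIds = new HashSet<>(Arrays.asList("2", "3", "4"));

        System.out.println("removed: " + difference(oldIds, newIds));
        System.out.println("added:   " + difference(newIds, oldIds));
        System.out.println("in both: " + intersection(oldIds, newIds));
    }
}
```

Here "removed" and "added" correspond to the doRemove and doAdd cases, while "in both" is the set that still needs the modified/start/end checks.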

Answered by pooja

public static boolean doCollectionsContainSameElements(
        Collection<Integer> c1, Collection<Integer> c2){

    if (c1 == null || c2 == null) {
        return false;
    }
    else if (c1.size() != c2.size()) {
        return false;
    } else {    
        return c1.containsAll(c2) && c2.containsAll(c1);
    }       
}
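
One caveat with the size-plus-containsAll check above: if the collections may contain duplicates, [1, 1, 2] and [1, 2, 2] have the same size and each contains all elements of the other, so they are reported as equal. If that matters, a possible variant is to compare occurrence counts instead. The sketch below is one way to do that; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

final class MultisetCompare {
    // Count how many times each element occurs.
    static <T> Map<T, Integer> counts(Collection<T> items) {
        Map<T, Integer> result = new HashMap<>();
        for (T item : items) {
            result.merge(item, 1, Integer::sum);
        }
        return result;
    }

    // True when both collections hold the same elements the same number of times.
    static <T> boolean sameElementsWithCounts(Collection<T> c1, Collection<T> c2) {
        if (c1 == null || c2 == null) {
            return false;
        }
        return c1.size() == c2.size() && counts(c1).equals(counts(c2));
    }

    public static void main(String[] args) {
        System.out.println(sameElementsWithCounts(
            Arrays.asList(1, 1, 2), Arrays.asList(1, 2, 2))); // false
        System.out.println(sameElementsWithCounts(
            Arrays.asList(1, 2, 2), Arrays.asList(2, 1, 2))); // true
    }
}
```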