从 Java 7 和 8 中的现有列表创建不同的列表?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/27464781/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-11 04:25:09  来源:igfitidea点击:

Creating distinct list from existing list in Java 7 and 8?

javalistjava-8java-7distinct-values

提问by Matt Coubrough

If I have:

如果我有:

List<Integer> listInts = { 1, 1, 3, 77, 2, 19, 77, 123, 14, 123... }

in Java what is an efficient way of creating a List<Integer> listDistinctIntscontaining only the distinctvalues from listInts?

在 Java 中,创建List<Integer> listDistinctInts仅包含来自的不同值的有效方法是listInts什么?

My immediate thought is to create a Set<Integer> setIntscontaining all the values from listIntsthen call List<Integer> listDistinctInts = new ArrayList<>(setInts);

我的直接想法是创建一个Set<Integer> setInts包含所有值的listInts然后调用List<Integer> listDistinctInts = new ArrayList<>(setInts);

But this seems potentially inefficient - is there a better solution using Java 7?

但这似乎可能效率低下 - 是否有使用 Java 7 的更好解决方案?

I'm not using Java 8, but I believe using it I could do something like this(?):

我没有使用 Java 8,但我相信使用它我可以做这样的事情(?):

List<Integer> listDistinctInts = listInts.stream().distinct().collect(Collectors.toList());

Would this be more performant than the approach above and/or is there any more efficient way of doing this in Java 8?

这会比上面的方法性能更高和/或在 Java 8 中有更有效的方法吗?

Finally, (and I'm aware that asking multiple questions might be frowned upon but it's directly related) if I only cared about the countof distinct elements in listIntsis there a more efficient way to get that value (in Java 7 and 8) - without first creating a list or set of all the distinct elements?

最后,(我知道问多个问题可能会被人反对,但它是直接相关的)如果我只关心不同元素的数量listInts是否有更有效的方法来获得该值(在 Java 7 和 8 中) -不首先创建所有不同元素的列表或集合?

I'm most interested in native Java ways of accomplishing this and avoiding re-inventing any wheels but would consider hand-rolled code or libraries if they offer better clarity or performance. I've read this related question Java - Distinct List of Objectsbut it's not entirely clear about the differences in performance between the Java 7 and 8 approaches or whether there might be better techniques?

我最感兴趣的是原生 Java 方法来实现这一点并避免重新发明任何轮子,但如果它们提供更好的清晰度或性能,我会考虑手动滚动代码或库。我已经阅读了这个相关的问题Java - Distinct List of Objects,但并不完全清楚 Java 7 和 8 方法之间的性能差异,或者是否有更好的技术?

回答by Victor2748

When adding a value to a listIntscheck:

listInts支票添加值时:

int valueToAdd;
//...
if (!listInts.contains(valueToAdd)) {listInts.add(valueToAdd)}

if you have an existing list, use a for-each statement to copy all the values from that list, to a new one, that you want to be "distinct":

如果您有一个现有列表,请使用 for-each 语句将该列表中的所有值复制到您想要“不同”的新列表中:

List<Integer> listWithRepeatedValues;
List<Integer> distinctList;
//...
for (Integer i : listWithRepeatedValues) {
    if (!listInts.contains(valueToAdd)) {distinctList.add(i);}
}

回答by Everv0id

You should try new LinkedList(new HashSet(listInts)).

你应该试试new LinkedList(new HashSet(listInts))

回答by ulix

Don't worry. Using a HashSet is a pretty easy and efficient way to eliminate duplicates:

别担心。使用 HashSet 是一种非常简单有效的消除重复的方法:

    Set<Integer> uniqueList = new HashSet<>();
    uniqueList.addAll(listInts);   // Add all elements eliminating duplicates

    for (int n : uniqueList)       // Check the results (in no particular order)
        System.out.println(n);

    System.out.println("Number distinct values: " + uniqueList.size());

In a more specific scenario, just in case the range of possible values is known, is not very large, while listIntsis very large.
The most efficient way of counting the number of unique entries in the list that I can think of is:

在更具体的场景中,以防万一可能值的范围已知,不是很大,而listInts很大。
我能想到的计算列表中唯一条目数量的最有效方法是:

    boolean[] counterTable = new boolean[124];
    int counter = 0;

    for (int n : listInts)
        if (!counterTable[n]) {
            counter++;
            counterTable[n] = true;
        }

    System.out.println("Number of distinct values: " + counter);

回答by u290629

Guavacan be your choice:

番石榴可以是您的选择:

ImmutableSet<Integer> set = ImmutableSet.copyOf(listInts);

The API is extremely optimized.

API 是极其优化的。

It is FASTER than listInts.stream().distinct()and new LinkedHashSet<>(listInts).

它比listInts.stream().distinct()和快new LinkedHashSet<>(listInts)

回答by Craig P. Motlin

If you're using Eclipse Collections(formerly GS Collections), you can use the method distinct().

如果您使用Eclipse Collections(以前称为GS Collections),则可以使用 方法distinct()

ListIterable<Integer> listInts = FastList.newListWith(1, 1, 3, 77, 2, 19, 77, 123, 14, 123);
Assert.assertEquals(
        FastList.newListWith(1, 3, 77, 2, 19, 123, 14),
        listInts.distinct());

The advantage of using distinct()instead of converting to a Set and then back to a List is that distinct()preserves the order of the original List, retaining the first occurrence of each element. It's implemented by using both a Set and a List.

使用distinct()而不是转换为 Set 然后再转换回 List的优点是distinct()保留原始 List 的顺序,保留每个元素的第一次出现。它是通过使用 Set 和 List 来实现的。

MutableSet<T> seenSoFar = UnifiedSet.newSet();
int size = list.size();
for (int i = 0; i < size; i++)
{
    T item = list.get(i);
    if (seenSoFar.add(item))
    {
        targetCollection.add(item);
    }
}
return targetCollection;

If you cannot convert your original List into a GS Collections type, you can use ListAdapter to get the same API.

如果您无法将您的原始 List 转换为 GS Collections 类型,您可以使用 ListAdapter 来获取相同的 API。

MutableList<Integer> distinct = ListAdapter.adapt(integers).distinct();

There's no way to avoid the creation of the Set. Still, UnifiedSet is more efficient than HashSet so there will be some speed benefit.

没有办法避免创建 Set。尽管如此,UnifiedSet 比 HashSet 更有效,因此会有一些速度优势。

If all you want is the numberof distinct items, it's more efficient to just create a set without creating the list.

如果您想要的只是不同项目的数量,那么只创建一个集合而不创建列表会更有效。

Verify.assertSize(7, UnifiedSet.newSet(listInts));

Eclipse Collections 8.0 requires Java 8. Eclipse Collections 7.x works well with Java 8, but only requires Java 5.

Eclipse Collections 8.0 需要 Java 8。Eclipse Collections 7.x 可以很好地与 Java 8 配合使用,但只需要 Java 5。

Note:I am a committer for Eclipse collections.

注意:我是 Eclipse 集合的提交者。

回答by Matt Coubrough

I've now MicroBenchmarked most of the proposed options from the excellent answers provided. Like most non-trivial performance related questions the answer as to which is best is "it depends".

我现在已经从提供的优秀答案中对大多数建议的选项进行了微基准测试。像大多数与性能相关的重要问题一样,关于哪个最好的答案是“视情况而定”

All my testing was performed with JMHJava Microbenchmarking Harness.

我所有的测试都是使用JMHJava Microbenchmarking Harness 进行的

Most of these tests were performed using JDK 1.8, although I performed some of the tests with JDK 1.7 too just to ensure that its performance wasn't too different (it was almost identical). I tested the following techniques taken from the answers supplied so far:

大多数这些测试是使用 JDK 1.8 执行的,尽管我也使用 JDK 1.7 执行了一些测试,只是为了确保其性能没有太大差异(几乎相同)。我从目前提供的答案中测试了以下技术:



1. Java 8 Stream- The solution using stream()I had proprosed as a possibility if using Java8:

1. Java 8 Stream-stream()如果使用 Java8,我建议使用的解决方案是:

public List<Integer> testJava8Stream(List<Integer> listInts) {
    return listInts.stream().distinct().collect(Collectors.toList());
}

prosmodern Java 8 approach, no 3rd party dependencies

现代 Java 8 方法的优点,没有第三方依赖

consRequires Java 8

缺点需要Java 8



2. Adding To List- The solution proposed by Victor2748where a new list is constructed and added to, if and only if the list doesn't already contain the value. Note that I also preallocate the destination list at the size of the original (the max possible) to prevent any reallocations:

2. 添加到列表- Victor2748提出的解决方案,当且仅当列表不包含该值时,构建并添加新列表。请注意,我还以原始大小(可能的最大值)预先分配了目标列表,以防止任何重新分配:

public List<Integer> testAddingToList(List<Integer> listInts) {
    List<Integer> listDistinctInts = new ArrayList<>(listInts.size());
    for(Integer i : listInts)
    {
        if( !listDistinctInts.contains(i) ) { listDistinctInts.add(i); }
    }
    return listDistinctInts;
}

prosWorks in any Java version, no need to create a Set and then copy, no 3rd party deps

pros适用于任何 Java 版本,无需创建 Set 然后复制,无需 3rd 方 deps

consNeeds to repeatedly check the List for existing values as we build it

缺点需要在构建列表时反复检查列表中的现有值



3. GS Collections Fast(now Eclipse collections)- The solution proposed by Craig P. Motlinusing the GS Collections libraryand their custom List type FastList:

3. GS Collections Fast (现在是 Eclipse 集合)- Craig P. Motlin使用GS Collections 库及其自定义列表类型提出的解决方案FastList

public List<Integer> testGsCollectionsFast(FastList listFast)
{
    return listFast.distinct();
}

prosReportedly very quick, simple expressive code, works in Java 7 and 8

pros据说非常快速,简单的表达代码,适用于 Java 7 和 8

consRequires 3rd party library and a FastListrather than a regular List<Integer>

缺点需要第三方库和FastList而不是常规List<Integer>



4. GS Collections Adapted- The FastList solution wasn't quite comparing like-for-like because it needed a FastListpassed to the method rather than a good ol' ArrayList<Integer>so I also tested the adapter method Craig proposed:

4. GS Collections Adapted- FastList 解决方案并没有完全比较同类,因为它需要FastList传递给方法而不是一个好的 ol',ArrayList<Integer>所以我还测试了 Craig 提出的适配器方法:

public List<Integer> testGsCollectionsAdapted(List<Integer> listInts)
{
    return listAdapter.adapt(listInts).distinct();
}

prosDoesn't require a FastList, works in Java 7 and 8

优点不需要FastList,适用于 Java 7 和 8

consHas to adapt List so may not perform as well, needs 3rd party library

缺点必须适应 List 所以可能表现不佳,需要 3rd 方库



5. Guava ImmutableSet- The method proposed by Louis Wassermanin comments, and by 卢声远 Shengyuan Luin their answer using Guava:

5. Guava ImmutableSet- Louis Wasserman在评论中提出的方法,以及卢声远 Shengyuan Lu在他们使用Guava的回答中提出的方法:

public List<Integer> testGuavaImmutable(List<Integer> listInts)
{
    return ImmutableSet.copyOf(listInts).asList();
}

prosReportedly very fast, works in Java 7 or 8

利弊据说非常快,工作在Java的7或8

consReturns an Immutable List, can't handle nulls in the input List, and requires 3rd party library

cons返回一个Immutable List,无法处理输入列表中的空值,并且需要 3rd 方库



7. HashSet- My original idea (also recommended by EverV0id, ulixand Radiodef)

7. HashSet- 我最初的想法(也被EverV0idulix和 Radiodef 推荐)

public List<Integer> testHashSet(List<Integer> listInts)
{
    return new ArrayList<Integer>(new HashSet<Integer>(listInts));
}

prosWorks in Java 7 and 8, no 3rd party dependencies

利弊作品中的Java 7和8,没有第三方的依赖

consDoesn't retain original order of list, has to construct set then copy to list.

cons不保留列表的原始顺序,必须构造集合然后复制到列表。



6. LinkedHashSet- Since the HashSetsolution didn't preserve the order of the Integers in the original list I also tested a version which uses LinkedHashSet to preserve order:

6. LinkedHashSet- 由于HashSet解决方案没有保留原始列表中整数的顺序,我还测试了一个使用 LinkedHashSet 保留顺序的版本:

public List<Integer> testLinkedHashSet(List<Integer> listInts)
{
    return new ArrayList<Integer>(new LinkedHashSet<Integer>(listInts));
}

prosRetains original ordering, works in Java 7 and 8, no 3rd party dependencies

优点保留原始顺序,可在 Java 7 和 8 中运行,无第 3 方依赖项

consUnlikely to be as fast as regular HashSetapproach

缺点不太可能像常规HashSet方法一样快



Results

结果

Here are my results for various different sizes of listInts(results ordered from slowest to fastest):

以下是我对各种不同大小的listInts结果(结果从最慢到最快排序):

1. taking distinct from ArrayList of 100,000 random ints between 0-50,000 (ie. big list, some duplicates)

1. 与 ArrayList 的 0-50,000 之间的 100,000 个随机整数不同(即大列表,一些重复)

Benchmark                Mode       Samples     Mean   Mean error    Units

AddingToList            thrpt        10        0.505        0.012    ops/s
Java8Stream             thrpt        10      234.932       31.959    ops/s
LinkedHashSet           thrpt        10      262.185       16.679    ops/s
HashSet                 thrpt        10      264.295       24.154    ops/s
GsCollectionsAdapted    thrpt        10      357.998       18.468    ops/s
GsCollectionsFast       thrpt        10      363.443       40.089    ops/s
GuavaImmutable          thrpt        10      469.423       26.056    ops/s

2. taking distinct from ArrayList of 1000 random ints between 0-50 (ie. medium list, many duplicates)

2. 与 0-50 之间的 1000 个随机整数的 ArrayList 不同(即中等列表,许多重复)

Benchmark                Mode       Samples     Mean   Mean error    Units

AddingToList            thrpt        10    32794.698     1154.113    ops/s
HashSet                 thrpt        10    61622.073     2752.557    ops/s
LinkedHashSet           thrpt        10    67155.865     1690.119    ops/s
Java8Stream             thrpt        10    87440.902    13517.925    ops/s
GsCollectionsFast       thrpt        10   103490.738    35302.201    ops/s
GsCollectionsAdapted    thrpt        10   143135.973     4733.601    ops/s
GuavaImmutable          thrpt        10   186301.330    13421.850    ops/s

3. taking distinct from ArrayList of 100 random ints between 0-100 (ie. small list, some duplicates)

3. 与 ArrayList 的 0-100 之间的 100 个随机整数不同(即小列表,一些重复)

Benchmark                Mode       Samples     Mean   Mean error    Units

AddingToList            thrpt        10   278435.085    14229.285    ops/s
Java8Stream             thrpt        10   397664.052    24282.858    ops/s
LinkedHashSet           thrpt        10   462701.618    20098.435    ops/s
GsCollectionsAdapted    thrpt        10   477097.125    15212.580    ops/s
GsCollectionsFast       thrpt        10   511248.923    48155.211    ops/s
HashSet                 thrpt        10   512003.713    25886.696    ops/s
GuavaImmutable          thrpt        10  1082006.560    18716.012    ops/s

4. taking distinct from ArrayList of 10 random ints between 0-50 (ie. tiny list, few duplicates)

4. 与 ArrayList 的 0-50 之间的 10 个随机整数不同(即小列表,很少有重复)

Benchmark                Mode       Samples     Mean   Mean error    Units

Java8Stream             thrpt        10  2739774.758   306124.297    ops/s
LinkedHashSet           thrpt        10  3607479.332   150331.918    ops/s
HashSet                 thrpt        10  4238393.657   185624.358    ops/s
GsCollectionsAdapted    thrpt        10  5919254.755   495444.800    ops/s
GsCollectionsFast       thrpt        10  7916079.963  1708778.450    ops/s
AddingToList            thrpt        10  7931479.667   966331.036    ops/s
GuavaImmutable          thrpt        10  9021621.880   845936.861    ops/s

Conclusions

结论

  • If you're only taking the distinct items from a list once, and the list isn't very long anyof these methods should be adequate.

  • The most efficient general approaches came from the 3rd party libraries: GS Collections and Guava performed admirably.

  • You may need to consider the size of your list and the likely number of duplicates when selecting the most performant method.

  • The naive approach of adding to a new list only if the value isn't already in it works great for tiny lists, but as soon as you have more than a handful of values in the input list it performs the worst of the methods tried.

  • The Guava ImmutableSet.copyOf(listInts).asList()method works the fastest in most situations. But take note of the restrictions: the returned list is Immutableand the input list cannot contain nulls.

  • The HashSetmethod performs the best of the non 3rd party approaches and usually better than Java 8 streams, but reorders the integers (which may or may not be an issue depending on your use-case).

  • The LinkedHashSetapproach keeps the ordering but unsurprisingly was usually worse than the HashSet method.

  • Both the HashSetand LinkedHashSetmethods will perform worse when using lists of data types that have complex HashCode calculations, so do your own profiling if you're trying to select distinct Foos from a List<Foo>.

  • If you already have GS Collectionsas a dependency then it performs very well and is more flexible than the ImmutableList Guavaapproach. If you don't have it as a dependency, it's worth considering adding it if the performance of selecting distinct items is critical to the performance of your application.

  • Disappointingly Java 8 streams seemed to perform fairly poorly. There may be a better way to code the distinct()call than the way I used, so comments or other answers are of course welcome.

  • 如果您只从列表中获取不同的项目一次,并且列表不是很长,那么这些方法中的任何一个都应该足够了。

  • 最有效的通用方法来自第 3 方库:GS Collections 和 Guava 表现出色。

  • 在选择最高效的方法时,您可能需要考虑列表的大小和可能的重复数量。

  • 仅当值不在新列表中时才添加到新列表的幼稚方法对于小列表非常有效,但是一旦输入列表中有多个值,它就会执行尝试过的最差方法。

  • GuavaImmutableSet.copyOf(listInts).asList()方法在大多数情况下运行速度最快。但请注意限制:返回的列表是Immutable,输入列表不能包含空值。

  • HashSet方法是非 3rd 方方法中最好的,通常比 Java 8 流更好,但对整数重新排序(这可能是也可能不是问题,具体取决于您的用例)。

  • LinkedHashSet方法保持排序,但不出所料,它通常比 HashSet 方法更糟糕。

  • 当使用具有复杂 HashCode 计算的数据类型列表时,theHashSetLinkedHashSet方法的性能会更差,因此如果您尝试Foo从 a 中选择不同的s,请自行分析List<Foo>

  • 如果您已经将GS Collections作为依赖项,那么它的性能非常好,并且比 ImmutableList Guava方法更灵活。如果您没有将其作为依赖项,并且选择不同项的性能对应用程序的性能至关重要,则值得考虑添加它。

  • 令人失望的是,Java 8 流的性能似乎相当差。可能有distinct()比我使用的方式更好的调用编码方式,因此当然欢迎评论或其他答案。

NB. I'm no expert at MicroBenchmarking, so if anyone finds flaws in my results or methodology please notify me and I'll endeavour to correct the Answer.

注意。我不是 MicroBenchmarking 的专家,所以如果有人发现我的结果或方法有缺陷,请通知我,我会努力纠正答案。

回答by Righto

This should work:

这应该有效:

yourlist.stream().map(your wrapper that overrides equals and hashchode method::new).distinct().map(wrapper defined above::method that returns the final output).collect(Collectors.toList());

yourlist.stream().map(你覆盖equals和hashchode方法的包装器::new).distinct().map(上面定义的包装器::返回最终输出的方法).collect(Collectors.toList());