从 Java 7 和 8 中的现有列表创建不同的列表?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/27464781/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Creating distinct list from existing list in Java 7 and 8?
提问by Matt Coubrough
If I have:
如果我有:
List<Integer> listInts = { 1, 1, 3, 77, 2, 19, 77, 123, 14, 123... }
in Java what is an efficient way of creating a List<Integer> listDistinctInts
containing only the distinctvalues from listInts
?
在 Java 中,创建List<Integer> listDistinctInts
仅包含来自的不同值的有效方法是listInts
什么?
My immediate thought is to create a Set<Integer> setInts
containing all the values from listInts
then call List<Integer> listDistinctInts = new ArrayList<>(setInts);
我的直接想法是创建一个Set<Integer> setInts
包含所有值的listInts
然后调用List<Integer> listDistinctInts = new ArrayList<>(setInts);
But this seems potentially inefficient - is there a better solution using Java 7?
但这似乎可能效率低下 - 是否有使用 Java 7 的更好解决方案?
I'm not using Java 8, but I believe using it I could do something like this(?):
我没有使用 Java 8,但我相信使用它我可以做这样的事情(?):
List<Integer> listDistinctInts = listInts.stream().distinct().collect(Collectors.toList());
Would this be more performant than the approach above and/or is there any more efficient way of doing this in Java 8?
这会比上面的方法性能更高和/或在 Java 8 中有更有效的方法吗?
Finally, (and I'm aware that asking multiple questions might be frowned upon but it's directly related) if I only cared about the countof distinct elements in listInts
is there a more efficient way to get that value (in Java 7 and 8) - without first creating a list or set of all the distinct elements?
最后,(我知道问多个问题可能会被人反对,但它是直接相关的)如果我只关心不同元素的数量,listInts
是否有更有效的方法来获得该值(在 Java 7 和 8 中) -不首先创建所有不同元素的列表或集合?
I'm most interested in native Java ways of accomplishing this and avoiding re-inventing any wheels but would consider hand-rolled code or libraries if they offer better clarity or performance. I've read this related question Java - Distinct List of Objectsbut it's not entirely clear about the differences in performance between the Java 7 and 8 approaches or whether there might be better techniques?
我最感兴趣的是原生 Java 方法来实现这一点并避免重新发明任何轮子,但如果它们提供更好的清晰度或性能,我会考虑手动滚动代码或库。我已经阅读了这个相关的问题Java - Distinct List of Objects,但并不完全清楚 Java 7 和 8 方法之间的性能差异,或者是否有更好的技术?
回答by Victor2748
When adding a value to a listInts
check:
向listInts
支票添加值时:
int valueToAdd;
//...
if (!listInts.contains(valueToAdd)) {listInts.add(valueToAdd)}
if you have an existing list, use a for-each statement to copy all the values from that list, to a new one, that you want to be "distinct":
如果您有一个现有列表,请使用 for-each 语句将该列表中的所有值复制到您想要“不同”的新列表中:
List<Integer> listWithRepeatedValues;
List<Integer> distinctList;
//...
for (Integer i : listWithRepeatedValues) {
if (!listInts.contains(valueToAdd)) {distinctList.add(i);}
}
回答by Everv0id
You should try new LinkedList(new HashSet(listInts))
.
你应该试试new LinkedList(new HashSet(listInts))
。
回答by ulix
Don't worry. Using a HashSet is a pretty easy and efficient way to eliminate duplicates:
别担心。使用 HashSet 是一种非常简单有效的消除重复的方法:
Set<Integer> uniqueList = new HashSet<>();
uniqueList.addAll(listInts); // Add all elements eliminating duplicates
for (int n : uniqueList) // Check the results (in no particular order)
System.out.println(n);
System.out.println("Number distinct values: " + uniqueList.size());
In a more specific scenario, just in case the range of possible values is known, is not very large, while listInts
is very large.
The most efficient way of counting the number of unique entries in the list that I can think of is:
在更具体的场景中,以防万一可能值的范围已知,不是很大,而listInts
很大。
我能想到的计算列表中唯一条目数量的最有效方法是:
boolean[] counterTable = new boolean[124];
int counter = 0;
for (int n : listInts)
if (!counterTable[n]) {
counter++;
counterTable[n] = true;
}
System.out.println("Number of distinct values: " + counter);
回答by u290629
回答by Craig P. Motlin
If you're using Eclipse Collections(formerly GS Collections), you can use the method distinct()
.
如果您使用Eclipse Collections(以前称为GS Collections),则可以使用 方法distinct()
。
ListIterable<Integer> listInts = FastList.newListWith(1, 1, 3, 77, 2, 19, 77, 123, 14, 123);
Assert.assertEquals(
FastList.newListWith(1, 3, 77, 2, 19, 123, 14),
listInts.distinct());
The advantage of using distinct()
instead of converting to a Set and then back to a List is that distinct()
preserves the order of the original List, retaining the first occurrence of each element. It's implemented by using both a Set and a List.
使用distinct()
而不是转换为 Set 然后再转换回 List的优点是distinct()
保留原始 List 的顺序,保留每个元素的第一次出现。它是通过使用 Set 和 List 来实现的。
MutableSet<T> seenSoFar = UnifiedSet.newSet();
int size = list.size();
for (int i = 0; i < size; i++)
{
T item = list.get(i);
if (seenSoFar.add(item))
{
targetCollection.add(item);
}
}
return targetCollection;
If you cannot convert your original List into a GS Collections type, you can use ListAdapter to get the same API.
如果您无法将您的原始 List 转换为 GS Collections 类型,您可以使用 ListAdapter 来获取相同的 API。
MutableList<Integer> distinct = ListAdapter.adapt(integers).distinct();
There's no way to avoid the creation of the Set. Still, UnifiedSet is more efficient than HashSet so there will be some speed benefit.
没有办法避免创建 Set。尽管如此,UnifiedSet 比 HashSet 更有效,因此会有一些速度优势。
If all you want is the numberof distinct items, it's more efficient to just create a set without creating the list.
如果您想要的只是不同项目的数量,那么只创建一个集合而不创建列表会更有效。
Verify.assertSize(7, UnifiedSet.newSet(listInts));
Eclipse Collections 8.0 requires Java 8. Eclipse Collections 7.x works well with Java 8, but only requires Java 5.
Eclipse Collections 8.0 需要 Java 8。Eclipse Collections 7.x 可以很好地与 Java 8 配合使用,但只需要 Java 5。
Note:I am a committer for Eclipse collections.
注意:我是 Eclipse 集合的提交者。
回答by Matt Coubrough
I've now MicroBenchmarked most of the proposed options from the excellent answers provided. Like most non-trivial performance related questions the answer as to which is best is "it depends".
我现在已经从提供的优秀答案中对大多数建议的选项进行了微基准测试。像大多数与性能相关的重要问题一样,关于哪个最好的答案是“视情况而定”。
All my testing was performed with JMHJava Microbenchmarking Harness.
我所有的测试都是使用JMHJava Microbenchmarking Harness 进行的。
Most of these tests were performed using JDK 1.8, although I performed some of the tests with JDK 1.7 too just to ensure that its performance wasn't too different (it was almost identical). I tested the following techniques taken from the answers supplied so far:
大多数这些测试是使用 JDK 1.8 执行的,尽管我也使用 JDK 1.7 执行了一些测试,只是为了确保其性能没有太大差异(几乎相同)。我从目前提供的答案中测试了以下技术:
1. Java 8 Stream- The solution using stream()
I had proprosed as a possibility if using Java8:
1. Java 8 Stream-stream()
如果使用 Java8,我建议使用的解决方案是:
public List<Integer> testJava8Stream(List<Integer> listInts) {
return listInts.stream().distinct().collect(Collectors.toList());
}
prosmodern Java 8 approach, no 3rd party dependencies
现代 Java 8 方法的优点,没有第三方依赖
consRequires Java 8
缺点需要Java 8
2. Adding To List- The solution proposed by Victor2748where a new list is constructed and added to, if and only if the list doesn't already contain the value. Note that I also preallocate the destination list at the size of the original (the max possible) to prevent any reallocations:
2. 添加到列表- Victor2748提出的解决方案,当且仅当列表不包含该值时,构建并添加新列表。请注意,我还以原始大小(可能的最大值)预先分配了目标列表,以防止任何重新分配:
public List<Integer> testAddingToList(List<Integer> listInts) {
List<Integer> listDistinctInts = new ArrayList<>(listInts.size());
for(Integer i : listInts)
{
if( !listDistinctInts.contains(i) ) { listDistinctInts.add(i); }
}
return listDistinctInts;
}
prosWorks in any Java version, no need to create a Set and then copy, no 3rd party deps
pros适用于任何 Java 版本,无需创建 Set 然后复制,无需 3rd 方 deps
consNeeds to repeatedly check the List for existing values as we build it
缺点需要在构建列表时反复检查列表中的现有值
3. GS Collections Fast(now Eclipse collections)- The solution proposed by Craig P. Motlinusing the GS Collections libraryand their custom List type FastList
:
3. GS Collections Fast (现在是 Eclipse 集合)- Craig P. Motlin使用GS Collections 库及其自定义列表类型提出的解决方案FastList
:
public List<Integer> testGsCollectionsFast(FastList listFast)
{
return listFast.distinct();
}
prosReportedly very quick, simple expressive code, works in Java 7 and 8
pros据说非常快速,简单的表达代码,适用于 Java 7 和 8
consRequires 3rd party library and a FastList
rather than a regular List<Integer>
缺点需要第三方库和FastList
而不是常规List<Integer>
4. GS Collections Adapted- The FastList solution wasn't quite comparing like-for-like because it needed a FastList
passed to the method rather than a good ol' ArrayList<Integer>
so I also tested the adapter method Craig proposed:
4. GS Collections Adapted- FastList 解决方案并没有完全比较同类,因为它需要FastList
传递给方法而不是一个好的 ol',ArrayList<Integer>
所以我还测试了 Craig 提出的适配器方法:
public List<Integer> testGsCollectionsAdapted(List<Integer> listInts)
{
return listAdapter.adapt(listInts).distinct();
}
prosDoesn't require a FastList
, works in Java 7 and 8
优点不需要FastList
,适用于 Java 7 和 8
consHas to adapt List so may not perform as well, needs 3rd party library
缺点必须适应 List 所以可能表现不佳,需要 3rd 方库
5. Guava ImmutableSet- The method proposed by Louis Wassermanin comments, and by 卢声远 Shengyuan Luin their answer using Guava:
5. Guava ImmutableSet- Louis Wasserman在评论中提出的方法,以及卢声远 Shengyuan Lu在他们使用Guava的回答中提出的方法:
public List<Integer> testGuavaImmutable(List<Integer> listInts)
{
return ImmutableSet.copyOf(listInts).asList();
}
prosReportedly very fast, works in Java 7 or 8
利弊据说非常快,工作在Java的7或8
consReturns an Immutable List
, can't handle nulls in the input List, and requires 3rd party library
cons返回一个Immutable List
,无法处理输入列表中的空值,并且需要 3rd 方库
7. HashSet- My original idea (also recommended by EverV0id, ulixand Radiodef)
7. HashSet- 我最初的想法(也被EverV0id、ulix和 Radiodef 推荐)
public List<Integer> testHashSet(List<Integer> listInts)
{
return new ArrayList<Integer>(new HashSet<Integer>(listInts));
}
prosWorks in Java 7 and 8, no 3rd party dependencies
利弊作品中的Java 7和8,没有第三方的依赖
consDoesn't retain original order of list, has to construct set then copy to list.
cons不保留列表的原始顺序,必须构造集合然后复制到列表。
6. LinkedHashSet- Since the HashSet
solution didn't preserve the order of the Integers in the original list I also tested a version which uses LinkedHashSet to preserve order:
6. LinkedHashSet- 由于HashSet
解决方案没有保留原始列表中整数的顺序,我还测试了一个使用 LinkedHashSet 保留顺序的版本:
public List<Integer> testLinkedHashSet(List<Integer> listInts)
{
return new ArrayList<Integer>(new LinkedHashSet<Integer>(listInts));
}
prosRetains original ordering, works in Java 7 and 8, no 3rd party dependencies
优点保留原始顺序,可在 Java 7 和 8 中运行,无第 3 方依赖项
consUnlikely to be as fast as regular HashSet
approach
缺点不太可能像常规HashSet
方法一样快
Results
结果
Here are my results for various different sizes of listInts
(results ordered from slowest to fastest):
以下是我对各种不同大小的listInts
结果(结果从最慢到最快排序):
1. taking distinct from ArrayList of 100,000 random ints between 0-50,000 (ie. big list, some duplicates)
1. 与 ArrayList 的 0-50,000 之间的 100,000 个随机整数不同(即大列表,一些重复)
Benchmark Mode Samples Mean Mean error Units
AddingToList thrpt 10 0.505 0.012 ops/s
Java8Stream thrpt 10 234.932 31.959 ops/s
LinkedHashSet thrpt 10 262.185 16.679 ops/s
HashSet thrpt 10 264.295 24.154 ops/s
GsCollectionsAdapted thrpt 10 357.998 18.468 ops/s
GsCollectionsFast thrpt 10 363.443 40.089 ops/s
GuavaImmutable thrpt 10 469.423 26.056 ops/s
2. taking distinct from ArrayList of 1000 random ints between 0-50 (ie. medium list, many duplicates)
2. 与 0-50 之间的 1000 个随机整数的 ArrayList 不同(即中等列表,许多重复)
Benchmark Mode Samples Mean Mean error Units
AddingToList thrpt 10 32794.698 1154.113 ops/s
HashSet thrpt 10 61622.073 2752.557 ops/s
LinkedHashSet thrpt 10 67155.865 1690.119 ops/s
Java8Stream thrpt 10 87440.902 13517.925 ops/s
GsCollectionsFast thrpt 10 103490.738 35302.201 ops/s
GsCollectionsAdapted thrpt 10 143135.973 4733.601 ops/s
GuavaImmutable thrpt 10 186301.330 13421.850 ops/s
3. taking distinct from ArrayList of 100 random ints between 0-100 (ie. small list, some duplicates)
3. 与 ArrayList 的 0-100 之间的 100 个随机整数不同(即小列表,一些重复)
Benchmark Mode Samples Mean Mean error Units
AddingToList thrpt 10 278435.085 14229.285 ops/s
Java8Stream thrpt 10 397664.052 24282.858 ops/s
LinkedHashSet thrpt 10 462701.618 20098.435 ops/s
GsCollectionsAdapted thrpt 10 477097.125 15212.580 ops/s
GsCollectionsFast thrpt 10 511248.923 48155.211 ops/s
HashSet thrpt 10 512003.713 25886.696 ops/s
GuavaImmutable thrpt 10 1082006.560 18716.012 ops/s
4. taking distinct from ArrayList of 10 random ints between 0-50 (ie. tiny list, few duplicates)
4. 与 ArrayList 的 0-50 之间的 10 个随机整数不同(即小列表,很少有重复)
Benchmark Mode Samples Mean Mean error Units
Java8Stream thrpt 10 2739774.758 306124.297 ops/s
LinkedHashSet thrpt 10 3607479.332 150331.918 ops/s
HashSet thrpt 10 4238393.657 185624.358 ops/s
GsCollectionsAdapted thrpt 10 5919254.755 495444.800 ops/s
GsCollectionsFast thrpt 10 7916079.963 1708778.450 ops/s
AddingToList thrpt 10 7931479.667 966331.036 ops/s
GuavaImmutable thrpt 10 9021621.880 845936.861 ops/s
Conclusions
结论
If you're only taking the distinct items from a list once, and the list isn't very long anyof these methods should be adequate.
The most efficient general approaches came from the 3rd party libraries: GS Collections and Guava performed admirably.
You may need to consider the size of your list and the likely number of duplicates when selecting the most performant method.
The naive approach of adding to a new list only if the value isn't already in it works great for tiny lists, but as soon as you have more than a handful of values in the input list it performs the worst of the methods tried.
The Guava
ImmutableSet.copyOf(listInts).asList()
method works the fastest in most situations. But take note of the restrictions: the returned list isImmutable
and the input list cannot contain nulls.The
HashSet
method performs the best of the non 3rd party approaches and usually better than Java 8 streams, but reorders the integers (which may or may not be an issue depending on your use-case).The
LinkedHashSet
approach keeps the ordering but unsurprisingly was usually worse than the HashSet method.Both the
HashSet
andLinkedHashSet
methods will perform worse when using lists of data types that have complex HashCode calculations, so do your own profiling if you're trying to select distinctFoo
s from aList<Foo>
.If you already have GS Collectionsas a dependency then it performs very well and is more flexible than the ImmutableList Guavaapproach. If you don't have it as a dependency, it's worth considering adding it if the performance of selecting distinct items is critical to the performance of your application.
Disappointingly Java 8 streams seemed to perform fairly poorly. There may be a better way to code the
distinct()
call than the way I used, so comments or other answers are of course welcome.
如果您只从列表中获取不同的项目一次,并且列表不是很长,那么这些方法中的任何一个都应该足够了。
最有效的通用方法来自第 3 方库:GS Collections 和 Guava 表现出色。
在选择最高效的方法时,您可能需要考虑列表的大小和可能的重复数量。
仅当值不在新列表中时才添加到新列表的幼稚方法对于小列表非常有效,但是一旦输入列表中有多个值,它就会执行尝试过的最差方法。
Guava
ImmutableSet.copyOf(listInts).asList()
方法在大多数情况下运行速度最快。但请注意限制:返回的列表是Immutable
,输入列表不能包含空值。该
HashSet
方法是非 3rd 方方法中最好的,通常比 Java 8 流更好,但对整数重新排序(这可能是也可能不是问题,具体取决于您的用例)。该
LinkedHashSet
方法保持排序,但不出所料,它通常比 HashSet 方法更糟糕。当使用具有复杂 HashCode 计算的数据类型列表时,the
HashSet
和LinkedHashSet
方法的性能会更差,因此如果您尝试Foo
从 a 中选择不同的s,请自行分析List<Foo>
。如果您已经将GS Collections作为依赖项,那么它的性能非常好,并且比 ImmutableList Guava方法更灵活。如果您没有将其作为依赖项,并且选择不同项的性能对应用程序的性能至关重要,则值得考虑添加它。
令人失望的是,Java 8 流的性能似乎相当差。可能有
distinct()
比我使用的方式更好的调用编码方式,因此当然欢迎评论或其他答案。
NB. I'm no expert at MicroBenchmarking, so if anyone finds flaws in my results or methodology please notify me and I'll endeavour to correct the Answer.
注意。我不是 MicroBenchmarking 的专家,所以如果有人发现我的结果或方法有缺陷,请通知我,我会努力纠正答案。
回答by Righto
This should work:
这应该有效:
yourlist.stream().map(your wrapper that overrides equals and hashchode method::new).distinct().map(wrapper defined above::method that returns the final output).collect(Collectors.toList());
yourlist.stream().map(你覆盖equals和hashchode方法的包装器::new).distinct().map(上面定义的包装器::返回最终输出的方法).collect(Collectors.toList());