Original question: http://stackoverflow.com/questions/2851938/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Efficiently finding the intersection of a variable number of sets of strings
Asked by tshred
I have a variable number of ArrayLists whose intersection I need to find. A realistic cap on the number of sets of strings is probably around 35, but it could be more. I don't want any code, just ideas on what could be efficient. I have an implementation that I'm about to start coding, but I want to hear some other ideas.
Currently, just thinking about my solution, it looks like I should have an asymptotic run-time of Θ(n²).
Thanks for any help!
tshred
Edit: To clarify, I really just want to know whether there is a faster way to do it, that is, faster than Θ(n²).
Answered by Michael Borgwardt
Set.retainAll() is how you find the intersection of two sets. If you use HashSet, then converting your ArrayLists to Sets and calling retainAll() in a loop over all of them is actually O(n).
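For illustration only (the asker did not want code), here is a minimal sketch of that loop. The class name RetainAllIntersection and the method name intersect are made up for this example, and the O(n) figure should be read as expected time in the total number of elements, since it relies on average-case constant-time hash lookups.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class; the name is an assumption for this sketch.
final class RetainAllIntersection {

    // Intersects any number of lists; returns an empty set if no lists are given.
    static Set<String> intersect(List<? extends List<String>> lists) {
        if (lists.isEmpty()) {
            return new HashSet<>();
        }
        // Copy the first list so the caller's data is never modified.
        Set<String> result = new HashSet<>(lists.get(0));
        // Stop early once the running intersection is empty.
        for (int i = 1; i < lists.size() && !result.isEmpty(); i++) {
            // Wrapping each list in a HashSet keeps every contains() check
            // constant time on average; passing the ArrayList directly would
            // make each check a linear scan.
            result.retainAll(new HashSet<>(lists.get(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> lists = Arrays.asList(
                Arrays.asList("a", "b", "c", "d"),
                Arrays.asList("b", "c", "d", "e"),
                Arrays.asList("c", "d", "f"));
        System.out.println(intersect(lists)); // contains "c" and "d"
    }
}
```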
Answered by bowmore
The accepted answer is just fine. As an update: since Java 8 there is a slightly more efficient way to find the intersection of two Sets.
Set<String> intersection = set1.stream()
    .filter(set2::contains)
    .collect(Collectors.toSet());
The reason it is slightly more efficient is that the original approach had to add elements of set1 which it then had to remove again if they weren't in set2. This approach only adds to the result set what needs to be in there.
Strictly speaking you could do this pre Java 8 as well, but without Streams the code would have been quite a bit more laborious.
If both sets differ considerably in size, you would prefer streaming over the smaller one.
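A possible way to apply the size hint is sketched below: pick the smaller set as the stream source and test membership against the larger one. The class name StreamIntersection and its intersect method are hypothetical names introduced for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper class; the name is an assumption for this sketch.
final class StreamIntersection {

    // Streams over the smaller set so that fewer contains() checks are made.
    static Set<String> intersect(Set<String> set1, Set<String> set2) {
        Set<String> smaller = set1.size() <= set2.size() ? set1 : set2;
        Set<String> larger = (smaller == set1) ? set2 : set1;
        return smaller.stream()
                .filter(larger::contains)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<String> big = new HashSet<>(Arrays.asList("a", "b", "c", "d", "e"));
        Set<String> small = new HashSet<>(Arrays.asList("d", "x"));
        System.out.println(intersect(big, small)); // prints [d]
    }
}
```

The work done is proportional to the size of the smaller set, assuming the larger one offers fast lookups (for example, a HashSet).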
Answered by Sonson123
There is also the static method Sets.intersection(set1, set2) in Google Guava, which returns an unmodifiable view of the intersection of two sets.
Answered by a1ex07
One more idea - if your arrays/sets are different sizes, it makes sense to begin with the smallest.
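One way this idea might look in code, as a sketch rather than a definitive implementation: copy every input into a HashSet, sort the sets by size, and fold retainAll() over them starting from the smallest. The class name SmallestFirstIntersection and the method name intersect are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class; the name is an assumption for this sketch.
final class SmallestFirstIntersection {

    static Set<String> intersect(Collection<? extends Collection<String>> inputs) {
        // Private HashSet copies: duplicates collapse and lookups become fast.
        List<Set<String>> sets = new ArrayList<>();
        for (Collection<String> input : inputs) {
            sets.add(new HashSet<>(input));
        }
        if (sets.isEmpty()) {
            return new HashSet<>();
        }
        // Smallest set first, so the running result can never exceed its size.
        sets.sort(Comparator.comparingInt((Set<String> s) -> s.size()));
        Set<String> result = sets.get(0); // already a private copy
        for (int i = 1; i < sets.size() && !result.isEmpty(); i++) {
            // HashSet.retainAll iterates over 'result' and probes the argument,
            // so each pass costs roughly the size of the (small) result.
            result.retainAll(sets.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> lists = Arrays.asList(
                Arrays.asList("a", "b", "c", "d", "e", "f"),
                Arrays.asList("c"),
                Arrays.asList("b", "c", "d"));
        System.out.println(intersect(lists)); // prints [c]
    }
}
```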
Answered by Chris Dennett
The best option would be to use HashSet instead of ArrayList to store the contents of these lists. If you can do that, you can create a temporary HashSet to which you add the elements to be intersected (use the addAll(..) method). Then call tempSet.retainAll(storedSet), and tempSet will contain the intersection.
Answered by corsiKa
Sort them (n lg n) and then do binary searches (lg n).
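One possible reading of this suggestion, sketched under assumptions: sort a copy of every list, then binary-search each distinct element of the first list in all the others. The class name SortedIntersection and the method name intersect are made up for this example, and the code assumes the lists contain no null elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class; the name is an assumption for this sketch.
final class SortedIntersection {

    // Returns the common elements in sorted order. Assumes no null elements.
    static List<String> intersect(List<? extends List<String>> lists) {
        List<String> result = new ArrayList<>();
        if (lists.isEmpty()) {
            return result;
        }
        // Sort private copies (O(n lg n) each) so the inputs stay untouched.
        List<List<String>> sorted = new ArrayList<>();
        for (List<String> list : lists) {
            List<String> copy = new ArrayList<>(list);
            Collections.sort(copy);
            sorted.add(copy);
        }
        String previous = null;
        for (String candidate : sorted.get(0)) {
            if (candidate.equals(previous)) {
                continue; // duplicates are adjacent after sorting; skip them
            }
            previous = candidate;
            boolean inAll = true;
            // One O(lg n) binary search per remaining list.
            for (int i = 1; i < sorted.size() && inAll; i++) {
                inAll = Collections.binarySearch(sorted.get(i), candidate) >= 0;
            }
            if (inAll) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> lists = Arrays.asList(
                Arrays.asList("d", "b", "a", "c", "b"),
                Arrays.asList("e", "d", "c", "b"),
                Arrays.asList("f", "d", "c"));
        System.out.println(intersect(lists)); // prints [c, d]
    }
}
```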
Answered by Rostislav Matl
You can use a single HashSet. Its add() method returns false when the object is already in the set. Adding objects from the lists and counting the false return values gives you the union in the set plus the data for a histogram (the objects whose count + 1 equals the number of lists are your intersection). If you put the counts into a TreeSet, you can detect an empty intersection early.
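A closely related counting approach can be sketched as follows. Note that this is a variation, not the exact scheme described above: it swaps the single HashSet and its false return values for a HashMap from each string to the number of lists containing it, and it deduplicates each list first so that a string repeated within one list is not counted twice (an assumption about the desired behaviour). The TreeSet-based early exit is not shown. The class name CountingIntersection and the method name intersect are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class; the name is an assumption for this sketch.
final class CountingIntersection {

    static Set<String> intersect(List<? extends List<String>> lists) {
        // Histogram: string -> number of lists in which it occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> list : lists) {
            // Deduplicate per list so each list contributes at most 1 per string.
            for (String s : new HashSet<>(list)) {
                counts.merge(s, 1, Integer::sum);
            }
        }
        // A string is in the intersection if every list contained it.
        Set<String> result = new HashSet<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() == lists.size()) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> lists = Arrays.asList(
                Arrays.asList("a", "b", "b", "c"),
                Arrays.asList("b", "c", "d"),
                Arrays.asList("c", "b", "e"));
        System.out.println(intersect(lists)); // contains "b" and "c"
    }
}
```

The keys of the map form the union of all lists, so a single pass over the input yields both the union and the intersection.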

