在 Java 中删除数组中重复项的最佳方法是什么?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/357421/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
What is the best way to remove duplicates in an Array in Java?
提问by Liggy
I have an Array of Objects that need the duplicates removed/filtered. I was going to just override equals & hashCode on the Object elements, and then stick them in a Set... but I figured I should at least poll stackoverflow to see if there was another way, perhaps some clever method of some other API?
我有一个需要删除/过滤重复项的对象数组。我本打算在对象元素上覆盖 equals 和 hashCode,然后把它们放进一个 Set 里……但我想至少应该在 stackoverflow 上问一下,看看有没有别的办法,也许是其他某个 API 里的巧妙方法?
采纳答案by brabster
I would agree with your approach to override hashCode() and equals() and use something that implements Set.
我同意你的做法:覆盖 hashCode() 和 equals(),并使用某个实现了 Set 的类。
Doing so also makes it absolutely clear to any other developers that the non-duplicate characteristic is required.
这样做还可以让任何其他开发人员完全清楚需要非重复特性。
Another reason - you get to choose an implementation that meets your needs best now:
另一个原因 - 您现在可以选择最能满足您需求的实现:
and you don't have to change your code to change the implementation in the future.
并且您不必更改代码来更改将来的实现。
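A minimal sketch of the accepted approach. The Person class and its single field are hypothetical, invented here for illustration; the point is that equals() and hashCode() agree, so a Set can collapse the duplicates:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical element type: equals() and hashCode() are overridden
// consistently so a Set can recognize duplicates by field value.
class Person {
    final String name;

    Person(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        return name.equals(((Person) o).name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name);
    }
}

class DedupWithSet {
    // Copy the array into a Set; elements equal by equals()/hashCode() collapse.
    static Set<Person> dedup(Person[] people) {
        Set<Person> unique = new HashSet<>();
        for (Person p : people) {
            unique.add(p);
        }
        return unique;
    }

    public static void main(String[] args) {
        Person[] input = { new Person("Ann"), new Person("Bob"), new Person("Ann") };
        System.out.println(DedupWithSet.dedup(input).size()); // prints 2
    }
}
```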
回答by Michael Myers
A Set is definitely your best bet. The only way to remove things from an array (without creating a new one) is to null them out, and then you end up with a lot of null-checks later.
Set 绝对是你最好的选择。从数组中删除元素(而不创建新数组)的唯一方法是把它们置为 null,这样你之后就得做大量的空检查。
回答by Dan Vinton
Overriding equals and hashCode and creating a set was my first thought too. It's good practice to have overridden versions of these methods in your inheritance hierarchy anyway.
覆盖 equals 和 hashCode 并创建一个集合也是我的第一个想法。无论如何,在继承层次结构中覆盖这些方法都是一种好的做法。
I think that if you use a LinkedHashSet you'll even preserve the order of unique elements...
我认为如果你使用 LinkedHashSet,你甚至能保留不重复元素的顺序……
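A quick sketch of that property (the element values are just illustrative):

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class LinkedHashSetOrder {
    public static void main(String[] args) {
        // LinkedHashSet drops duplicates but keeps first-insertion order.
        Set<String> unique = new LinkedHashSet<>(Arrays.asList("b", "a", "b", "c", "a"));
        System.out.println(unique); // prints [b, a, c]
    }
}
```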
回答by Markus Lausberg
I found this on the web
我在网上找到了这个
Here are two methods that allow you to remove duplicates from an ArrayList. removeDuplicate does not maintain the order, whereas removeDuplicateWithOrder maintains the order with some performance overhead.
这里有两个可以删除 ArrayList 中重复项的方法。removeDuplicate 不保持顺序,而 removeDuplicateWithOrder 以一些性能开销为代价保持顺序。
The removeDuplicate Method:
removeDuplicate 方法:

```java
/** List order not maintained **/
public static void removeDuplicate(ArrayList arlList) {
    HashSet h = new HashSet(arlList);
    arlList.clear();
    arlList.addAll(h);
}
```

The removeDuplicateWithOrder Method:
removeDuplicateWithOrder 方法:

```java
/** List order maintained **/
public static void removeDuplicateWithOrder(ArrayList arlList) {
    Set set = new HashSet();
    List newList = new ArrayList();
    for (Iterator iter = arlList.iterator(); iter.hasNext();) {
        Object element = iter.next();
        if (set.add(element)) {
            newList.add(element);
        }
    }
    arlList.clear();
    arlList.addAll(newList);
}
```
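On Java 8 and later, the same order-preserving deduplication can also be written with streams — an alternative not mentioned in the original answer (distinct() compares elements with equals()):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamDistinct {
    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 2, 3, 1);
        // distinct() keeps the first occurrence of each element, preserving order.
        List<Integer> unique = list.stream().distinct().collect(Collectors.toList());
        System.out.println(unique); // prints [1, 2, 3]
    }
}
```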
回答by TravisO
Speaking from a general programming standpoint, you could always doubly enumerate the collection and compare the source and target.
从通用编程的角度来看,你总是可以对集合做双重枚举,然后比较源元素和目标元素。
And if your inner enumeration always starts one entry after the source, it's fairly efficient (pseudo code follows):
如果你的内层枚举总是从源元素之后的下一项开始,效率还算不错(伪代码如下):
```
foreach ( array as source )
{
    // keep track of where we are in the array
    place++;
    // loop over the array starting at the entry AFTER the one we are comparing to
    for ( i = place + 1; i < max(array); i++ )
    {
        if ( source === array[i] )
        {
            destroy(array[i]);
        }
    }
}
```
You could arguably add a break; statement after the destroy, but then you only remove the first duplicate; if that's all you will ever have, it would be a nice small optimization.
可以说,你可以在 destroy 之后加一条 break; 语句,但那样你只会清除第一个重复项;不过如果重复项最多只会有一个,这会是一个不错的小优化。
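One way the pseudo code might look in Java, nulling out later duplicates in place (a sketch under the assumption that null holes in the array are acceptable, as the earlier answer noted):

```java
import java.util.Arrays;

class NestedLoopDedup {
    // O(n^2) in-place dedup: every later duplicate of an element is nulled out.
    static void dedupInPlace(Object[] array) {
        for (int place = 0; place < array.length; place++) {
            if (array[place] == null) continue;
            // start at the entry AFTER the one we are comparing to
            for (int i = place + 1; i < array.length; i++) {
                if (array[place].equals(array[i])) {
                    array[i] = null; // "destroy" the duplicate
                }
            }
        }
    }

    public static void main(String[] args) {
        Object[] a = { "x", "y", "x", "z" };
        dedupInPlace(a);
        System.out.println(Arrays.toString(a)); // prints [x, y, null, z]
    }
}
```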
回答by Joachim Sauer
I'd like to reiterate the point made by Jason in the comments:
我想重申杰森在评论中提出的观点:
Why place yourself at that point at all?
为什么要把自己放在那个点上?
Why use an array for a data structure that shouldn't hold duplicates at all?
为什么将数组用于根本不应包含重复项的数据结构?
Use a Set or a SortedSet (when the elements have a natural order as well) at all times to hold the elements. If you need to keep the insertion order, then you can use the LinkedHashSet, as has been pointed out.
始终使用 Set 或 SortedSet(当元素也具有自然顺序时)来保存元素。如果你需要保留插入顺序,那么正如前面指出的,可以使用 LinkedHashSet。
Having to post-process some data structure is often a hint that you should have chosen a different one to begin with.
必须对某个数据结构进行后处理,通常暗示你一开始就应该选择另一种数据结构。
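For elements with a natural order, a SortedSet rejects duplicates at insertion time; a brief sketch:

```java
import java.util.SortedSet;
import java.util.TreeSet;

class SortedSetExample {
    public static void main(String[] args) {
        // TreeSet ignores duplicate add() calls and keeps elements sorted.
        SortedSet<Integer> set = new TreeSet<>();
        set.add(3);
        set.add(1);
        set.add(3); // duplicate: add() returns false, set is unchanged
        System.out.println(set); // prints [1, 3]
    }
}
```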
回答by joel.neely
Of course the original post begs the question, "How did you get that array (that might contain duplicated entries) in the first place?"
当然,原始帖子提出了一个问题,“您是如何获得该数组(可能包含重复条目)的?”
Do you need the array (with duplicates) for other purposes, or could you simply use a Set from the beginning?
您是否需要将数组(带有重复项)用于其他目的,或者您可以从一开始就简单地使用 Set 吗?
Alternately, if you need to know the number of occurrences of each value, you could use a Map<CustomObject, Integer> to track counts. Also, the Google Collections definition of the Multimap classes may be of use.
或者,如果你需要知道每个值的出现次数,可以使用 Map<CustomObject, Integer> 来跟踪计数。此外,Google Collections 中 Multimap 类的定义可能也有用。
回答by Ryan Delucchi
Basically, you want a LinkedHashSet<T> implementation that supports the List<T> interface for random access. Hence, this is what you need:
基本上,你需要一个支持 List<T> 随机访问接口的 LinkedHashSet<T> 实现。因此,你需要的是:
```java
public class LinkedHashSetList<T> extends LinkedHashSet<T> implements List<T> {
    // Implementations for List<T> methods here
    ...
}
```
The implementation of the List<T> methods would access and manipulate the underlying LinkedHashSet<T>. The trick is to have this class behave correctly when one attempts to add duplicates via the List<T> add methods (throwing an exception or re-adding the item at a different index would be options, which you can either choose one of or make configurable by users of the class).
List<T> 方法的实现会访问和操作底层的 LinkedHashSet<T>。诀窍在于,当有人试图通过 List<T> 的 add 方法添加重复项时,让这个类表现正确(抛出异常或把该项重新添加到不同的索引处都是可选方案:你可以选定其中一种,也可以让类的使用者自行配置)。
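Rather than extending LinkedHashSet, the same idea can be sketched with composition. The class name, the minimal method set, and the skip-duplicates-silently policy here are my assumptions, not from the answer:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A list-like wrapper that keeps insertion order and silently skips duplicates.
class UniqueList<T> {
    private final List<T> order = new ArrayList<>();
    private final Set<T> seen = new HashSet<>();

    // Returns true if the element was added, false if it was a duplicate.
    public boolean add(T element) {
        if (!seen.add(element)) {
            return false;
        }
        return order.add(element);
    }

    public T get(int index) {
        return order.get(index);
    }

    public int size() {
        return order.size();
    }

    public static void main(String[] args) {
        UniqueList<String> list = new UniqueList<>();
        list.add("a");
        list.add("b");
        list.add("a"); // duplicate, ignored
        System.out.println(list.size()); // prints 2
    }
}
```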
回答by didxga
Use a List distinctList to record each element the first time the iterator stumbles on it, and return distinctList as the list with all duplicates removed:
使用一个 List distinctList 记录 iterator 第一次遇到的元素,把 distinctList 作为去掉了所有重复项的列表返回:
```java
private List removeDups(List list) {
    Set tempSet = new HashSet();
    List distinctList = new ArrayList();
    for (Iterator it = list.iterator(); it.hasNext();) {
        Object next = it.next();
        if (tempSet.add(next)) {
            distinctList.add(next);
        }
    }
    return distinctList;
}
```
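A usage sketch of the method above, wrapped in a class and with generics added so it compiles standalone (the input values are illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RemoveDupsDemo {
    // Generic version of the answer's removeDups: order-preserving dedup.
    static <T> List<T> removeDups(List<T> list) {
        Set<T> tempSet = new HashSet<>();
        List<T> distinctList = new ArrayList<>();
        for (T next : list) {
            // Set.add() returns false for elements already seen
            if (tempSet.add(next)) {
                distinctList.add(next);
            }
        }
        return distinctList;
    }

    public static void main(String[] args) {
        System.out.println(removeDups(Arrays.asList(1, 2, 2, 3, 1))); // prints [1, 2, 3]
    }
}
```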