list Scala:删除对象列表中的重复项
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/3912753/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Scala: Remove duplicates in list of objects
提问by parsa
I've got a list of objects List[Object]
which are all instantiated from the same class. This class has a field which must be unique Object.property
. What is the cleanest way to iterate the list of objects and remove all objects(but the first) with the same property?
我有一个对象列表,这些对象List[Object]
都是从同一个类实例化的。这个类有一个必须是唯一的字段Object.property
。迭代对象列表并删除具有相同属性的所有对象(但第一个)的最干净方法是什么?
回答by IttayD
list.groupBy(_.property).map(_._2.head)
Explanation: The groupBy method accepts a function that converts an element to a key for grouping. _.property
is just shorthand for elem: Object => elem.property
(the compiler generates a unique name, something like x$1
). So now we have a map Map[Property, List[Object]]
. A Map[K,V]
extends Traversable[(K,V)]
. So it can be traversed like a list, but elements are a tuple. This is similar to Java's Map#entrySet()
. The map method creates a new collection by iterating each element and applying a function to it. In this case the function is _._2.head
which is shorthand for elem: (Property, List[Object]) => elem._2.head
. _2
is just a method of Tuple that returns the second element. The second element is List[Object] and head
returns the first element
说明: groupBy 方法接受一个将元素转换为用于分组的键的函数。_.property
只是elem: Object => elem.property
(编译器生成一个唯一名称,例如x$1
)的简写。所以现在我们有一张地图Map[Property, List[Object]]
。AMap[K,V]
延伸Traversable[(K,V)]
。所以它可以像列表一样遍历,但元素是一个元组。这类似于 Java 的Map#entrySet()
. map 方法通过迭代每个元素并对其应用函数来创建一个新集合。在这种情况下,函数是_._2.head
which 的简写elem: (Property, List[Object]) => elem._2.head
。_2
只是一个返回第二个元素的元组方法。第二个元素是 List[Object] 并head
返回第一个元素
To get the result to be a type you want:
要使结果成为您想要的类型:
import collection.breakOut
val l2: List[Object] = list.groupBy(_.property).map(_._2.head)(breakOut)
To explain briefly, map
actually expects two arguments, a function and an object that is used to construct the result. In the first code snippet you don't see the second value because it is marked as implicit and so provided by the compiler from a list of predefined values in scope. The result is usually obtained from the mapped container. This is usually a good thing. map on List will return List, map on Array will return Array etc. In this case however, we want to express the container we want as result. This is where the breakOut method is used. It constructs a builder (the thing that builds results) by only looking at the desired result type. It is a generic method and the compiler infers its generic types because we explicitly typed l2 to be List[Object]
or, to preserve order (assuming Object#property
is of type Property
):
简单解释一下,map
实际上需要两个参数,一个函数和一个用于构造结果的对象。在第一个代码片段中,您看不到第二个值,因为它被标记为隐式,因此由编译器从范围内的预定义值列表中提供。结果通常从映射的容器中获得。这通常是一件好事。列表上的映射将返回列表,数组上的映射将返回数组等。然而,在这种情况下,我们想表达我们想要的容器作为结果。这是使用 breakOut 方法的地方。它仅通过查看所需的结果类型来构建一个构建器(构建结果的东西)。它是一个泛型方法,编译器推断它的泛型类型,因为我们显式地将 l2 键入为List[Object]
或,以保持顺序(假设Object#property
是类型Property
):
list.foldRight((List[Object](), Set[Property]())) {
case (o, cum@(objects, props)) =>
if (props(o.property)) cum else (o :: objects, props + o.property))
}._1
foldRight
is a method that accepts an initial result and a function that accepts an element and returns an updated result. The method iterates each element, updating the result according to applying the function to each element and returning the final result. We go from right to left (rather than left to right with foldLeft
) because we are prepending to objects
- this is O(1), but appending is O(N). Also observe the good styling here, we are using a pattern match to extract the elements.
foldRight
是接受初始结果的方法和接受元素并返回更新结果的函数。该方法迭代每个元素,根据将函数应用于每个元素并返回最终结果来更新结果。我们从右到左(而不是从左到右foldLeft
),因为我们正在准备objects
- 这是 O(1),但附加是 O(N)。还要注意这里的良好样式,我们使用模式匹配来提取元素。
In this case, the initial result is a pair (tuple) of an empty list and a set. The list is the result we're interested in and the set is used to keep track of what properties we already encountered. In each iteration we check if the set props
already contains the property (in Scala, obj(x)
is translated to obj.apply(x)
. In Set
, the method apply
is def apply(a: A): Boolean
. That is, accepts an element and returns true / false if it exists or not). If the property exists (already encountered), the result is returned as-is. Otherwise the result is updated to contain the object (o :: objects
) and the property is recorded (props + o.property
)
在这种情况下,初始结果是空列表和集合的一对(元组)。列表是我们感兴趣的结果,集合用于跟踪我们已经遇到的属性。在每次迭代中,我们检查集合是否props
已经包含属性(在 Scala 中,obj(x)
被转换为obj.apply(x)
。在 中Set
,方法apply
是def apply(a: A): Boolean
。也就是说,接受一个元素并返回真/假,如果它存在与否)。如果该属性存在(已遇到),则按原样返回结果。否则,结果将更新为包含对象 ( o :: objects
) 并记录属性 ( props + o.property
)
Update: @andreypopp wanted a generic method:
更新:@andreypopp 想要一个通用方法:
import scala.collection.IterableLike
import scala.collection.generic.CanBuildFrom
class RichCollection[A, Repr](xs: IterableLike[A, Repr]){
def distinctBy[B, That](f: A => B)(implicit cbf: CanBuildFrom[Repr, A, That]) = {
val builder = cbf(xs.repr)
val i = xs.iterator
var set = Set[B]()
while (i.hasNext) {
val o = i.next
val b = f(o)
if (!set(b)) {
set += b
builder += o
}
}
builder.result
}
}
implicit def toRich[A, Repr](xs: IterableLike[A, Repr]) = new RichCollection(xs)
to use:
使用:
scala> list.distinctBy(_.property)
res7: List[Obj] = List(Obj(1), Obj(2), Obj(3))
Also note that this is pretty efficient as we are using a builder. If you have really large lists, you may want to use a mutable HashSet instead of a regular set and benchmark the performance.
另请注意,这非常有效,因为我们正在使用构建器。如果您有非常大的列表,您可能希望使用可变 HashSet 而不是常规集合并测试性能。
回答by Xavier Guihot
Starting Scala 2.13
, most collections are now provided with a distinctBy
method which returns all elements of the sequence ignoring the duplicates after applying a given transforming function:
开始Scala 2.13
,大多数集合现在都提供了一个distinctBy
方法,该方法在应用给定的转换函数后返回序列的所有元素,忽略重复项:
list.distinctBy(_.property)
For instance:
例如:
List(("a", 2), ("b", 2), ("a", 5)).distinctBy(_._1) // List((a,2), (b,2))
List(("a", 2.7), ("b", 2.1), ("a", 5.4)).distinctBy(_._2.floor) // List((a,2.7), (a,5.4))
回答by Landei
Here is a little bit sneaky but fast solution that preserves order:
这是一个有点偷偷摸摸但保持秩序的快速解决方案:
list.filterNot{ var set = Set[Property]()
obj => val b = set(obj.property); set += obj.property; b}
Although it uses internally a var, I think it is easier to understand and to read than the foldLeft-solution.
虽然它在内部使用了一个 var,但我认为它比 foldLeft 解决方案更容易理解和阅读。
回答by walla
One more solution
另一种解决方案
@tailrec
def collectUnique(l: List[Object], s: Set[Property], u: List[Object]): List[Object] = l match {
case Nil => u.reverse
case (h :: t) =>
if (s(h.property)) collectUnique(t, s, u) else collectUnique(t, s + h.prop, h :: u)
}
回答by Timothy Klim
With preserve order:
保留订单:
def distinctBy[L, E](list: List[L])(f: L => E): List[L] =
list.foldLeft((Vector.empty[L], Set.empty[E])) {
case ((acc, set), item) =>
val key = f(item)
if (set.contains(key)) (acc, set)
else (acc :+ item, set + key)
}._1.toList
distinctBy(list)(_.property)
回答by Abel Terefe
A lot of good answers above. However, distinctBy
is already in Scala, but in a not-so-obvious place. Perhaps you can use it like
上面有很多很好的答案。然而,distinctBy
已经在 Scala 中,只是在一个不那么明显的地方。也许你可以像这样使用它
def distinctBy[A, B](xs: List[A])(f: A => B): List[A] =
scala.reflect.internal.util.Collections.distinctBy(xs)(f)
回答by Jodiug
I found a way to make it work with groupBy, with one intermediary step:
我找到了一种方法,让它与 groupBy 一起工作,只需一个中间步骤:
def distinctBy[T, P, From[X] <: TraversableLike[X, From[X]]](collection: From[T])(property: T => P): From[T] = {
val uniqueValues: Set[T] = collection.groupBy(property).map(_._2.head)(breakOut)
collection.filter(uniqueValues)
}
Use it like this:
像这样使用它:
scala> distinctBy(List(redVolvo, bluePrius, redLeon))(_.color)
res0: List[Car] = List(redVolvo, bluePrius)
Similar to IttayD's first solution, but it filters the original collection based on the set of unique values. If my expectations are correct, this does three traversals: one for groupBy
, one for map
and one for filter
. It maintains the ordering of the original collection, but does not necessarily take the first value for each property. For example, it could have returned List(bluePrius, redLeon)
instead.
类似于 IttayD 的第一个解决方案,但它根据唯一值集过滤原始集合。如果我的期望是正确的,这会进行三个遍历:一个 for groupBy
,一个 formap
和一个 for filter
。它保持原始集合的顺序,但不一定为每个属性取第一个值。例如,它本可以返回List(bluePrius, redLeon)
。
Of course, IttayD's solution is still faster since it does only one traversal.
当然,IttayD 的解决方案仍然更快,因为它只进行一次遍历。
My solution also has the disadvantage that, if the collection has Car
s that are actually the same, both will be in the output list. This could be fixed by removing filter
and returning uniqueValues
directly, with type From[T]
. However, it seems like CanBuildFrom[Map[P, From[T]], T, From[T]]
does not exist... suggestions are welcome!
我的解决方案也有一个缺点,如果集合Car
中有实际上相同的 s,两者都将在输出列表中。这可以通过使用 type 直接删除filter
和返回来解决。但是,它似乎不存在......欢迎提出建议!uniqueValues
From[T]
CanBuildFrom[Map[P, From[T]], T, From[T]]
回答by F. P. Freely
With a collection and a function from a record to a key this yields a list of records distinct by key. It's not clear whether groupBy will preserve the order in the original collection. It may even depend on the type of collection. I'm guessing either head
or last
will consistently yield the earliest element.
使用从记录到键的集合和函数,这会生成按键不同的记录列表。不清楚 groupBy 是否会保留原始集合中的顺序。它甚至可能取决于集合的类型。我猜测要么head
或last
将始终如一地产生最早的元素。
collection.groupBy(keyFunction).values.map(_.head)
When will Scala get a nubBy
? It's been in Haskell for decades.
ScalanubBy
什么时候能拿到?它已经在 Haskell 中使用了几十年。
回答by swdev
If you want to remove duplicates and preserve the order of the listyou can try this two liner:
如果您想删除重复项并保留列表的顺序,您可以尝试以下两个班轮:
val tmpUniqueList = scala.collection.mutable.Set[String]()
val myUniqueObjects = for(o <- myObjects if tmpUniqueList.add(o.property)) yield o