
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/24729544/

Date: 2020-09-11 02:09:59 | Source: igfitidea

How to find duplicates in a list?

Tags: list, scala, scala-collections

Asked by Phil

I have a list of unsorted integers and I want to find those elements which have duplicates.


val dup = List(1,1,1,2,3,4,5,5,6,100,101,101,102)

I can find the distinct elements of the set with dup.distinct, so I wrote my answer as follows.


val dup = List(1,1,1,2,3,4,5,5,6,100,101,101,102)
val distinct = dup.distinct
val elementsWithCounts = distinct.map( (a:Int) => (a, dup.count( (b:Int) => a == b )) )
val duplicatesRemoved = elementsWithCounts.filter( (pair: Pair[Int,Int]) => { pair._2 <= 1 } )
val withDuplicates = elementsWithCounts.filter( (pair: Pair[Int,Int]) => { pair._2 > 1 } )

Is there an easier way to solve this?


Answer by dhg

Try this:


val dup = List(1,1,1,2,3,4,5,5,6,100,101,101,102)
dup.groupBy(identity).collect { case (x, List(_,_,_*)) => x }

The groupBy associates each distinct integer with a list of its occurrences. The collect is basically a map where non-matching elements are ignored. The match pattern following case will match integers x that are associated with a list fitting the pattern List(_,_,_*): a list with at least two elements, each represented by an underscore since we don't actually need to store those values (and those two elements can be followed by zero or more elements: _*).

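To make the intermediate steps concrete, here is a small sketch of what groupBy and collect each produce (the values in the comments follow from the definitions above):

```scala
val dup = List(1, 1, 1, 2, 3, 4, 5, 5, 6, 100, 101, 101, 102)

// groupBy(identity) maps each distinct value to the list of its occurrences,
// e.g. 101 -> List(101, 101) and 2 -> List(2)
val grouped: Map[Int, List[Int]] = dup.groupBy(identity)

// collect keeps only the keys whose occurrence list matches List(_, _, _*),
// i.e. has at least two elements
val dupes: Set[Int] = grouped.collect { case (x, List(_, _, _*)) => x }.toSet
// dupes == Set(1, 5, 101)
```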

You could also do:


dup.groupBy(identity).collect { case (x,ys) if ys.lengthCompare(1) > 0 => x }

It's much faster than the approach you provided since it doesn't have to repeatedly pass over the data.

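The lengthCompare call is itself a small optimization worth noting: on a List, ys.length walks the whole list, whereas ys.lengthCompare(1) stops traversing as soon as it has seen enough elements to decide the comparison. A minimal sketch:

```scala
// a long list of copies of a single value
val ys = List.fill(100000)(7)

// ys.length > 1 would traverse all 100000 elements;
// ys.lengthCompare(1) inspects at most two before answering
val hasAtLeastTwo = ys.lengthCompare(1) > 0
// hasAtLeastTwo == true
```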

Answer by Luigi Plinge

A bit late to the party, but here's another approach:


dup.diff(dup.distinct).distinct

diff gives you all the extra items beyond those in the argument (dup.distinct); those extras are the duplicates.

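Traced on the example list (the intermediate values in the comments follow from diff's multiset semantics — it removes one occurrence per element of its argument):

```scala
val dup = List(1, 1, 1, 2, 3, 4, 5, 5, 6, 100, 101, 101, 102)

// dup.distinct == List(1, 2, 3, 4, 5, 6, 100, 101, 102);
// diff removes ONE occurrence of each of those, leaving only surplus copies
val extras = dup.diff(dup.distinct) // List(1, 1, 5, 101)

// the final distinct collapses the two surplus 1s into a single entry
val dupes = extras.distinct // List(1, 5, 101)
```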

Answer by Kigyo

Another approach is to use foldLeft and do it the hard way.


We start with two empty sets. One is for elements that we have seen at least once. The other is for elements that we have seen at least twice (the duplicates).


We traverse the list. When the current element has already been seen (seen(cur)), it is a duplicate and is therefore added to duplicates. Otherwise we add it to seen. The result is then the second set, which contains the duplicates.


We can also write this as a generic method.


def dups[T](list: List[T]) = list.foldLeft((Set.empty[T], Set.empty[T])){ case ((seen, duplicates), cur) => 
  if(seen(cur)) (seen, duplicates + cur) else (seen + cur, duplicates)      
}._2

val dup = List(1,1,1,2,3,4,5,5,6,100,101,101,102)

dups(dup) //Set(1,5,101)

Answer by chaotic3quilibrium

Summary: I've written a very efficient function which returns both List.distinct and a List consisting of each element which appeared more than once and the index at which the duplicate appeared.


Details: If you need a bit more information about the duplicates themselves, like I did, I have written a more general function which iterates across a List (as ordering was significant) exactly once and returns a Tuple2 consisting of the original List deduped (all duplicates after the first are removed; i.e. the same as invoking distinct) and a second List showing each duplicate and the Int index at which it occurred within the original List.


I have implemented the function twice based on the general performance characteristics of the Scala collections: filterDupesL (where the L is for Linear) and filterDupesEc (where the Ec is for Effectively Constant).


Here's the "Linear" function:


def filterDupesL[A](items: List[A]): (List[A], List[(A, Int)]) = {
  def recursive(
      remaining: List[A]
    , index: Int =
        0
    , accumulator: (List[A], List[(A, Int)]) =
        (Nil, Nil)): (List[A], List[(A, Int)]
  ) =
    if (remaining.isEmpty)
      accumulator
    else
      recursive(
          remaining.tail
        , index + 1
        , if (accumulator._1.contains(remaining.head)) //contains is linear
          (accumulator._1, (remaining.head, index) :: accumulator._2)
        else
          (remaining.head :: accumulator._1, accumulator._2)
      )
  val (distinct, dupes) = recursive(items)
  (distinct.reverse, dupes.reverse)
}

Below is an example which might make it a bit more intuitive. Given this List of String values:


val withDupes =
  List("a.b", "a.c", "b.a", "b.b", "a.c", "c.a", "a.c", "d.b", "a.b")

...and then performing the following:


val (deduped, dupeAndIndexs) =
  filterDupesL(withDupes)

...the results are:


deduped: List[String] = List(a.b, a.c, b.a, b.b, c.a, d.b)
dupeAndIndexs: List[(String, Int)] = List((a.c,4), (a.c,6), (a.b,8))

And if you just want the duplicates, you simply map across dupeAndIndexs and invoke distinct:


val dupesOnly =
  dupeAndIndexs.map(_._1).distinct

...or all in a single call:


val dupesOnly =
  filterDupesL(withDupes)._2.map(_._1).distinct

...or if a Set is preferred, skip distinct and invoke toSet...


val dupesOnly2 =
  dupeAndIndexs.map(_._1).toSet

...or all in a single call:


val dupesOnly2 =
  filterDupesL(withDupes)._2.map(_._1).toSet


For very large Lists, consider using this more efficient version, which uses an additional Set to make the contains check effectively constant time.


Here's the "Effectively Constant" function:


def filterDupesEc[A](items: List[A]): (List[A], List[(A, Int)]) = {
  def recursive(
      remaining: List[A]
    , index: Int =
        0
    , seenAs: Set[A] =
        Set()
    , accumulator: (List[A], List[(A, Int)]) =
        (Nil, Nil)): (List[A], List[(A, Int)]
  ) =
    if (remaining.isEmpty)
      accumulator
    else {
      val (isInSeenAs, seenAsNext) = {
        val isInSeenA =
          seenAs.contains(remaining.head) //contains is effectively constant
        (
            isInSeenA
          , if (!isInSeenA)
              seenAs + remaining.head
            else
              seenAs
        )
      }
      recursive(
          remaining.tail
        , index + 1
        , seenAsNext
        , if (isInSeenAs)
          (accumulator._1, (remaining.head, index) :: accumulator._2)
        else
          (remaining.head :: accumulator._1, accumulator._2)
      )
    }
  val (distinct, dupes) = recursive(items)
  (distinct.reverse, dupes.reverse)
}

Both of the above functions are adaptations of the filterDupes function in my open source Scala library, ScalaOlio. It's located at org.scalaolio.collection.immutable.List_._.


Answer by user8221989

My favorite is


def hasDuplicates(in: List[Int]): Boolean = {
  val sorted = in.sortWith((i, j) => i < j)
  val zipped = sorted.tail.zip(sorted)
  zipped.exists(p => p._1 == p._2)
}
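A quick sanity check of the function above (its definition is repeated so the snippet is self-contained), plus a shorter distinct-based variant of the same boolean test — the variant is my addition, not part of the original answer:

```scala
def hasDuplicates(in: List[Int]): Boolean = {
  val sorted = in.sortWith((i, j) => i < j)
  val zipped = sorted.tail.zip(sorted) // pairs each element with its predecessor
  zipped.exists(p => p._1 == p._2)     // any equal neighbors means a duplicate
}

val dup = List(1, 1, 1, 2, 3, 4, 5, 5, 6, 100, 101, 101, 102)
hasDuplicates(dup)           // true
hasDuplicates(List(1, 2, 3)) // false

// an equivalent one-liner: a list has duplicates iff deduping shrinks it
def hasDuplicates2(in: List[Int]): Boolean = in.distinct.size != in.size
```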