
Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/13868465/

Time: 2020-10-22 04:48:16  Source: igfitidea

Idiomatic way to reduce a list of pairs to a map of keys and their aggregated count?

scala

Asked by Eran Medan

I'm trying to recreate Hadoop's word count map/reduce logic in a simple Scala program, as a learning exercise.

This is what I have so far:

val words1 = "Hello World Bye World"      
val words2 = "Hello Hadoop Goodbye Hadoop"

val input = List(words1,words2)           
val mapped = input.flatMap(line=>line.split(" ").map(word=>word->1))
    //> mapped  : List[(String, Int)] = List((Hello,1), (World,1), (Bye,1), 
    //                                       (World,1), (Hello,1), (Hadoop,1), 
    //                                       (Goodbye,1), (Hadoop,1))

mapped.foldLeft(Map[String,Int]())((sofar,item)=>{
    if(sofar.contains(item._1)){
        sofar.updated(item._1, item._2 + sofar(item._1))
    }else{
        sofar + item
    }
})                              
    //>Map(Goodbye -> 1, Hello -> 2, Bye -> 1, Hadoop -> 2, World -> 2)

This seems to work, but I'm sure there is a more idiomatic way to handle the reduce part (foldLeft).

I was thinking about perhaps using a multimap, but I have a feeling Scala has a way to do this easily.

Is there? For example, a way to add to a map such that, if the key exists, the value is added to the existing value instead of replacing it. I'm sure I've seen this question somewhere, but I couldn't find it or its answer.

I know groupBy is probably the way to do it in the real world, but I'm trying to implement it as close as possible to the original map/reduce logic in the link above.
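The "add to the existing value if the key exists" operation asked about above can be sketched with plain standard-library calls. This is only a minimal illustration: the object name UpsertSketch and the helper addCount are hypothetical names introduced here, not part of the original post.

```scala
object UpsertSketch {
  // Hypothetical helper (not from the original post): add `n` to the count
  // stored under `key`, treating a missing key as 0.
  def addCount(counts: Map[String, Int], key: String, n: Int): Map[String, Int] =
    counts.updated(key, counts.getOrElse(key, 0) + n)

  // Map step, same shape as in the question: one (word, 1) pair per word.
  val mapped: List[(String, Int)] =
    List("Hello World Bye World", "Hello Hadoop Goodbye Hadoop")
      .flatMap(_.split(" ").map(word => word -> 1))

  // Reduce step: fold the pairs into a map, summing counts per key.
  val counts: Map[String, Int] =
    mapped.foldLeft(Map.empty[String, Int]) { case (sofar, (word, n)) =>
      addCount(sofar, word, n)
    }
}
```

Using getOrElse removes the explicit contains check from the original foldLeft while keeping the same fold structure.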

Accepted answer by Dominic Bou-Samra

You can use Scalaz's |+| operator because Maps are part of the Semigroup typeclass:

The |+| operator is the Monoid mappend function (a Monoid is any "thing" that can be "added" together). Many things can be added together like this: Strings, Ints, Maps, Lists, Options, etc. An example:

scala> import scalaz._
import scalaz._

scala> import Scalaz._
import Scalaz._

scala> val map1 = Map(1 -> 3 , 2 -> 4)
map1: scala.collection.immutable.Map[Int,Int] = Map(1 -> 3, 2 -> 4)

scala> val map2 = Map(1 -> 1, 3 -> 6)
map2: scala.collection.immutable.Map[Int,Int] = Map(1 -> 1, 3 -> 6)

scala> map1 |+| map2
res2: scala.collection.immutable.Map[Int,Int] = Map(1 -> 4, 3 -> 6, 2 -> 4)

So in your case, rather than creating a List[(String,Int)], create a List[Map[String,Int]], and then sum them:

val mapped = input.flatMap(_.split(" ").map(word => Map(word -> 1)))
mapped.suml
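For readers without Scalaz on the classpath, the behaviour of summing a list of maps can be approximated with the standard library alone. The sketch below is not Scalaz's implementation; MapMergeSketch and merge are hypothetical names, and the merge is specialised to Int values for simplicity.

```scala
object MapMergeSketch {
  // Hypothetical stand-in for |+| on Map[K, Int]:
  // keys present in both maps have their values summed.
  def merge[K](a: Map[K, Int], b: Map[K, Int]): Map[K, Int] =
    b.foldLeft(a) { case (acc, (k, v)) =>
      acc.updated(k, acc.getOrElse(k, 0) + v)
    }

  val input = List("Hello World Bye World", "Hello Hadoop Goodbye Hadoop")

  // One single-entry map per word, as in the answer above.
  val mapped: List[Map[String, Int]] =
    input.flatMap(_.split(" ").map(word => Map(word -> 1)))

  // Folding with merge from the empty map plays the role of summing the list.
  val counts: Map[String, Int] =
    mapped.foldLeft(Map.empty[String, Int])((a, b) => merge(a, b))
}
```

The empty map acts as the identity element and merge as the associative "addition", which is the structure the Monoid description refers to.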

Answer by Jan

You could use a map that returns 0 as default value. Map offers withDefaultValue:


def withDefaultValue[B1 >: B](d: B1): Map[A, B1]

The same map with a given default value:


val emptyMap = Map[String,Int]().withDefaultValue(0)
mapped.foldLeft(emptyMap)((sofar,item) => {
    sofar.updated(item._1, item._2 + sofar(item._1))
})  
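As a self-contained sketch of the same idea, the fold can be written with pattern matching on the pair instead of item._1 and item._2. The object name DefaultValueSketch is hypothetical; the input strings are the ones from the question.

```scala
object DefaultValueSketch {
  val input = List("Hello World Bye World", "Hello Hadoop Goodbye Hadoop")
  val mapped = input.flatMap(_.split(" ").map(word => word -> 1))

  // Looking up a missing key returns 0 instead of throwing.
  val emptyMap = Map.empty[String, Int].withDefaultValue(0)

  // sofar(word) is 0 for unseen words, so no contains check is needed.
  val counts = mapped.foldLeft(emptyMap) { case (sofar, (word, n)) =>
    sofar.updated(word, sofar(word) + n)
  }
}
```

Note that the default only affects apply: counts("Missing") yields 0, while counts.get("Missing") is still None and counts.contains("Missing") is false.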

Answer by Eran Witkon

Correct me if I am wrong but how about this:


val w = words.groupBy(_.toString).map(x => (x._1,x._2.size)).toList

assuming words is the collection of words:

val words1 = "Hello World Bye World"
val words2 = "Hello Hadoop Goodbye Hadoop"
val words = words1.split(" ") ++ words2.split(" ")
val w = words.groupBy(_.toString).map(x => (x._1,x._2.size)).toList
//List((Goodbye,1), (Hello,2), (Bye,1), (Hadoop,2), (World,2))

Answer by hahn

Another version:

 val words1 = "Hello World Bye World"             
//> words1  : java.lang.String = Hello World Bye World
 val words2 = "Hello Hadoop Goodbye Hadoop"       
//> words2  : java.lang.String = Hello Hadoop Goodbye Hadoop

 val words = words1.split(" ") ++ words2.split(" ")
//> words  : Array[java.lang.String] = Array(Hello, World, Bye, World, Hello, Hadoop, Goodbye, Hadoop)

 words.map(m => (m, (0 /: words)
   ((x, y) => if (y == m) x + 1 else x))).
     toList.distinct.toMap
 //> res0: scala.collection.immutable.Map[java.lang.String,Int] = Map(Goodbye -> 1, Hello -> 2, Bye -> 1, Hadoop -> 2, World -> 2)

Answer by Xavier Guihot

Starting with Scala 2.13, most collections are provided with the groupMapReduce method, which can be seen as a close analog of Hadoop's map/reduce logic:

val words = List(words1, words2).flatMap(_.split(" "))

words.groupMapReduce(identity)(_ => 1)(_ + _)
// immutable.Map[String,Int] = HashMap(Goodbye -> 1, Hello -> 2, Bye -> 1, Hadoop -> 2, World -> 2)

This:


  • splits and merges words from the 2 input lists

  • groups elements by themselves (identity) (group part of groupMapReduce)

  • maps each grouped value occurrence to 1 (map part of groupMapReduce)

  • reduces values within a group of values (_ + _) by summing them (reduce part of groupMapReduce).


This is a one-pass version of what could otherwise be written as:

words.groupBy(identity).mapValues(_.map(_ => 1).reduce(_ + _))
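The two formulations can be compared side by side in a small runnable sketch, assuming Scala 2.13 or later. The object name GroupMapReduceSketch is hypothetical. Because the strict mapValues on Map is deprecated in 2.13, the two-step variant here goes through .view.mapValues(...).toMap and uses _.size in place of mapping to 1 and summing.

```scala
object GroupMapReduceSketch {
  val words1 = "Hello World Bye World"
  val words2 = "Hello Hadoop Goodbye Hadoop"
  val words: List[String] = List(words1, words2).flatMap(_.split(" "))

  // One pass: group by the word itself, map each occurrence to 1, sum per group.
  val onePass: Map[String, Int] =
    words.groupMapReduce(identity)(_ => 1)(_ + _)

  // Two steps: build the groups first, then count each group's size.
  val twoStep: Map[String, Int] =
    words.groupBy(identity).view.mapValues(_.size).toMap
}
```

Both values hold the same word counts; the difference is that the two-step version materialises the intermediate lists of grouped words before counting them.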