Scala: convert a list of tuples to a map (and deal with duplicate keys?)
Disclaimer: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/8016750/
Convert a list of tuples to a map (and deal with duplicate keys?)
Asked by Tg.
I was thinking about a nice way to convert a List of tuples with duplicate keys [("a","b"),("c","d"),("a","f")] into a map ("a" -> ["b", "f"], "c" -> ["d"]). Normally (in Python), I'd create an empty map, for-loop over the list, and check for duplicate keys. But I am looking for a more Scala-ish, clever solution here.
By the way, the actual key-value types I use here are (Int, Node), and I want to turn the list into a map of (Int -> NodeSeq).
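For reference, a direct Scala transliteration of the imperative approach described above (a sketch, not the asker's code) might look like this; the answers below show more idiomatic alternatives:

import scala.collection.mutable

val pairs = List(("a", "b"), ("c", "d"), ("a", "f"))
val acc = mutable.Map.empty[String, List[String]]
// Append each value to the list already stored for its key (empty if new)
for ((k, v) <- pairs)
  acc(k) = acc.getOrElse(k, Nil) :+ v
// acc: Map(a -> List(b, f), c -> List(d))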
Accepted answer by om-nom-nom
Group and then project:
scala> val x = List("a" -> "b", "c" -> "d", "a" -> "f")
//x: List[(java.lang.String, java.lang.String)] = List((a,b), (c,d), (a,f))
scala> x.groupBy(_._1).map { case (k,v) => (k,v.map(_._2))}
//res1: scala.collection.immutable.Map[java.lang.String,List[java.lang.String]] = Map(c -> List(d), a -> List(b, f))
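The same pattern applied to the question's actual types, as a minimal sketch (assuming scala.xml's Node and NodeSeq, with made-up input data):

import scala.xml.{Node, NodeSeq}

val nodes: List[(Int, Node)] = List(1 -> <a/>, 2 -> <b/>, 1 -> <c/>)
// Group by the Int key, then project each group's Nodes into a NodeSeq
val byId: Map[Int, NodeSeq] =
  nodes.groupBy(_._1).map { case (k, v) => k -> NodeSeq.fromSeq(v.map(_._2)) }
// byId: Map(1 -> NodeSeq(<a/>, <c/>), 2 -> NodeSeq(<b/>))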
A more Scala-ish way is to use a fold, along the lines of the answer there (skipping the map f step).
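Since the referenced fold example isn't reproduced here, a minimal sketch of a fold-based version (one possible reading, not the linked answer's exact code):

x.foldLeft(Map.empty[String, List[String]]) { case (acc, (k, v)) =>
  // Prepend each value to the list already stored for its key
  acc.updated(k, v :: acc.getOrElse(k, Nil))
}.map { case (k, vs) => k -> vs.reverse } // reverse to restore insertion order
// Map(a -> List(b, f), c -> List(d))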
Answered by Cory Klein
For Googlers that don't expect duplicates or are fine with the default duplicate handling policy:
List("a" -> 1, "b" -> 2).toMap
// Result: Map(a -> 1, b -> 2)
As of Scala 2.12, the default policy reads:
Duplicate keys will be overwritten by later keys: if this is an unordered collection, which key is in the resulting map is undefined.
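A quick illustration of that policy on an ordered List:

List("a" -> 1, "a" -> 2, "b" -> 3).toMap
// Result: Map(a -> 2, b -> 3) (the later value for "a" wins)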
Answered by Daniel C. Sobral
Here's another alternative:
x.groupBy(_._1).mapValues(_.map(_._2))
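Note that on Scala 2.13, mapValues on a Map is deprecated and returns a lazy MapView; to get a strict Map back, the usual form is:

x.groupBy(_._1).view.mapValues(_.map(_._2)).toMap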
Answered by pathikrit
For Googlers that do care about duplicates:
implicit class Pairs[A, B](p: List[(A, B)]) {
def toMultiMap: Map[A, List[B]] = p.groupBy(_._1).mapValues(_.map(_._2))
}
> List("a" -> "b", "a" -> "c", "d" -> "e").toMultiMap
> Map("a" -> List("b", "c"), "d" -> List("e"))
Answered by Xavier Guihot
Starting in Scala 2.13, most collections are provided with the groupMap method, which is (as its name suggests) an equivalent (more efficient) of a groupBy followed by mapValues:
List("a" -> "b", "c" -> "d", "a" -> "f").groupMap(_._1)(_._2)
// Map[String,List[String]] = Map(a -> List(b, f), c -> List(d))
This:
- groups elements based on the first part of tuples (group part of groupMap)
- maps grouped values by taking their second tuple part (map part of groupMap)
This is an equivalent of list.groupBy(_._1).mapValues(_.map(_._2)) but performed in one pass through the List.
Answered by Melcom van Eeden
Below you can find a few solutions. (GroupBy, FoldLeft, Aggregate, Spark)
val list: List[(String, String)] = List(("a","b"),("c","d"),("a","f"))
GroupBy variation
list.groupBy(_._1).map(v => (v._1, v._2.map(_._2)))
FoldLeft variation
list.foldLeft[Map[String, List[String]]](Map())((acc, value) => {
acc.get(value._1).fold(acc ++ Map(value._1 -> List(value._2))){ v =>
acc ++ Map(value._1 -> (value._2 :: v))
}
})
Aggregate variation - similar to foldLeft
list.aggregate[Map[String, List[String]]](Map())(
(acc, value) => acc.get(value._1).fold(acc ++ Map(value._1 ->
List(value._2))){ v =>
acc ++ Map(value._1 -> (value._2 :: v))
},
(l, r) => l ++ r
)
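Note: as of Scala 2.13, aggregate is deprecated for sequential collections (foldLeft(z)(seqop) is suggested instead), so the foldLeft variation above is the forward-compatible choice.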
Spark variation - for big data sets (conversion to an RDD and back to a plain Map)
import org.apache.spark.rdd._
import org.apache.spark.{SparkContext, SparkConf}
val conf: SparkConf = new SparkConf().setAppName("Spark").setMaster("local")
val sc: SparkContext = new SparkContext(conf)
// This gives you a rdd of the same result
val rdd: RDD[(String, List[String])] = sc.parallelize(list).combineByKey(
(value: String) => List(value),
(acc: List[String], value) => value :: acc,
(accLeft: List[String], accRight: List[String]) => accLeft ::: accRight
)
// To convert this RDD back to a Map[String, List[String]] you can do the following
rdd.collect().toMap
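If this runs as a standalone snippet, stopping the context afterwards is good practice (assuming no further Spark work follows):

sc.stop()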
Answered by cevaris
Here is a more idiomatic Scala way to convert a list of tuples to a map, handling duplicate keys. You want to use a fold.
val x = List("a" -> "b", "c" -> "d", "a" -> "f")
x.foldLeft(Map.empty[String, Seq[String]]) { case (acc, (k, v)) =>
acc.updated(k, acc.getOrElse(k, Seq.empty[String]) ++ Seq(v))
}
res0: scala.collection.immutable.Map[String,Seq[String]] = Map(a -> List(b, f), c -> List(d))
Answered by frankfzw
You can try this:
scala> val b = Array(1, 2, 3)
// b: Array[Int] = Array(1, 2, 3)
scala> val c = b.map(x => (x -> x * 2))
// c: Array[(Int, Int)] = Array((1,2), (2,4), (3,6))
scala> val d = Map(c : _*)
// d: scala.collection.immutable.Map[Int,Int] = Map(1 -> 2, 2 -> 4, 3 -> 6)
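As an aside, Map(c : _*) is equivalent to calling toMap on the array of pairs:

scala> val d2 = c.toMap
// d2: scala.collection.immutable.Map[Int,Int] = Map(1 -> 2, 2 -> 4, 3 -> 6)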

