Original question: http://stackoverflow.com/questions/7142514/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
In Scala, how can I do the equivalent of an SQL SUM and GROUP BY?
Asked by deltanovember
For example, suppose I have
val list: List[(String, Double)]
with values
"04-03-1985", 1.5
"05-03-1985", 2.4
"05-03-1985", 1.3
How could I produce a new List
"04-03-1985", 1.5
"05-03-1985", 3.7
Answered by Kipton Barros
Here's a one-liner. It's not particularly readable, unless one really internalizes the types of these higher order functions.
val s = Seq(("04-03-1985" -> 1.5),
("05-03-1985" -> 2.4),
("05-03-1985" -> 1.3))
s.groupBy(_._1).mapValues(_.map(_._2).sum)
// returns: Map(04-03-1985 -> 1.5, 05-03-1985 -> 3.7)
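A note on versions, offered as a sketch rather than a correction: in Scala 2.12 and earlier, mapValues returns a lazy view that recomputes its function on each access, and in Scala 2.13 the method on Map is deprecated in favour of .view.mapValues(f).toMap. A version-independent way to get a strict Map is to use map with a pattern match. The object name GroupSumSketch is only an illustrative wrapper.

```scala
object GroupSumSketch {
  val s = Seq("04-03-1985" -> 1.5, "05-03-1985" -> 2.4, "05-03-1985" -> 1.3)

  // Strict alternative to mapValues: the sums are computed once
  // and stored in a plain immutable Map.
  val totals: Map[String, Double] =
    s.groupBy(_._1).map { case (date, pairs) => date -> pairs.map(_._2).sum }
}
```

GroupSumSketch.totals then holds one entry per date, with the second date summing to approximately 3.7 (subject to the usual floating-point rounding).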
Another approach is to add the key-value pairs one-by-one using fold,
s.foldLeft(Map[String, Double]()) { case (m, (k, v)) =>
m + (k -> (v + m.getOrElse(k, 0d)))
}
The equivalent for comprehension is most accessible, in my opinion,
var m = Map[String, Double]()
for ((k, v) <- s) {
m += k -> (v + m.getOrElse(k, 0d))
}
Maybe something nicer can be done with Scalaz's monoid typeclass for Map.
Note that you can convert between Map[K, V] and Seq[(K, V)] using the toSeq and toMap methods.
Update. After pondering it some more, I think the natural abstraction would be a "multimap" conversion, of type,
def seqToMultimap[A, B](s: Seq[(A, B)]): Map[A, Seq[B]]
With the appropriate implicit extension in one's personal library, one could then write:
s.toMultimap.mapValues(_.sum)
This is the clearest of all, in my opinion!
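The answer leaves the implicit extension to the reader. One possible sketch is below, assuming an implicit class is acceptable. The names MultimapSketch and SeqToMultimap are hypothetical, and toMultimap is not a standard library method.

```scala
object MultimapSketch {
  // Hypothetical extension: adds toMultimap to any Seq of pairs.
  implicit class SeqToMultimap[A, B](private val s: Seq[(A, B)]) {
    // Group the pairs by key, then keep only the values in each group.
    def toMultimap: Map[A, Seq[B]] =
      s.groupBy(_._1).map { case (k, pairs) => k -> pairs.map(_._2) }
  }

  val s = Seq("04-03-1985" -> 1.5, "05-03-1985" -> 2.4, "05-03-1985" -> 1.3)

  // Sum each group; map with a pattern match is used instead of mapValues
  // so the result is a strict Map on every Scala version.
  val totals: Map[String, Double] =
    s.toMultimap.map { case (k, vs) => k -> vs.sum }
}
```

With the implicit class in scope, s.toMultimap yields a Map from each date to its list of amounts, and summing each list gives the grouped totals.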
Answered by Eric
There is another possibility using Scalaz.
The key point is to notice that, if M is a Monoid, then Map[T, M] is also a Monoid. This means that if I have 2 maps, m1 and m2, I can add them so that, for each shared key, the elements will be added together.
For example, Map[String, List[String]] is a Monoid because List[String] is a Monoid. So given the appropriate Monoid instance in scope, I should be able to do:
val m1 = Map("a" -> List(1), "b" -> List(3))
val m2 = Map("a" -> List(2))
// |+| "adds" two elements of a Monoid together in Scalaz
m1 |+| m2 === Map("a" -> List(1, 2), "b" -> List(3))
For your question we can see that Map[String, Int] is a Monoid because there is a Monoid instance for the Int type. Let's import it:
implicit val mapMonoid = MapMonoid[String, Int]
Then, I need a function reduceMonoid, which takes anything that's Traversable and "adds" its elements with a Monoid. I only write the reduceMonoid definition here; for the full implementation, please refer to my post on the Essence of the Iterator Pattern:
// T is a "Traversable"
def reduce[A, M : Monoid](reducer: A => M): T[A] => M
Those 2 definitions do not exist in the current Scalaz library but they are not difficult to add (based on the existing Monoid and Traverse typeclasses). And once we have them, the solution to your question is very straightforward:
val s = Seq(("04-03-1985" -> 1.5),
("05-03-1985" -> 2.4),
("05-03-1985" -> 1.3))
// we just put each pair in its own map and we let the Monoid instance
// "add" the maps together
s.reduceMonoid(Map(_)) === Map("04-03-1985" -> 1.5,
"05-03-1985" -> 3.7)
If you feel that the code above is a bit obscure (but really concise, right?), I encourage you to check the github project for the EIP post and play with it. One example shows the solution to your question:
"I can build a map String->Int" >> {
val map1 = List("a" -> 1, "a" -> 2, "b" -> 3, "c" -> 4, "b" -> 5)
implicit val mapMonoid = MapMonoid[String, Int]
map1.reduceMonoid(Map(_)) must_== Map("a" -> 3, "b" -> 8, "c" -> 4)
}
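For readers without Scalaz, the idea can be sketched with a hand-rolled typeclass. Everything below is an assumption made for illustration: the Monoid trait, mapMonoid, and reduceMonoid are hypothetical stand-ins defined here, not the Scalaz API or the code from the EIP post, and reduceMonoid is restricted to Seq rather than any Traversable.

```scala
object MonoidSketch {
  // Hypothetical minimal typeclass: an identity element and an associative combine.
  trait Monoid[M] {
    def zero: M
    def append(a: M, b: M): M
  }

  implicit val doubleMonoid: Monoid[Double] = new Monoid[Double] {
    def zero: Double = 0.0
    def append(a: Double, b: Double): Double = a + b
  }

  // If V is a Monoid, Map[K, V] is one too:
  // merge the maps and combine the values found under shared keys.
  implicit def mapMonoid[K, V](implicit vm: Monoid[V]): Monoid[Map[K, V]] =
    new Monoid[Map[K, V]] {
      def zero: Map[K, V] = Map.empty
      def append(a: Map[K, V], b: Map[K, V]): Map[K, V] =
        b.foldLeft(a) { case (acc, (k, v)) =>
          acc + (k -> vm.append(acc.getOrElse(k, vm.zero), v))
        }
    }

  // Map each element into the monoid, then fold the results together.
  def reduceMonoid[A, M](xs: Seq[A])(f: A => M)(implicit m: Monoid[M]): M =
    xs.foldLeft(m.zero)((acc, a) => m.append(acc, f(a)))

  val s = Seq("04-03-1985" -> 1.5, "05-03-1985" -> 2.4, "05-03-1985" -> 1.3)

  // Put each pair in its own single-entry map and let the map monoid add them up.
  val totals: Map[String, Double] = reduceMonoid(s)(pair => Map(pair))
}
```

The grouping logic lives entirely in the map monoid's append, so the same reduceMonoid call would work for any value type that has a Monoid instance.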
Answered by huynhjl
I used that pattern s.groupBy(_._1).mapValues(_.map(_._2).sum) from Kipton's answer all the time. It translates my thought process pretty directly but unfortunately isn't always easy to read. I've found that using a case class whenever possible makes things a bit better:
case class Data(date: String, amount: Double)
val t = s.map(t => (Data.apply _).tupled(t))
// List(Data(04-03-1985,1.5), Data(05-03-1985,2.4), Data(05-03-1985,1.3))
It then becomes:
t.groupBy(_.date).mapValues{ group => group.map(_.amount).sum }
// Map(04-03-1985 -> 1.5, 05-03-1985 -> 3.7)
I think it is then more readable than the fold or for version.
Answered by anrizal - Anwar Rizal
val s = List ( "04-03-1985" -> 1.5, "05-03-1985" -> 2.4, "05-03-1985" -> 1.3)
for { (key, xs) <- s.groupBy(_._1)
x = xs.map(_._2).sum
} yield (key, x)
Answered by Xavier Guihot
Starting with Scala 2.13, you can use the groupMapReduce method, which is (as its name suggests) an equivalent of a groupBy followed by mapValues and a reduce step:
// val l = List(("04-03-1985", 1.5), ("05-03-1985", 2.4), ("05-03-1985", 1.3))
l.groupMapReduce(_._1)(_._2)(_ + _).toList
// List(("04-03-1985", 1.5), ("05-03-1985", 3.7))
This:
- groups tuples by their first part (_._1) (the group part of groupMapReduce)
- maps each grouped tuple to its second part (_._2) (the map part of groupMapReduce)
- reduces the values within each group by summing them (_ + _) (the reduce part of groupMapReduce)
This is a one-pass version of what could otherwise be written as:
l.groupBy(_._1).mapValues(_.map(_._2).reduce(_ + _)).toList
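A self-contained sketch of the groupMapReduce version, assuming Scala 2.13 or later. The wrapper object GroupMapReduceSketch and the explicit sort are additions for illustration: the iteration order of the resulting Map is not guaranteed in general, so sorting gives a stable list when order matters.

```scala
object GroupMapReduceSketch {
  val l = List(("04-03-1985", 1.5), ("05-03-1985", 2.4), ("05-03-1985", 1.3))

  // Group by date, project the amount, and sum within each group in one pass.
  val totals: Map[String, Double] = l.groupMapReduce(_._1)(_._2)(_ + _)

  // Sort by key to get a deterministic List of (date, total) pairs.
  val sorted: List[(String, Double)] = totals.toList.sortBy(_._1)
}
```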

