In Scala, how can I do the equivalent of an SQL SUM and GROUP BY?

Question

Asked by deltanovember

For example, suppose I have

val list: List[(String, Double)]

with values

"04-03-1985", 1.5
"05-03-1985", 2.4
"05-03-1985", 1.3

How could I produce a new List

"04-03-1985", 1.5
"05-03-1985", 3.7

Answer 1

Answered by Kipton Barros

Here's a one-liner. It's not particularly readable, unless one really internalizes the types of these higher order functions.

val s = Seq(("04-03-1985" -> 1.5),
            ("05-03-1985" -> 2.4),
            ("05-03-1985" -> 1.3))

s.groupBy(_._1).mapValues(_.map(_._2).sum)
// returns: Map(04-03-1985 -> 1.5, 05-03-1985 -> 3.7)
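
Since the one-liner is only readable once the intermediate types are clear, it may help to see it split into two steps with explicit type annotations. This is a minimal sketch; the object name `GroupBySumSteps` and the value names `grouped` and `totals` are made up for illustration. It uses a plain `map` over the entries rather than `mapValues`, on the assumption that it should also compile cleanly on Scala 2.13, where `mapValues` on `Map` is deprecated.

```scala
object GroupBySumSteps {
  val s: Seq[(String, Double)] = Seq(
    "04-03-1985" -> 1.5,
    "05-03-1985" -> 2.4,
    "05-03-1985" -> 1.3
  )

  // Step 1: group the pairs by their first component (the date).
  // Each value is still a sequence of full (date, amount) pairs.
  val grouped: Map[String, Seq[(String, Double)]] = s.groupBy(_._1)

  // Step 2: within each group, keep only the amounts and sum them.
  val totals: Map[String, Double] =
    grouped.map { case (date, pairs) => date -> pairs.map(_._2).sum }

  def main(args: Array[String]): Unit =
    totals.toSeq.sortBy(_._1).foreach { case (date, total) =>
      println(s"$date -> $total")
    }
}
```

The two annotated types are exactly what `groupBy` and the per-group sum produce inside the one-liner.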

Another approach is to add the key-value pairs one by one using foldLeft:

s.foldLeft(Map[String, Double]()) { case (m, (k, v)) =>
  m + (k -> (v + m.getOrElse(k, 0d)))
}

The equivalent for-comprehension is the most accessible, in my opinion:

var m = Map[String, Double]()
for ((k, v) <- s) {
  m += k -> (v + m.getOrElse(k, 0d))
}

Maybe something nicer can be done with Scalaz's monoid typeclass for Map.

Note that you can convert between Map[K, V] and Seq[(K, V)] using the toSeq and toMap methods.

Update. After pondering it some more, I think the natural abstraction would be a "multimap" conversion, of type:

def seqToMultimap[A, B](s: Seq[(A, B)]): Map[A, Seq[B]]

With the appropriate implicit extension in one's personal library, one could then write:

s.toMultimap.mapValues(_.sum)

This is the clearest of all, in my opinion!
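
The answer does not show the implicit extension itself. One way it might look is sketched below, using an implicit class. The names `MultimapSyntax`, `SeqToMultimapOps`, and `MultimapDemo` are hypothetical and not part of any library. The final step uses `map` rather than `mapValues` so that the result is a strict `Map` on both Scala 2.12 and 2.13.

```scala
object MultimapSyntax {
  // Hypothetical extension: adds `toMultimap` to any sequence of pairs.
  implicit class SeqToMultimapOps[A, B](pairs: Seq[(A, B)]) {
    // Collect all second components that share the same first component.
    def toMultimap: Map[A, Seq[B]] =
      pairs.groupBy(_._1).map { case (key, group) => key -> group.map(_._2) }
  }
}

object MultimapDemo {
  import MultimapSyntax._

  val s: Seq[(String, Double)] = Seq(
    "04-03-1985" -> 1.5,
    "05-03-1985" -> 2.4,
    "05-03-1985" -> 1.3
  )

  // Build the multimap once, then sum the amounts under each date.
  val totals: Map[String, Double] =
    s.toMultimap.map { case (date, amounts) => date -> amounts.sum }
}
```

With the extension imported, the grouping step and the aggregation step are named separately, which is what makes the call site easy to read.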

Answer 2

Answered by Eric

There is another possibility using Scalaz.

The key point is to notice that, if M is a Monoid, then Map[T, M] is also a Monoid. This means that if I have two maps, m1 and m2, I can add them so that, for each key they share, the values are added together.

For example, Map[String, List[Int]] is a Monoid because List[Int] is a Monoid. So given the appropriate Monoid instance in scope, I should be able to do:

  val m1 = Map("a" -> List(1), "b" -> List(3))
  val m2 = Map("a" -> List(2))

  // |+| "adds" two elements of a Monoid together in Scalaz
  m1 |+| m2 === Map("a" -> List(1, 2), "b" -> List(3))
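
For readers without Scalaz at hand, the key-wise "addition" of two maps can be imitated with the standard library alone. The sketch below is not Scalaz's implementation; `MergeMaps` and `mergeWith` are hypothetical names, and the combining function is passed explicitly instead of coming from a Monoid instance.

```scala
object MergeMaps {
  // Merge two maps. When a key occurs in both, the two values are
  // combined with `combine`; otherwise the single value is kept.
  def mergeWith[K, V](m1: Map[K, V], m2: Map[K, V])(combine: (V, V) => V): Map[K, V] =
    m2.foldLeft(m1) { case (acc, (k, v)) =>
      acc + (k -> acc.get(k).map(combine(_, v)).getOrElse(v))
    }

  val m1 = Map("a" -> List(1), "b" -> List(3))
  val m2 = Map("a" -> List(2))

  // For lists, "adding" two values means concatenating them.
  val merged: Map[String, List[Int]] = mergeWith(m1, m2)(_ ++ _)
}
```

Here `merged` has the same content as the `m1 |+| m2` example: the lists under "a" are concatenated and "b" is carried over unchanged.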

For your question we can see that Map[String, Int] is a Monoid because there is a Monoid instance for the Int type. Let's import it:

  implicit val mapMonoid = MapMonoid[String, Int]

Then, I need a function reduceMonoid, which takes anything that's Traversable and "adds" its elements with a Monoid. I only give the signature of reduceMonoid here; for the full implementation, please refer to my post on the Essence of the Iterator Pattern:

  // T is a "Traversable"
  def reduce[A, M : Monoid](reducer: A => M): T[A] => M

Those 2 definitions do not exist in the current Scalaz library but they are not difficult to add (based on the existing Monoid and Traverse typeclasses). And once we have them, the solution to your question is very straightforward:

  val s = Seq(("04-03-1985" -> 1.5),
              ("05-03-1985" -> 2.4),
              ("05-03-1985" -> 1.3))

   // we just put each pair in its own map and we let the Monoid instance
   // "add" the maps together
   s.reduceMonoid(Map(_)) === Map("04-03-1985" -> 1.5,
                                  "05-03-1985" -> 3.7)
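
Because `MapMonoid` and `reduceMonoid` are not part of Scalaz, the idea can be tried out with a small hand-written typeclass. The sketch below is an assumption-laden stand-in, not the code from the EIP post: the `Monoid` trait, its `zero` and `append` members, and this `reduceMonoid` (restricted to `Seq` and written as a plain function rather than an extension method) are all defined here for illustration only.

```scala
object MonoidSketch {
  // Minimal hand-written Monoid; NOT Scalaz's definition.
  trait Monoid[M] {
    def zero: M
    def append(a: M, b: M): M
  }

  implicit val doubleMonoid: Monoid[Double] = new Monoid[Double] {
    def zero: Double = 0.0
    def append(a: Double, b: Double): Double = a + b
  }

  // If V has a Monoid, so does Map[K, V]: values under a shared key are appended.
  implicit def mapMonoid[K, V](implicit vm: Monoid[V]): Monoid[Map[K, V]] =
    new Monoid[Map[K, V]] {
      def zero: Map[K, V] = Map.empty
      def append(a: Map[K, V], b: Map[K, V]): Map[K, V] =
        b.foldLeft(a) { case (acc, (k, v)) =>
          acc + (k -> vm.append(acc.getOrElse(k, vm.zero), v))
        }
    }

  // Hypothetical stand-in for reduceMonoid: map each element into the
  // Monoid with `reducer`, then append everything starting from zero.
  def reduceMonoid[A, M](xs: Seq[A])(reducer: A => M)(implicit m: Monoid[M]): M =
    xs.foldLeft(m.zero)((acc, a) => m.append(acc, reducer(a)))

  val s: Seq[(String, Double)] = Seq(
    "04-03-1985" -> 1.5,
    "05-03-1985" -> 2.4,
    "05-03-1985" -> 1.3
  )

  // Put each pair in its own single-entry map and let the Monoid add the maps.
  val totals: Map[String, Double] = reduceMonoid(s)(pair => Map(pair))
}
```

The grouping logic lives entirely in the `Map` instance, so the call site only has to say how one element becomes a single-entry map.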

If you feel that the code above is a bit obscure (but really concise, right?), I encourage you to check the github project for the EIP post and play with it. One example shows the solution to your question:

   "I can build a map String->Int" >> {
     val map1 = List("a" -> 1, "a" -> 2, "b" -> 3, "c" -> 4, "b" -> 5)
     implicit val mapMonoid = MapMonoid[String, Int]

     map1.reduceMonoid(Map(_)) must_== Map("a" -> 3, "b" -> 8, "c" -> 4)
   }

Answer 3

Answered by huynhjl

I use the pattern s.groupBy(_._1).mapValues(_.map(_._2).sum) from Kipton's answer all the time. It translates my thought process pretty directly but unfortunately isn't always easy to read. I've found that using a case class whenever possible makes things a bit better:

case class Data(date: String, amount: Double)
val t = s.map(t => (Data.apply _).tupled(t))
// List(Data(04-03-1985,1.5), Data(05-03-1985,2.4), Data(05-03-1985,1.3))

It then becomes:

t.groupBy(_.date).mapValues{ group => group.map(_.amount).sum }
// Map(04-03-1985 -> 1.5, 05-03-1985 -> 3.7)

I think it is then more readable than the fold or for version.

Answer 4

Answered by anrizal - Anwar Rizal

val s = List ( "04-03-1985" -> 1.5, "05-03-1985" -> 2.4, "05-03-1985" -> 1.3)
for { (key, xs) <- s.groupBy(_._1)
       x = xs.map(_._2).sum
    } yield (key, x)

Answer 5

Answered by Xavier Guihot

Starting with Scala 2.13, you can use the groupMapReduce method, which is (as its name suggests) equivalent to a groupBy followed by a mapValues and a reduce step:

// val l = List(("04-03-1985", 1.5), ("05-03-1985", 2.4), ("05-03-1985", 1.3))
l.groupMapReduce(_._1)(_._2)(_ + _).toList
// List(("04-03-1985", 1.5), ("05-03-1985", 3.7))

This:

groups tuples by their first part (_._1) (the group part of groupMapReduce)
maps each grouped tuple to its second part (_._2) (the map part of groupMapReduce)
reduces the values within each group by summing them (_ + _) (the reduce part of groupMapReduce).

This is a one-pass version of what could otherwise be written as:

l.groupBy(_._1).mapValues(_.map(_._2).reduce(_ + _)).toList
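
The two forms can be compared side by side. The sketch below assumes Scala 2.13 or later (needed for `groupMapReduce`); the object name `GroupMapReduceDemo` and the value names are made up for illustration. Because the iteration order of a `Map` is not something to rely on, both results are sorted by key before being compared, and the two-step form uses `map` instead of the deprecated `mapValues`.

```scala
object GroupMapReduceDemo {
  val l: List[(String, Double)] = List(
    ("04-03-1985", 1.5),
    ("05-03-1985", 2.4),
    ("05-03-1985", 1.3)
  )

  // One pass: group by date, extract the amount, sum within each group.
  // Requires Scala 2.13 or later.
  val onePass: List[(String, Double)] =
    l.groupMapReduce(_._1)(_._2)(_ + _).toList.sortBy(_._1)

  // Two steps: build the intermediate groups first, then reduce each one.
  val twoStep: List[(String, Double)] =
    l.groupBy(_._1)
      .map { case (date, group) => date -> group.map(_._2).reduce(_ + _) }
      .toList
      .sortBy(_._1)

  def main(args: Array[String]): Unit =
    println(onePass == twoStep)
}
```

Sorting is only there to make the comparison deterministic; drop it if the order of the resulting list does not matter.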
