scala Spark Group By Key to (Key,List) Pair

Question

Asked by manjam

I am trying to group some data by key where the value would be a list:

Sample data:

A 1
A 2
B 1
B 2

Expected result:

(A,(1,2))
(B,(1,2))

I am able to do this with the following code:

data.groupByKey().mapValues(List(_))

The problem is that when I then try to do a Map operation like the following:

groupedData.map((k,v) => (k,v(0)))

It tells me I have the wrong number of parameters.

If I try:

groupedData.map(s => (s(0),s(1)))

It tells me that "(Any,List(Iterable(Any)) does not take parameters"

No clue what I am doing wrong. Is my grouping wrong? What would be a better way to do this?

Scala answers only please. Thanks!!

Answer 1

Answered by zero323

You're almost there. Just replace List(_) with _.toList:

data.groupByKey.mapValues(_.toList)
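The difference between the two calls can be seen without a cluster. The sketch below uses a plain Scala Iterable as a stand-in for the values that groupByKey yields for a single key; the names ListVsToList, values, wrapped, and flattened are illustrative and do not come from the question. It also suggests why the error message in the question mentions a List of Iterable.

```scala
object ListVsToList {
  // Stand-in for the Iterable that groupByKey produces for one key (assumed Int values).
  val values: Iterable[Int] = Seq(1, 2)

  // List(_) wraps the whole Iterable as a single element of a new List.
  val wrapped: List[Iterable[Int]] = List(values)

  // _.toList copies the individual elements into a List.
  val flattened: List[Int] = values.toList

  def main(args: Array[String]): Unit = {
    println(wrapped.size)   // one element: the Iterable itself
    println(flattened.size) // two elements: 1 and 2
  }
}
```

With List(_), indexing the value with v(0) returns the entire Iterable rather than the first number, which is probably not what was intended.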

Answer 2

Answered by Shadowlands

When you write an anonymous inline function of the form

ARGS => OPERATION

the entire part before the arrow (=>) is taken as the argument list. So, in the case of

(k, v) => ...

the compiler takes that to mean a function that takes two arguments. In your case, however, you have a single argument which happens to be a tuple (here, a Tuple2, or a Pair; more fully, you appear to have a collection of Pair[Any,List[Any]]). There are a couple of ways to get around this. First, you can use the sugared form of representing a pair, wrapped in an extra set of parentheses to show that this is the single expected argument for the function (be aware that Scala 2 compilers reject tuple destructuring in a parameter list with "not a legal formal parameter", so this form may not compile for you):

((x, y)) => ...

or, you can write the anonymous function in the form of a partial function that matches on tuples:

groupedData.map { case (k, v) => (k, v(0)) }

Finally, you can simply go with a single specified argument, as per your last attempt, but - realising it is a tuple - reference the specific field(s) within the tuple that you need:

groupedData.map(s => (s._2(0),s._2(1)))  // The key is s._1, and the value list is s._2
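The pattern-matching and field-accessor forms can be tried on a local collection. This is a minimal sketch that assumes the grouped data is a sequence of (String, List[Int]) pairs; groupedData here is a plain Seq standing in for the RDD, on the assumption that map expects a function of one tuple argument in both cases. The object name TupleArgs and the result names are hypothetical.

```scala
object TupleArgs {
  // Local stand-in for the grouped RDD from the question (assumed element type).
  val groupedData: Seq[(String, List[Int])] =
    Seq(("A", List(1, 2)), ("B", List(1, 2)))

  // Pattern-matching anonymous function: destructures the tuple into k and v.
  val firstByCase: Seq[(String, Int)] =
    groupedData.map { case (k, v) => (k, v(0)) }

  // Single parameter, with the tuple fields read through _1 and _2.
  val firstByField: Seq[(String, Int)] =
    groupedData.map(s => (s._1, s._2(0)))

  def main(args: Array[String]): Unit = {
    println(firstByCase)
    println(firstByField)
  }
}
```

Both forms produce the same pairs of key and first value; the case form tends to read better when several tuple fields are used.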
