Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license, cite the original URL, and attribute it to the original authors (not me) at StackOverflow.
Original URL: http://stackoverflow.com/questions/37335416/
spark - scala: not a member of org.apache.spark.sql.Row
Asked by Edamame
I am trying to convert a data frame to RDD, then perform some operations below to return tuples:
df.rdd.map { t =>
  (t._2 + "_" + t._3, t)
}.take(5)
Then I got the error below. Anyone have any ideas? Thanks!
<console>:37: error: value _2 is not a member of org.apache.spark.sql.Row
(t._2 + "_" + t._3 , t)
^
Answered by Daniel de Paula
When you convert a DataFrame to an RDD, you get an RDD[Row], so when you use map, your function receives a Row as its parameter. Therefore, you must use the Row methods to access its members (note that the index starts from 0):
import org.apache.spark.sql.Row

df.rdd.map { row: Row =>
  (row.getString(1) + "_" + row.getString(2), row)
}.take(5)
You can view more examples and check all the methods available on Row objects in the Spark scaladoc.
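For example, a minimal sketch of name-based access with Row.getAs (the column names col2 and col3 are assumed here, mirroring the concatenation example below):

import org.apache.spark.sql.Row

df.rdd.map { row: Row =>
  // getAs[T](fieldName) looks the field up by name instead of by position
  val key = row.getAs[String]("col2") + "_" + row.getAs[String]("col3")
  (key, row)
}.take(5)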
Edit: I don't know why you are doing this operation, but for concatenating the String columns of a DataFrame you may consider the following option:
import org.apache.spark.sql.functions._
val newDF = df.withColumn("concat", concat(df("col2"), lit("_"), df("col3")))
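As a variant, concat_ws from the same functions package takes the separator once instead of interleaving lit calls; a sketch with the same assumed column names:

// concat_ws places the "_" separator between each pair of columns
val newDF2 = df.withColumn("concat", concat_ws("_", df("col2"), df("col3")))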