Spark 2.0 Scala - RDD.toDF()

Warning: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/38968351/

Tags: scala, apache-spark

Asked by Carl

I am working with Spark 2.0 Scala. I am able to convert an RDD to a DataFrame using the toDF() method.

val rdd = sc.textFile("/pathtologfile/logfile.txt")
val df = rdd.toDF()

But for the life of me I cannot find where this is in the API docs. It is not under RDD, but it is under Dataset (link 1). However, I have an RDD, not a Dataset.

Also I can't see it under implicits (link 2).

So please help me understand why toDF() can be called for my RDD. Where is this method being inherited from?

Accepted answer by Raphael Roth

It's coming from here:

Spark 2 API

Explanation: if you import sqlContext.implicits._, you get an implicit method that converts an RDD to a DatasetHolder (rddToDatasetHolder), and you then call toDF on the DatasetHolder.
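
To make the mechanism concrete, here is a minimal self-contained sketch of the same implicit-conversion pattern. The names DatasetHolderLike and seqToHolder are made up for illustration (they are not Spark's actual classes); the sketch only models how importing an implicit conversion makes toDF available on a type that does not define it:

import scala.language.implicitConversions

// A toy stand-in for Spark's DatasetHolder: it is what actually defines toDF.
class DatasetHolderLike[T](data: Seq[T]) {
  def toDF(): Unit = println(s"would build a DataFrame from ${data.size} rows")
}

object ImplicitsLike {
  // Analogous to rddToDatasetHolder: wraps the collection so .toDF() resolves.
  implicit def seqToHolder[T](data: Seq[T]): DatasetHolderLike[T] =
    new DatasetHolderLike[T](data)
}

object Demo extends App {
  import ImplicitsLike._
  Seq(1, 2, 3).toDF() // the compiler rewrites this to seqToHolder(Seq(1, 2, 3)).toDF()
}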

Answer by DanielVL

Yes, you should import the sqlContext implicits, like this:

val sqlContext = new org.apache.spark.sql.SQLContext(sc) // or spark.sqlContext in Spark 2.x

import sqlContext.implicits._

val df = rdd.toDF()

Do this import before you call toDF on your RDDs.
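
In Spark 2.x the usual entry point is SparkSession rather than a hand-built SQLContext. A minimal self-contained sketch of the same idea (the app name, master setting, and sample data here are assumptions for illustration):

import org.apache.spark.sql.SparkSession

object ToDFExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("toDF example")
      .master("local[*]") // assumption: running locally
      .getOrCreate()

    import spark.implicits._ // brings rddToDatasetHolder into scope

    val rdd = spark.sparkContext.parallelize(Seq("a", "b", "c"))
    val df = rdd.toDF("value") // naming the column is optional
    df.show()

    spark.stop()
  }
}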

Answer by user3749126

Yes, I finally found peace of mind on this issue. It was troubling me like hell, and this post is a life saver. I was trying to generically load data from log files into case class objects held in a mutable List, the idea being to finally convert the list into a DF. However, since the list was mutable and Spark 2.1.1 changed the toDF implementation, the list was not getting converted. I had even considered saving the data to a file and loading it back using .read. But five minutes ago this post saved my day.

I did it exactly the same way as described.

After loading the data into the mutable list, I immediately used:

import spark.sqlContext.implicits._
val df = <mutable list object>.toDF 
df.show()
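
For completeness, a minimal sketch of that mutable-list flow. The case class, field names, and sample row are made up for illustration, and spark is assumed to be an existing SparkSession (as in the spark-shell):

import scala.collection.mutable.ListBuffer

case class LogEntry(ts: String, msg: String) // hypothetical schema

val entries = ListBuffer[LogEntry]()
entries += LogEntry("2020-10-22", "started")

import spark.implicits._
val df = entries.toDF() // works because ListBuffer is a Seq
df.show()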

Answer by Gautam De

I have done just this with Spark 2, and it worked.

val orders = sc.textFile("/user/gd/orders") // RDD[String]
val ordersDF = orders.toDF()                // the spark-shell pre-imports the needed implicits
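
A quick way to inspect the result: for an RDD[String] the single column is named "value" by default, and a name can optionally be passed to toDF:

ordersDF.printSchema() // root |-- value: string (nullable = true)
ordersDF.show(5)

val namedDF = orders.toDF("order_line") // optional: choose the column name yourself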