Scala: Creating a Spark DataFrame from a single string

Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/39963495/

Date: 2020-10-22 08:43:18  Source: igfitidea

Creating a Spark DataFrame from a single string

scala apache-spark spark-dataframe

Asked by smeeb

I'm trying to take a hardcoded String and turn it into a 1-row Spark DataFrame (with a single column of type StringType) such that:

val fizz: String = "buzz"

This should result in a DataFrame whose .show() output looks like:

+-----+
| fizz|
+-----+
| buzz|
+-----+

My best attempt thus far has been:

val rawData = List("buzz")
val df = sqlContext.sparkContext.parallelize(Seq(rawData)).toDF()

df.show()

But I get the following error at runtime:

java.lang.ClassCastException: org.apache.spark.sql.types.ArrayType cannot be cast to org.apache.spark.sql.types.StructType
    at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:413)
    at org.apache.spark.sql.SQLImplicits.rddToDataFrameHolder(SQLImplicits.scala:155)

Any ideas as to where I'm going wrong? Also, how do I set "buzz" as the row value for the fizz column?

Update:

Trying:

sqlContext.sparkContext.parallelize(rawData).toDF()

I get a DF that looks like:

+----+
|  _1|
+----+
|buzz|
+----+

Answer:

Try:

import sqlContext.implicits._  // brings toDF() into scope for RDDs

sqlContext.sparkContext.parallelize(rawData).toDF()
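
The difference from the attempt in the question is the missing Seq(...) wrapper. As far as I can tell from the stack trace, wrapping rawData in Seq makes each RDD element a whole List[String], which Spark maps to an ArrayType rather than a row-like StructType. The plain-Scala sketch below only illustrates the element types involved and does not touch Spark; the object name ElementTypes is made up for illustration.

```scala
object ElementTypes {
  // The list from the question: its elements are plain strings.
  val rawData: List[String] = List("buzz")

  // Wrapping it in Seq gives a one-element sequence whose single
  // element is itself a list, not a string.
  val wrapped: Seq[List[String]] = Seq(rawData)

  def main(args: Array[String]): Unit = {
    println(rawData.head)  // a String element
    println(wrapped.head)  // a List[String] element
  }
}
```

Parallelizing rawData directly therefore gives one string per row, which is what toDF() expects here.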

In Spark 2.0 you can call toDF directly on the local collection, where spark is your SparkSession:

import spark.implicits._

rawData.toDF

Optionally, pass the column names to toDF:

sqlContext.sparkContext.parallelize(rawData).toDF("fizz")
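
Putting the pieces together for Spark 2.x, a minimal end-to-end sketch might look like the following. It assumes a local SparkSession created just for the example; the object name SingleStringDataFrame, the helper build, and the app name are hypothetical and not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SingleStringDataFrame {
  // Hypothetical helper: builds a 1-row, 1-column DataFrame from a string.
  def build(spark: SparkSession, value: String, column: String): DataFrame = {
    import spark.implicits._  // brings toDF into scope for local Seqs
    // Seq(value) has exactly one String element, so the result has one row;
    // the argument to toDF replaces the default column name.
    Seq(value).toDF(column)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration only.
    val spark = SparkSession.builder()
      .appName("single-string-df")
      .master("local[*]")
      .getOrCreate()

    val df = build(spark, "buzz", "fizz")
    df.printSchema()
    df.show()

    spark.stop()
  }
}
```

Calling printSchema() before show() is a quick way to confirm that the single column came out as a string column with the intended name.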