Scala: how do I convert an Array[Row] to a DataFrame

Disclaimer: This page is an English rendering of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/40800920/

Date: 2020-10-22 08:52:35 · Source: igfitidea

How do I Convert Array[Row] to DataFrame

Tags: scala, apache-spark, dataframe

Asked by Garipaso

How do I convert this one row to a dataframe?

val oneRowDF = myDF.first // gives Array[Row]

Thanks

Answered by T. Gawęda

In my answer, df1 is a DataFrame [text: string, y: int], used just for testing: val df1 = sc.parallelize(List(("a", 1))).toDF("text", "y").

import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = StructType(
    StructField("text", StringType, false) ::
    StructField("y", IntegerType, false) :: Nil)
val arr = df1.head(3) // Array[Row]
val dfFromArray = sqlContext.createDataFrame(sparkContext.parallelize(arr), schema)

You can also map the parallelized array and convert every row:

val dfFromArray = sparkContext.parallelize(arr).map(row => (row.getString(0), row.getInt(1)))
    .toDF("text", "y")

If you have just one row, you can run:

// Here `row` is a single Row, e.g. val row = df1.head
val dfFromArray = sparkContext.parallelize(Seq(row)).map(row => (row.getString(0), row.getInt(1)))
    .toDF("text", "y")

In Spark 2.0+, use SparkSession instead of SQLContext.
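For instance, a minimal Spark 2.x sketch, assuming a spark-shell style SparkSession named spark and reusing the arr and schema values from above:

// Spark 2.x sketch: SparkSession replaces SQLContext.
// Assumes `spark` is an existing SparkSession (as in spark-shell)
// and reuses `arr` (Array[Row]) and `schema` from the snippet above.
val dfFromArray2 = spark.createDataFrame(spark.sparkContext.parallelize(arr), schema)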

Answered by Shiv4nsh

You do not want to do that:

If you want a sub-part of the whole DataFrame, just use the limit API.

Example:

scala> val d=sc.parallelize(Seq((1,3),(2,4))).toDF
d: org.apache.spark.sql.DataFrame = [_1: int, _2: int]

scala> d.show
+---+---+
| _1| _2|
+---+---+
|  1|  3|
|  2|  4|
+---+---+


scala> d.limit(1)
res1: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [_1: int, _2: int]

scala> d.limit(1).show
+---+---+
| _1| _2|
+---+---+
|  1|  3|
+---+---+

Still, if you want to explicitly convert an Array[Row] to a DataFrame, you can do something like:

scala> val value=d.take(1)
value: Array[org.apache.spark.sql.Row] = Array([1,3])

scala> val asTuple=value.map(a=>(a.getInt(0),a.getInt(1)))
asTuple: Array[(Int, Int)] = Array((1,3))

scala> sc.parallelize(asTuple).toDF
res6: org.apache.spark.sql.DataFrame = [_1: int, _2: int]

And now you can show it accordingly!
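For instance, continuing the transcript above, the expected output (matching the earlier d.limit(1).show) would be:

scala> sc.parallelize(asTuple).toDF.show
+---+---+
| _1| _2|
+---+---+
|  1|  3|
+---+---+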

Answered by Arun Y

If you have a List<Row>, it can be used directly to create a DataFrame or Dataset<Row> using spark.createDataFrame(List<Row> rows, StructType schema), where spark is the SparkSession in Spark 2.x.
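A short sketch of that call, assuming a SparkSession named spark and the arr and schema values from the first answer (the JavaConverters import is needed because this overload takes a java.util.List):

// Sketch: create a DataFrame straight from a java.util.List[Row], no RDD needed.
// `spark`, `arr` (Array[Row]) and `schema` are assumed from the earlier answers.
import scala.collection.JavaConverters._

val dfFromList = spark.createDataFrame(arr.toList.asJava, schema)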

Answered by Reactormonk

Take a look at the scaladocs - I'd recommend RDD[Row] here, which means you need to get there. It should be easiest with makeRDD. You'll also need a schema corresponding to your Row, which you can pull directly from it.
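A minimal sketch of that route, assuming arr is a non-empty Array[Row] taken from an existing DataFrame and spark is a SparkSession:

// Sketch: makeRDD plus a schema pulled straight from the first Row.
// Rows returned by DataFrame actions (head/take/collect) carry their schema;
// arr.head.schema can be null for hand-built rows, so this assumes `arr`
// came from a real DataFrame and is non-empty.
val dfViaMakeRDD = spark.createDataFrame(spark.sparkContext.makeRDD(arr), arr.head.schema)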

... how did you get Array[Row] in the first place?