scala - Spark: How to union all dataframes in a loop

Question

Asked by J.soo

Is there a way to build a single dataframe by unioning the dataframes created in a loop?

This is some sample code:

var fruits = List(
  "apple"
  ,"orange"
  ,"melon"
) 

for (x <- fruits){         
  var df = Seq(("aaa","bbb",x)).toDF("aCol","bCol","name")
}

I would like to obtain something like this:

aCol | bCol | fruitsName
aaa,bbb,apple
aaa,bbb,orange
aaa,bbb,melon

Thanks again.

Answer 1

Answered by cdncat

I believe Steffen Schmitz's answer is the most concise one. Below is a more detailed answer in case you need more customization (of field types, etc.):

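import org.apache.spark.sql.types.{StructType, StructField, StringType}
import org.apache.spark.sql.Row
// toDF below needs spark.implicits._ (already in scope in spark-shell)

// initialize an empty DataFrame with an explicit schema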
val schema = StructType(
  StructField("aCol", StringType, true) ::
  StructField("bCol", StringType, true) ::
  StructField("name", StringType, true) :: Nil)
var initialDF = spark.createDataFrame(sc.emptyRDD[Row], schema)

//list to iterate through
var fruits = List(
    "apple"
    ,"orange"
    ,"melon"
)

for (x <- fruits) {
  //union returns a new dataset
  initialDF = initialDF.union(Seq(("aaa", "bbb", x)).toDF)
}

//initialDF.show()

Answer 2

Answered by Ramon

You could create a sequence of DataFrames and then use reduce:

val results = fruits.
  map(fruit => Seq(("aaa", "bbb", fruit)).toDF("aCol","bCol","name")).
  reduce(_.union(_))

results.show()
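
One caveat with reduce: it throws on an empty collection, so an empty fruits list would fail at runtime. A minimal sketch of an empty-safe variant follows, assuming a local SparkSession (in spark-shell, spark already exists and the builder lines can be dropped). The helper name unionAll is hypothetical, not part of Spark.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Assumption: a local session for illustration; spark-shell provides `spark` already.
val spark = SparkSession.builder().master("local[1]").appName("union-sketch").getOrCreate()
import spark.implicits._

// Hypothetical helper: returns None for an empty input instead of throwing.
def unionAll(dfs: Seq[DataFrame]): Option[DataFrame] =
  dfs.reduceOption(_ union _)

val fruits = List("apple", "orange", "melon")
val parts = fruits.map(f => Seq(("aaa", "bbb", f)).toDF("aCol", "bCol", "name"))

val combined = unionAll(parts)      // Some(...) holding one row per fruit
val nothing = unionAll(Seq.empty)   // None, no exception

combined.foreach(_.show())
```

Returning an Option pushes the "no input" case to the caller; if an empty DataFrame is preferred instead, foldLeft over an empty DataFrame with an explicit schema (as in cdncat's answer) is another option.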

Answer 3

Answered by Arun Goudar

If you already have several separate dataframes, you can put them in a Seq and combine them with the code below, which is concise and works for any number of dataframes.

val newDFs = Seq(DF1,DF2,DF3)
newDFs.reduce(_ union _)
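
Note that union matches columns by position, not by name, so this only gives the expected result when every dataframe has its columns in the same order. When the order can differ, unionByName (available since Spark 2.3) may be the safer choice. A small sketch, assuming a local SparkSession and made-up sample rows:

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; spark-shell provides `spark` already.
val spark = SparkSession.builder().master("local[1]").appName("union-by-name-sketch").getOrCreate()
import spark.implicits._

// Same columns, declared in a different order (illustrative data).
val df1 = Seq(("aaa", "apple")).toDF("aCol", "name")
val df2 = Seq(("orange", "aaa")).toDF("name", "aCol")

val byPosition = df1.union(df2)     // positional: "orange" ends up in aCol
val byName = df1.unionByName(df2)   // columns aligned by name

val namesByPosition = byPosition.select("name").as[String].collect().toSet
val namesByName = byName.select("name").as[String].collect().toSet
```

The same idea carries over to the sequence form: newDFs.reduce(_ unionByName _).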

Answer 4

Answered by Steffen Schmitz

In a for loop:

val fruits = List("apple", "orange", "melon")

( for(f <- fruits) yield ("aaa", "bbb", f) ).toDF("aCol", "bCol", "name")

Answer 5

Answered by Sarvesh Kumar Singh

Well... I think your question is a bit misguided.

As per my limited understanding of what you are trying to do, you should be doing the following:

val fruits = List(
  "apple",
  "orange",
  "melon"
)

val df = fruits
  .map(x => ("aaa", "bbb", x))
  .toDF("aCol", "bCol", "name")

And this should be sufficient.

Answer 6

Answered by Rajat Mishra

You can first create a sequence and then use toDF to create the DataFrame.

scala> var dseq : Seq[(String,String,String)] = Seq[(String,String,String)]()
dseq: Seq[(String, String, String)] = List()

scala> for ( x <- fruits){
     |  dseq = dseq :+ ("aaa","bbb",x)
     | }

scala> dseq
res2: Seq[(String, String, String)] = List((aaa,bbb,apple), (aaa,bbb,orange), (aaa,bbb,melon))

scala> val df = dseq.toDF("aCol","bCol","name")
df: org.apache.spark.sql.DataFrame = [aCol: string, bCol: string, name: string]

scala> df.show
+----+----+------+
|aCol|bCol|  name|
+----+----+------+
| aaa| bbb| apple|
| aaa| bbb|orange|
| aaa| bbb| melon|
+----+----+------+
