scala - Spark : How to union all dataframe in loop
Disclaimer: this page is an English rendering of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/43489807/
Asked by J.soo
Is there a way to get a dataframe that is the union of the dataframes built in a loop?
This is sample code:
var fruits = List(
  "apple"
  ,"orange"
  ,"melon"
) 
for (x <- fruits){         
  var df = Seq(("aaa","bbb",x)).toDF("aCol","bCol","name")
}
I would want to obtain something like this:
aCol | bCol | fruitsName
aaa  | bbb  | apple
aaa  | bbb  | orange
aaa  | bbb  | melon
Thanks again
Answered by cdncat
Steffen Schmitz's answer is the most concise one, I believe. Below is a more detailed answer if you are looking for more customization (of field types, etc.):
import org.apache.spark.sql.types.{StructType, StructField, StringType}
import org.apache.spark.sql.Row
import spark.implicits._  // needed for .toDF on local Seqs (auto-imported in spark-shell)

// initialize an empty DF with the target schema
val schema = StructType(
  StructField("aCol", StringType, true) ::
  StructField("bCol", StringType, true) ::
  StructField("name", StringType, true) :: Nil)
var initialDF = spark.createDataFrame(sc.emptyRDD[Row], schema)

// list to iterate through
val fruits = List(
  "apple",
  "orange",
  "melon"
)

for (x <- fruits) {
  // union returns a new Dataset, so reassign the accumulator
  initialDF = initialDF.union(Seq(("aaa", "bbb", x)).toDF("aCol", "bCol", "name"))
}

// initialDF.show()
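The mutable `var` plus loop pattern above can also be written as a `foldLeft`, which threads the accumulated result through each step without reassignment. A minimal sketch of the same shape, substituting `++` for `union` so it runs without a Spark session:

```scala
// The question's fruits list
val fruits = List("apple", "orange", "melon")

// Fold an accumulator over the list; ++ stands in for DataFrame.union here
val rows = fruits.foldLeft(Seq.empty[(String, String, String)]) { (acc, x) =>
  acc ++ Seq(("aaa", "bbb", x))
}

println(rows)

// In Spark, the same shape would be (assuming initialDF from above):
//   val df = fruits.foldLeft(initialDF) { (acc, x) =>
//     acc.union(Seq(("aaa", "bbb", x)).toDF("aCol", "bCol", "name"))
//   }
```

The fold starts from an empty accumulator and unions one row per element, exactly mirroring the loop, but the result is a `val`.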
Answered by Ramon
You could create a sequence of DataFrames and then use reduce:
val results = fruits.
  map(fruit => Seq(("aaa", "bbb", fruit)).toDF("aCol","bCol","name")).
  reduce(_.union(_))
results.show()
Answered by Arun Goudar
If you already have several different dataframes, you can use the code below, which unions them all pairwise and is efficient.
val newDFs = Seq(DF1,DF2,DF3)
newDFs.reduce(_ union _)
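One caveat: `reduce` throws an exception on an empty sequence, so if `newDFs` could be empty, `reduceOption` is the safer choice. The behavior is the same for plain Scala collections, which makes it easy to sketch without a Spark session (using `++` in place of `union`):

```scala
val parts = List(Seq("a"), Seq("b"), Seq("c"))

// reduce folds the sequence pairwise, like newDFs.reduce(_ union _)
val combined = parts.reduce(_ ++ _)
println(combined)

// On an empty list, reduce would throw; reduceOption yields None instead
val none = List.empty[Seq[String]].reduceOption(_ ++ _)
println(none)
```

With DataFrames, `newDFs.reduceOption(_ union _)` returns an `Option[DataFrame]` that is `None` when the list is empty.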
Answered by Steffen Schmitz
In a for loop:
val fruits = List("apple", "orange", "melon")
( for(f <- fruits) yield ("aaa", "bbb", f) ).toDF("aCol", "bCol", "name")
Answered by Sarvesh Kumar Singh
Well... I think your question is a bit misguided.
As per my limited understanding of what you are trying to do, you should be doing the following,
val fruits = List(
  "apple",
  "orange",
  "melon"
)
val df = fruits
  .map(x => ("aaa", "bbb", x))
  .toDF("aCol", "bCol", "name")
And this should be sufficient.
Answered by Rajat Mishra
You can first create a sequence and then use toDF to create a Dataframe.
scala> var dseq : Seq[(String,String,String)] = Seq[(String,String,String)]()
dseq: Seq[(String, String, String)] = List()
scala> for ( x <- fruits){
     |  dseq = dseq :+ ("aaa","bbb",x)
     | }
scala> dseq
res2: Seq[(String, String, String)] = List((aaa,bbb,apple), (aaa,bbb,orange), (aaa,bbb,melon))
scala> val df = dseq.toDF("aCol","bCol","name")
df: org.apache.spark.sql.DataFrame = [aCol: string, bCol: string, name: string]
scala> df.show
+----+----+------+
|aCol|bCol|  name|
+----+----+------+
| aaa| bbb| apple|
| aaa| bbb|orange|
| aaa| bbb| melon|
+----+----+------+

