Scala: writing to a file in Apache Spark

Question

Asked by kruparulz14

I am writing Scala code that needs to write to a file in HDFS. When I use FileWriter.write locally, it works. The same thing does not work on HDFS. On checking, I found the following options for writing in Apache Spark: RDD.saveAsTextFile and DataFrame.write.format.

My question is: what if I just want to write an int or string to a file in Apache Spark?

Follow-up: I need to write a header, the DataFrame contents, and then an appended string to an output file. Would sc.parallelize(Seq(<String>)) help?

Answer 1

Answered by Ronak Patel

Create an RDD from your data (an Int or a String) using Seq; see the Spark documentation on parallelized collections for details:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("Writing Int and String to File").setMaster("local")
val sc = new SparkContext(conf)

// Writing an Int (5); forward slashes avoid invalid escapes such as \i in the path
val intRdd = sc.parallelize(Seq(5))
intRdd.saveAsTextFile("out/int/test")

// Writing a String
val stringRdd = sc.parallelize(Seq("Test String"))
stringRdd.saveAsTextFile("out/string/test")
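
For the original question (writing just a single Int or String), going through an RDD is not the only route. java.io.FileWriter only targets the local file system, which is likely why it fails against HDFS. Below is a minimal sketch that writes the value directly through the Hadoop FileSystem API, which is on the classpath of a normal Spark installation. The object name WriteSingleValue, the helper writeText, and the path out/single/value.txt are made up for illustration and do not come from the answer above.

```scala
import java.nio.charset.StandardCharsets
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object WriteSingleValue {
  // Writes `value` as UTF-8 text to `target`, overwriting any existing file.
  def writeText(fs: FileSystem, target: Path, value: String): Unit = {
    val out = fs.create(target, true) // second argument: overwrite
    try out.write(value.getBytes(StandardCharsets.UTF_8))
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    // A default Configuration reads core-site.xml / hdfs-site.xml from the classpath.
    // Without those files it resolves to the local file system.
    val fs = FileSystem.get(new Configuration())
    writeText(fs, new Path("out/single/value.txt"), 5.toString) // hypothetical path
  }
}
```

Inside a Spark application you could pass FileSystem.get(sc.hadoopConfiguration) instead, so the write uses the same Hadoop settings as the rest of the job. Unlike saveAsTextFile, this produces a single plain file rather than a directory of part files.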

Answer 2

Answered by Ronak Patel

Follow-up example (tested as below):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("Total Countries having Icon").setMaster("local")
val sc = new SparkContext(conf)

val headerRDD = sc.parallelize(Seq("HEADER"))

// Replace the BODY part with the contents of your DataFrame
val bodyRDD = sc.parallelize(Seq("BODY"))

val footerRDD = sc.parallelize(Seq("FOOTER"))

// Combine all RDDs into the final one
val finalRDD = headerRDD ++ bodyRDD ++ footerRDD

// finalRDD.foreach(line => println(line))

// Write the output as a single part file (second argument: shuffle = true)
finalRDD.coalesce(1, true).saveAsTextFile("test")

Output:

HEADER
BODY
FOOTER
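
The "replace BODY with your DF" step can be sketched roughly as follows, assuming each Row should become one comma-separated line. The object name HeaderBodyFooter, the helper withHeaderAndFooter, the sample data, and the output path out/report are all hypothetical. Row.mkString does no quoting or escaping, so this is only suitable when the values contain no commas or newlines.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

object HeaderBodyFooter {
  // Renders each Row as one comma-separated line and surrounds the
  // result with a single header line and a single footer line.
  def withHeaderAndFooter(df: DataFrame, header: String, footer: String): RDD[String] = {
    val sc = df.sparkSession.sparkContext
    val body = df.rdd.map(_.mkString(","))
    sc.parallelize(Seq(header), 1) ++ body ++ sc.parallelize(Seq(footer), 1)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Header body footer")
      .master("local")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b")).toDF("id", "name") // hypothetical sample data

    withHeaderAndFooter(df, "id,name", "END")
      .coalesce(1)                  // no shuffle: partitions are concatenated in order
      .saveAsTextFile("out/report") // hypothetical path; must not exist yet

    spark.stop()
  }
}
```

This sketch calls coalesce(1) without the shuffle flag. As far as I can tell, that keeps the header, body, and footer partitions in their original order, whereas a shuffle does not guarantee ordering in general. Note that saveAsTextFile creates a directory (here out/report) containing a part-00000 file, not a single file with that name.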

