Original source: http://stackoverflow.com/questions/39173039/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Writing to a file in Apache Spark
Asked by kruparulz14
I am writing Scala code that needs to write to a file in HDFS.
When I use FileWriter.write locally, it works. The same thing does not work on HDFS.
Upon checking, I found the following options for writing in Apache Spark:
RDD.saveAsTextFile and DataFrame.write.format.
My question is: what if I just want to write an int or string to a file in Apache Spark?
Follow-up:
I need to write a header, the DataFrame contents, and then an appended string to one output file.
Does sc.parallelize(Seq(<String>)) help?
Answered by Ronak Patel
Create an RDD with your data (int/string) using Seq; see parallelized-collections for details:
sc.parallelize(Seq(5)) //for writing int (5)
sc.parallelize(Seq("Test String")) // for writing string
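import org.apache.spark.{SparkConf, SparkContext}
val conf = new SparkConf().setAppName("Writing Int to File").setMaster("local")
val sc = new SparkContext(conf)
val intRdd = sc.parallelize(Seq(5))
// use forward slashes: backslashes in a Scala string literal are parsed as escape sequences
intRdd.saveAsTextFile("out/int/test")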
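import org.apache.spark.{SparkConf, SparkContext}
val conf = new SparkConf().setAppName("Writing string to File").setMaster("local")
val sc = new SparkContext(conf)
val stringRdd = sc.parallelize(Seq("Test String"))
// use forward slashes: backslashes in a Scala string literal are parsed as escape sequences
stringRdd.saveAsTextFile("out/string/test")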
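Note that saveAsTextFile creates a directory at the given path and writes the data into part files inside it (part-00000 and so on), not a single named file. If one plain file holding a single value is what is wanted, a different technique from the one in this answer is to skip the RDD and write through the Hadoop FileSystem API, which is what java.io.FileWriter cannot do because it only targets the local filesystem. The following is a minimal sketch, not the answerer's method; the object name, the helper writeString and the output paths are made up for illustration, and it assumes the Hadoop client libraries are on the classpath.

```scala
import java.nio.charset.StandardCharsets
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

object WriteSingleValue {
  // Hypothetical helper (not from the original answer): writes one string
  // to whichever filesystem the path and Hadoop configuration resolve to.
  def writeString(target: String, content: String): Unit = {
    // Loads core-site.xml / hdfs-site.xml from the classpath when present;
    // without them an unqualified path resolves to the local filesystem.
    val conf = new Configuration()
    val path = new Path(target)
    val fs = path.getFileSystem(conf)
    val out = fs.create(path, true) // true = overwrite an existing file
    try out.write(content.getBytes(StandardCharsets.UTF_8))
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    // Illustrative paths; replace with an hdfs:// URI or a cluster path.
    writeString("out/single/string.txt", "Test String\n")
    writeString("out/single/int.txt", 5.toString + "\n")
  }
}
```

Inside a running Spark application, sc.hadoopConfiguration could be passed in place of new Configuration() so that the write uses the same cluster settings as the job.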
Answered by Ronak Patel
Follow-up example (tested as below):
import org.apache.spark.{SparkConf, SparkContext}
val conf = new SparkConf().setAppName("Total Countries having Icon").setMaster("local")
val sc = new SparkContext(conf)
val headerRDD = sc.parallelize(Seq("HEADER"))
// replace the BODY part with your DataFrame contents
val bodyRDD = sc.parallelize(Seq("BODY"))
val footerRDD = sc.parallelize(Seq("FOOTER"))
// combine all RDDs into the final one
val finalRDD = headerRDD ++ bodyRDD ++ footerRDD
// finalRDD.foreach(line => println(line))
// write a single part file inside the "test" output directory
finalRDD.coalesce(1, shuffle = true).saveAsTextFile("test")
Output:
HEADER
BODY
FOOTER
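The example leaves "replace the BODY part with your DF" open, so here is one possible way to do that step, sketched under several assumptions: Spark 2.0 or later (for SparkSession), a small made-up DataFrame standing in for the real one, comma-joined rows as the line format, and a hypothetical output directory out/report. It calls coalesce(1) without a shuffle on the expectation that merging partitions in place keeps the header, body, footer order; Spark does not document that ordering as a guarantee, so check the output on real data.

```scala
import org.apache.spark.sql.SparkSession

object HeaderBodyFooter {
  // Hypothetical helper: writes header, DataFrame rows and footer as text.
  def run(outputDir: String): Unit = {
    val spark = SparkSession.builder()
      .appName("Header, DataFrame body and footer")
      .master("local")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for the asker's DataFrame.
    val df = Seq((1, "India"), (2, "Japan")).toDF("id", "country")

    val sc = spark.sparkContext
    // Header line built from the column names, in a single partition.
    val headerRDD = sc.parallelize(Seq(df.columns.mkString(",")), 1)
    // Each Row becomes one comma-separated line of text.
    val bodyRDD = df.rdd.map(row => row.mkString(","))
    val footerRDD = sc.parallelize(Seq("FOOTER"), 1)

    // The union lists partitions as header, body, footer; coalesce(1)
    // then merges them into one partition without shuffling.
    (headerRDD ++ bodyRDD ++ footerRDD)
      .coalesce(1)
      .saveAsTextFile(outputDir) // fails if the directory already exists

    spark.stop()
  }

  def main(args: Array[String]): Unit = run("out/report")
}
```

The result is a directory containing one part file. If the values themselves can contain commas, the plain mkString join is not enough and a CSV writer such as DataFrame.write.format("csv") is probably a better fit for the body.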

