How do you write a DataFrame/RDD to a file with a custom delimiter (Ctrl-A delimited) in Spark Scala?
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverFlow
Original source: http://stackoverflow.com/questions/48077756/
Asked by Amit
I am working on a PoC in which I need to create a DataFrame and then save it as a Ctrl-A delimited file. My query to create the intermediate result is below:
val grouped = results
  .groupBy("club_data", "student_id_add", "student_id")
  .agg(sum(results("amount").cast(IntegerType)).as("amount"), count("amount").as("cnt"))
  .filter((length(trim($"student_id")) > 1) && ($"student_id").isNotNull)
Saving the result to a text file:
grouped.select($"club_data", $"student_id_add", $"amount",$"cnt").rdd.saveAsTextFile("/amit/spark/output4/")
Output:
[amit,DI^A356035,581,1]
It saves the data comma-separated, but I need it Ctrl-A separated. I tried option("delimiter", "\u0001"), but it seems that is not supported by DataFrame/RDD.
Is there any function that helps?
Answered by ktheitroadalo
If you have a DataFrame, you can use Spark-CSV to write it as a CSV with a custom delimiter, as below.
df.write.mode(SaveMode.Overwrite).option("delimiter", "\u0001").csv("outputCSV")
With older versions of Spark:
df.write
.format("com.databricks.spark.csv")
.option("delimiter", "\u0001")
.mode(SaveMode.Overwrite)
.save("outputCSV")
You can read it back as below:
spark.read.option("delimiter", "\u0001").csv("outputCSV").show()
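For completeness, here is a minimal round-trip sketch that writes with the Ctrl-A delimiter and reads the same directory back. It assumes Spark 2.x or later with the built-in CSV source on the classpath. The object name `CtrlARoundTrip`, the sample row, the local master, and the `/tmp/ctrl_a_demo` path are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object CtrlARoundTrip {
  // Ctrl-A is the control character U+0001.
  val CtrlA: String = "\u0001"

  // Writes one sample row with Ctrl-A as the separator, reads the
  // directory back, and returns the fields of the first row as strings.
  def roundTrip(spark: SparkSession, path: String): Seq[String] = {
    import spark.implicits._

    // Hypothetical sample data shaped like the columns in the question.
    val df = Seq(("amit", "DI", 581, 1L))
      .toDF("club_data", "student_id_add", "amount", "cnt")

    df.write.mode(SaveMode.Overwrite).option("delimiter", CtrlA).csv(path)

    // No header and no schema inference are requested, so every column
    // is read back as a string.
    val back = spark.read.option("delimiter", CtrlA).csv(path)
    back.head().toSeq.map(_.toString)
  }

  def main(args: Array[String]): Unit = {
    // Local master for illustration only; a submitted job would normally
    // receive its master from spark-submit.
    val spark = SparkSession.builder()
      .appName("ctrl-a-demo")
      .master("local[*]")
      .getOrCreate()
    try println(roundTrip(spark, "/tmp/ctrl_a_demo").mkString(" | "))
    finally spark.stop()
  }
}
```

Because the file is written without a header, the columns come back with default names rather than the original ones. Passing option("header", "true") on both the write and the read is one way to keep the names.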
If you have an RDD, then you can use the mkString() function on each element of the RDD and save with saveAsTextFile():
rdd.map(r => r.mkString("\u0001")).saveAsTextFile("outputCSV")
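What mkString does to each record can be checked without a cluster. The sketch below is plain Scala with no Spark dependency; the names `CtrlAFields`, `joinFields`, and `splitFields` are hypothetical helpers for illustration, not part of any Spark API.

```scala
object CtrlAFields {
  // Ctrl-A is the control character U+0001 (many tools display it as ^A).
  val CtrlA: String = "\u0001"

  // Joins the fields of one record into a single Ctrl-A delimited line,
  // the same way mkString is applied to each element in the RDD approach.
  def joinFields(fields: Seq[Any]): String = fields.mkString(CtrlA)

  // Splits a line back into fields; the -1 limit keeps trailing empty fields.
  def splitFields(line: String): Array[String] = line.split(CtrlA, -1)

  def main(args: Array[String]): Unit = {
    val line = joinFields(Seq("amit", "DI", 581, 1))
    // Make the invisible delimiter visible for printing.
    println(line.replace(CtrlA, "^A")) // prints amit^ADI^A581^A1
    println(splitFields(line).length)  // prints 4
  }
}
```

One thing to watch for: a field value that itself contains the Ctrl-A character cannot be distinguished from a delimiter when the line is split again, since this approach does no quoting or escaping.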
Hope this helps!
Answered by Ishan Kumar
df.rdd.map(x => x.mkString("\u0001")).saveAsTextFile("file:/home/iot/data/stackOver")
Answered by Arnon Rotem-Gal-Oz
Convert the rows to text before saving:
grouped.select($"club_data", $"student_id_add", $"amount", $"cnt").rdd.map(row => row.mkString("\u0001")).saveAsTextFile("/amit/spark/output4/")
