Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same CC BY-SA license and attribute it to the original authors on StackOverflow (not the translator), citing the original source.
Original URL: http://stackoverflow.com/questions/32527519/
How to export DataFrame to csv in Scala?
Asked by Tong
How can I export Spark's DataFrame to csv file using Scala?
Answered by karthik manchala
The easiest and best way to do this is to use the spark-csv library. You can check the documentation in the provided link, and here is the Scala example of how to load and save data from/to a DataFrame.
Code (Spark 1.4+):
dataFrame.write.format("com.databricks.spark.csv").save("myFile.csv")
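The linked spark-csv documentation also covers loading. A minimal round-trip sketch in the Spark 1.x style (option names as in the spark-csv README; the file names are placeholders):

```scala
// Sketch assuming a Spark 1.x shell where `sqlContext` is in scope and
// the spark-csv package is on the classpath; file names are placeholders.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")      // first line holds column names
  .option("inferSchema", "true") // infer column types instead of all-strings
  .load("cars.csv")

df.write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .save("newcars.csv")
```

Without `inferSchema`, spark-csv reads every column as a string, so it is usually worth enabling despite the extra pass over the data.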
Edit:
Spark creates part-files while saving the csv data; if you want to merge the part-files into a single csv, refer to the following:
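The link originally attached to that edit is lost here. One common way to do the merge is Hadoop's `FileUtil.copyMerge`; a hedged sketch, assuming a Hadoop 2.x classpath (`copyMerge` was removed in Hadoop 3.0) and placeholder paths:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

// Merge the part-files Spark wrote into a directory into one file.
// Assumes Hadoop 2.x; both paths are placeholders.
val conf = new Configuration()
val fs   = FileSystem.get(conf)
FileUtil.copyMerge(
  fs, new Path("myFile.csv"),       // directory holding the part-files
  fs, new Path("myFileMerged.csv"), // single merged output file
  false,                            // don't delete the source directory
  conf, null)
```

Note that, unlike `coalesce(1)`, this merges after a fully parallel write, so the save itself is not bottlenecked on one executor.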
Answered by Taylrl
In Spark versions 2+ you can simply use the following:
df.write.csv("/your/location/data.csv")
If you want to make sure that the files are no longer partitioned, then add a .coalesce(1) as follows:
df.coalesce(1).write.csv("/your/location/data.csv")
Answered by Abu Shoeb
The above solution exports the csv as multiple partitions. I found another solution by zero323 on this stackoverflow page that exports a dataframe into one single CSV file when you use coalesce.
df.coalesce(1)
.write.format("com.databricks.spark.csv")
.option("header", "true")
.save("/your/location/mydata")
This would create a directory named mydata where you'll find a csv file that contains the results.
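If you would rather end up with a single plainly named file than a directory, one plain-JVM approach (a hypothetical helper, not part of any answer above) is to copy the lone part-file out after `coalesce(1)`:

```scala
import java.nio.file.{Files, Path, StandardCopyOption}
import scala.jdk.CollectionConverters._

// Hypothetical helper (not from the answers above): after coalesce(1),
// the output directory holds exactly one "part-" file; copy it out
// under a friendlier name, skipping _SUCCESS and .crc checksum files.
def extractSingleCsv(sparkOutputDir: Path, target: Path): Path = {
  val partFile = Files.list(sparkOutputDir).iterator().asScala
    .find { p =>
      val name = p.getFileName.toString
      name.startsWith("part-") && !name.endsWith(".crc")
    }
    .getOrElse(sys.error(s"no part- file found in $sparkOutputDir"))
  Files.copy(partFile, target, StandardCopyOption.REPLACE_EXISTING)
}
```

This runs on any Scala 2.13+ JVM for local filesystem output; for HDFS you would use the Hadoop FileSystem API instead of java.nio.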

