How to export a DataFrame to CSV in Scala?

Question

Asked by Tong

How can I export a Spark DataFrame to a CSV file using Scala?

Answer 1

Answered by karthik manchala

The easiest and best way to do this is to use the spark-csv library. You can check the documentation at the provided link, and here is a Scala example of how to load and save data from/to a DataFrame.

Code (Spark 1.4+):

dataFrame.write.format("com.databricks.spark.csv").save("myFile.csv")

Edit:

Spark creates part-files when it saves CSV data. If you want to merge the part-files into a single CSV file, refer to the following:

Merge Spark's CSV output folder to Single File
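The linked answer is not reproduced here. As a rough sketch of the idea, the part-files can be concatenated after the job finishes using only the JVM standard library. This assumes the output directory is on the local filesystem and that the CSV was written without a header (otherwise each part would repeat it). MergeParts and mergePartFiles are hypothetical names, not part of Spark or spark-csv.

```scala
import java.io.File
import java.nio.file.{Files, Path}

object MergeParts {
  // Hypothetical helper: concatenates every part-* file found in
  // `outputDir` into `target`, in file-name order.
  def mergePartFiles(outputDir: Path, target: Path): Unit = {
    // listFiles() returns null if the directory cannot be read.
    val parts = Option(outputDir.toFile.listFiles())
      .getOrElse(Array.empty[File])
      // Skips _SUCCESS and the hidden .crc checksum files Spark may write.
      .filter(f => f.isFile && f.getName.startsWith("part-"))
      .sortBy(_.getName)

    // newOutputStream creates the target or truncates an existing one.
    val out = Files.newOutputStream(target)
    try parts.foreach(p => Files.copy(p.toPath, out))
    finally out.close()
  }
}
```

For example, MergeParts.mergePartFiles(Paths.get("myFile.csv"), Paths.get("merged.csv")) would read the directory that the save call above created and write one combined file next to it.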

Answer 2

Answered by Taylrl

In Spark versions 2+ you can simply use the following:

df.write.csv("/your/location/data.csv")

If you want to make sure that the output is no longer split across several part files, add .coalesce(1) as follows:

df.coalesce(1).write.csv("/your/location/data.csv")
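For context, here is a minimal end-to-end sketch of the same call in a standalone program. The sample data, the application name, the local master, and the /tmp/export-csv-sketch path are assumptions made for illustration. Note that the path passed to csv is created as a directory containing the part file, not as a single file with that name.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object ExportCsv {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would usually
    // receive its master from spark-submit.
    val spark = SparkSession.builder()
      .appName("export-csv-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for `df`.
    val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")

    df.coalesce(1)                    // one partition, so one part file
      .write
      .mode(SaveMode.Overwrite)       // replace the output directory if it exists
      .option("header", "true")       // write column names as the first line
      .csv("/tmp/export-csv-sketch")  // hypothetical path; created as a directory

    spark.stop()
  }
}
```

Coalescing to one partition funnels all rows through a single task, so this is probably only sensible when the result is small enough for one executor to write.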

Answer 3

Answered by Abu Shoeb

The above solution exports the CSV as multiple partitions. I found another solution by zero323 on this Stack Overflow page that exports a DataFrame into a single CSV file when you use coalesce:

df.coalesce(1)
  .write.format("com.databricks.spark.csv")
  .option("header", "true")
  .save("/your/location/mydata")

This creates a directory named mydata, in which you'll find a CSV file that contains the results.
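If a plain file with a chosen name is needed instead of a directory, one possible follow-up step is to move the single part file out of mydata with the Hadoop FileSystem API that ships with Spark. The sketch below is an assumption about how that could look, not something from the answer; SinglePart and movePartFile are hypothetical names, and it expects exactly one part file to exist.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, Path}

object SinglePart {
  // Hypothetical helper: renames the first part-* file in `outputDir`
  // to `target`. Returns false if no part file was found or the
  // rename did not succeed.
  def movePartFile(outputDir: String, target: String, conf: Configuration): Boolean = {
    val dir = new Path(outputDir)
    val fs = dir.getFileSystem(conf)
    // Defensive wrapper: globStatus can return null in some cases.
    val parts = Option(fs.globStatus(new Path(dir, "part-*")))
      .getOrElse(Array.empty[FileStatus])
    parts.headOption.exists(status => fs.rename(status.getPath, new Path(target)))
  }
}
```

With a SparkSession named spark, a call might look like SinglePart.movePartFile("/your/location/mydata", "/your/location/mydata.csv", spark.sparkContext.hadoopConfiguration). The mydata directory and its _SUCCESS marker are left in place and can be deleted separately.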
