Java Spark SQL - How to write a DataFrame to a text file?
Disclaimer: this page is an English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must follow the same CC BY-SA license, link to the original, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/36010984/
Spark SQL - How to write DataFrame to text file?
Asked by Shankar
I am using Spark SQL for reading and writing parquet files.
But in some cases, I need to write the DataFrame as a text file instead of JSON or Parquet.
Is there any default method supported, or do I have to convert the DataFrame to an RDD and then use the saveAsTextFile() method?
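In Spark 1.x the RDD route the question mentions would look like `df.javaRDD().map(row -> row.mkString("\t")).saveAsTextFile(path)`: every Row is formatted into one delimited line of text. The formatting step itself needs no Spark at all; a minimal plain-Java sketch of that row-to-line conversion (the `formatRow` helper and the tab delimiter are illustrative assumptions, not part of the Spark API):

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the idea behind DataFrame -> RDD<String> -> saveAsTextFile():
// each row's fields are joined into a single delimited line of text.
// formatRow is a hypothetical helper; in Spark itself this logic would
// run inside the map() over the DataFrame's underlying RDD.
public class RowToLine {
    static String formatRow(List<Object> fields, String sep) {
        return fields.stream()
                     .map(String::valueOf)          // render each field as text
                     .collect(Collectors.joining(sep));
    }

    public static void main(String[] args) {
        System.out.println(formatRow(List.of(2012, "Tesla"), "\t"));
    }
}
```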
Accepted answer by Radu Ionescu
Using Databricks Spark-CSV you can save directly to a CSV file, and load from a CSV file afterwards, like this:
import org.apache.spark.sql.SQLContext;

SQLContext sqlContext = new SQLContext(sc);
DataFrame df = sqlContext.read()
    .format("com.databricks.spark.csv")
    .option("inferSchema", "true")
    .option("header", "true")
    .load("cars.csv");

df.select("year", "model").write()
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
    .save("newcars.csv");
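One reason to prefer a real CSV writer over hand-rolling lines with saveAsTextFile() is field quoting. A minimal sketch of RFC 4180-style quoting, to illustrate what the CSV library handles for you (the `quote` helper is hypothetical, not the spark-csv implementation):

```java
// Sketch of CSV field quoting (RFC 4180 style): fields containing the
// delimiter, a quote, or a newline must be wrapped in quotes, and any
// embedded quotes doubled. Hypothetical helper, not part of spark-csv.
public class CsvQuote {
    static String quote(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;   // safe to emit as-is
    }

    public static void main(String[] args) {
        System.out.println(quote("Model S, 2012"));  // needs quoting
        System.out.println(quote("plain"));          // unchanged
    }
}
```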
Answered by Igorock
df.repartition(1).write.option("header", "true").csv("filename.csv")

