Java Spark SQL - How to write a DataFrame to a text file?
Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse it, you must do so under the same license, link to the original, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/36010984/
Spark SQL - How to write DataFrame to text file?
Asked by Shankar
I am using Spark SQL for reading and writing Parquet files.
But in some cases, I need to write the DataFrame as a text file instead of JSON or Parquet.
Is there any built-in method for this, or do I have to convert the DataFrame to an RDD and then use the saveAsTextFile() method?
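If you do go down the RDD route, each row has to be turned into a single String before saveAsTextFile() can write it out. The helper below is a hypothetical, plain-Java illustration of that row-to-line formatting step (the Spark calls themselves appear only in the comments; RowToLine and toLine are names invented for this sketch, not part of any Spark API):

```java
import java.util.StringJoiner;

// Hypothetical helper: joins a row's field values into one delimited line,
// the shape of the function you would pass to JavaRDD.map() before calling
// saveAsTextFile(), e.g. (sketch only):
//   df.toJavaRDD().map(row -> toLine(fieldsOf(row), "\t")).saveAsTextFile("out");
public class RowToLine {
    static String toLine(Object[] fields, String sep) {
        StringJoiner joiner = new StringJoiner(sep);
        for (Object field : fields) {
            // Render nulls as empty strings so the column count stays stable.
            joiner.add(field == null ? "" : field.toString());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(toLine(new Object[]{2015, "Model X", null}, "\t"));
    }
}
```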
Accepted answer by Radu Ionescu
Using Databricks Spark-CSV you can save directly to a CSV file, and load from a CSV file afterwards, like this:
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

SQLContext sqlContext = new SQLContext(sc);

DataFrame df = sqlContext.read()
    .format("com.databricks.spark.csv")
    .option("inferSchema", "true")
    .option("header", "true")
    .load("cars.csv");

df.select("year", "model").write()
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
    .save("newcars.csv");
Answer by Igorock
df.repartition(1).write.option("header", "true").csv("filename.csv")
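Note that even with repartition(1), Spark writes a directory named filename.csv containing a single part file, not a file literally named filename.csv. If you instead build CSV lines yourself for saveAsTextFile(), fields containing the delimiter, quotes, or newlines must be escaped. A minimal sketch of RFC 4180-style quoting (CsvEscape is a hypothetical helper invented here, not a Spark API):

```java
public class CsvEscape {
    // Quote a field if it contains a comma, a quote, or a newline,
    // doubling any embedded quotes (RFC 4180 style).
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    public static void main(String[] args) {
        System.out.println(escape("plain"));      // plain
        System.out.println(escape("a,b"));        // "a,b"
        System.out.println(escape("say \"hi\"")); // "say ""hi"""
    }
}
```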