Java Spark SQL - How to write a DataFrame to a text file?

Disclaimer: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/36010984/

Date: 2020-08-11 17:22:26  Source: igfitidea

Tags: java, apache-spark-sql

Asked by Shankar

I am using Spark SQL to read and write Parquet files.

But in some cases, I need to write the DataFrame as a text file instead of JSON or Parquet.

Is there a built-in method for this, or do I have to convert the DataFrame to an RDD and then use the saveAsTextFile() method?

Accepted answer by Radu Ionescu

Using the Databricks spark-csv package, you can save a DataFrame directly to a CSV file and load it back from a CSV file afterwards, like this:

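import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// sc is an existing JavaSparkContext
SQLContext sqlContext = new SQLContext(sc);

// Read a CSV file with spark-csv: use the first line as the header and infer column types
DataFrame df = sqlContext.read()
    .format("com.databricks.spark.csv")
    .option("inferSchema", "true")
    .option("header", "true")
    .load("cars.csv");

// Write two of the columns back out as gzip-compressed CSV with a header row
df.select("year", "model").write()
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
    .save("newcars.csv");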
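
The fallback route mentioned in the question (convert to an RDD, then call saveAsTextFile()) needs a function that turns each row into one line of text. Below is a minimal sketch, assuming comma-separated output with RFC 4180-style quoting. The class CsvLineFormatter and its methods are hypothetical, and plain Object[] arrays stand in for Spark Row values so that the example runs without Spark:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: formats one row's values as a single CSV line. */
public class CsvLineFormatter {

    /** Quotes a field only when it contains a comma, a quote, or a line break. */
    static String escape(Object value) {
        if (value == null) {
            return ""; // assumption: nulls are written as empty fields
        }
        String s = value.toString();
        boolean needsQuotes = s.contains(",") || s.contains("\"")
                || s.contains("\n") || s.contains("\r");
        if (!needsQuotes) {
            return s;
        }
        // Double any embedded quotes, then wrap the whole field in quotes
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }

    /** Escapes every value and joins them with commas. */
    static String toCsvLine(Object[] values) {
        List<String> fields = new ArrayList<>();
        for (Object v : values) {
            fields.add(escape(v));
        }
        return String.join(",", fields);
    }

    public static void main(String[] args) {
        // Stand-in rows; with Spark the values would come from Row.get(i)
        List<Object[]> rows = Arrays.asList(
                new Object[]{2012, "Model S"},
                new Object[]{1997, "E350, \"Super\""});
        for (Object[] row : rows) {
            System.out.println(toCsvLine(row));
        }
    }
}
```

With Spark, a Row's values can be gathered with row.get(i) for i from 0 to row.size() - 1 and passed to toCsvLine inside df.javaRDD().map(...); the resulting JavaRDD<String> can then be written with saveAsTextFile(). The quoting matters because a naive comma join would split a field such as `E350, "Super"` into two columns.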

Answer by Igorock

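// Spark 2.0+ (Java): built-in CSV writer; repartition(1) yields a single part file inside the "filename.csv" directory
df.repartition(1).write().option("header", "true").csv("filename.csv");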
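
Note that "filename.csv" ends up as a directory containing a part-* file (plus marker files such as _SUCCESS), not as a single file. Also, repartition(1) routes all data through one task, so it can be slow for large data. When the output is on the local filesystem, the single part file can be moved out with java.nio. The sketch below assumes local paths only; the class SinglePartFile and its method are hypothetical. On HDFS or S3, the same idea would need the Hadoop FileSystem API instead:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: moves the only part-* file of a Spark output directory to a target path. */
public class SinglePartFile {

    static Path extractSinglePart(Path outputDir, Path target) throws IOException {
        Path part = null;
        // Hidden checksum files (e.g. ".part-00000.crc") do not match "part-*"
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path p : parts) {
                if (part != null) {
                    throw new IllegalStateException("More than one part file in " + outputDir);
                }
                part = p;
            }
        }
        if (part == null) {
            throw new IllegalStateException("No part file in " + outputDir);
        }
        return Files.move(part, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Simulate the layout left behind by a single-partition write
        Path dir = Files.createTempDirectory("filename.csv");
        Files.write(dir.resolve("part-00000"), "year,model\n2012,Model S\n".getBytes());
        Files.createFile(dir.resolve("_SUCCESS"));

        Path out = extractSinglePart(dir, dir.resolveSibling(dir.getFileName() + "-single.csv"));
        System.out.print(new String(Files.readAllBytes(out)));
    }
}
```

Failing loudly when more than one part file exists is a deliberate choice: it catches cases where the repartition(1) step was dropped and the output would otherwise be silently truncated to one partition.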