Scala: write/store a dataframe in a text file
Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same license, link to the original post, and attribute it to the original authors (not me) on StackOverflow.
Original post: http://stackoverflow.com/questions/44537889/
Write/store dataframe in text file
Asked by Pravinkumar Hadpad
I am trying to write a dataframe to a text file. If the file contains a single column I am able to write it out, but if it contains multiple columns I get the following error:
Text data source supports only a single column, and you have 2 columns.
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

object replace {
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.ERROR)
    val spark = SparkSession.builder.master("local[1]").appName("Decimal Field Validation").getOrCreate()
    var sourcefile = spark.read.option("header", "true").text("C:/Users/phadpa01/Desktop/inputfiles/decimalvalues.txt")
    // prepend a row number (prgrefnbr) to each row
    val rowRDD = sourcefile.rdd.zipWithIndex().map(indexedRow => Row.fromSeq((indexedRow._2.toLong + 1) +: indexedRow._1.toSeq))
    // add a column for prgrefnbr to the schema
    val newstructure = StructType(Array(StructField("PRGREFNBR", LongType)) ++ sourcefile.schema.fields)
    // create a new dataframe containing prgrefnbr
    sourcefile = spark.createDataFrame(rowRDD, newstructure)
    // this line throws the error above: the text source cannot write more than one column
    sourcefile.write.mode("overwrite").format("text").save("C:/Users/phadpa01/Desktop/op")
  }
}
Answered by Ramesh Maharjan
You can convert the dataframe to an RDD, map each Row to a string, and write the last line as
val op= sourcefile.rdd.map(_.toString()).saveAsTextFile("C:/Users/phadpa01/Desktop/op")
Edited
As @philantrovert and @Pravinkumar have pointed out, the above would add [ and ] to each line in the output file, which is true. The solution would be to replace them with an empty string, as
val op= sourcefile.rdd.map(_.toString().replace("[","").replace("]", "")).saveAsTextFile("C:/Users/phadpa01/Desktop/op")
One can even use a regex for this.
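A minimal sketch of the regex variant, assuming the same sourcefile and output path as above: replaceAll removes only the leading "[" and the trailing "]", so square brackets inside the field values are left untouched.
// strip the surrounding brackets with a single regex instead of two replace calls
val op = sourcefile.rdd.map(_.toString().replaceAll("^\\[|\\]$", "")).saveAsTextFile("C:/Users/phadpa01/Desktop/op")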
Answered by Marsellus Wallace
I would recommend using CSV or another delimited format. The following is an example of the most concise/elegant way to write a .tsv in Spark 2+:
import org.apache.spark.sql.SaveMode

val tsvWithHeaderOptions: Map[String, String] = Map(
  ("delimiter", "\t"), // use "\t" as the delimiter instead of the default ","
  ("header", "true"))  // write a header record with column names

df.coalesce(1) // collapse to a single partition so a single file is written
  .write
  .mode(SaveMode.Overwrite)
  .options(tsvWithHeaderOptions)
  .csv("output/path")
Answered by Pala
I think using "substring" is more appropriate for all scenarios.
Please check the code below.
sourcefile.rdd
.map(r => { val x = r.toString; x.substring(1, x.length-1)})
.saveAsTextFile("C:/Users/phadpa01/Desktop/op")
Answered by Yaron
You can save it as a text CSV file (.format("csv")).
The result will be a text file in CSV format; each column will be separated by a comma.
val op = sourcefile.write.mode("overwrite").format("csv").save("C:/Users/phadpa01/Desktop/op")
More info can be found in the Spark programming guide.
Answered by Balaji Reddy
I use the Databricks API to save my DF output into a text file.
myDF.write.format("com.databricks.spark.csv").option("header", "true").save("output.csv")
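A minimal equivalent, assuming Spark 2+, where the CSV data source is built in and the spark-csv package is not needed:
// Spark 2+: the built-in csv writer produces the same output without the external package
myDF.write.format("csv").option("header", "true").save("output.csv")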

