Convert a DataFrame into a String using Scala and save the output to a CSV

Note: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/42244800/



Tags: scala, apache-spark, dataframe

Asked by Art

I want to concatenate all the rows of a DataFrame column into a single string.

+--------------------+
|   defectDescription|
+--------------------+
|ACEView NA : Daework|
|ACEView NA : Documen|
|ACEView NA : ACev   |
|ACEView NA : Dragdro|
+--------------------+

Expected output: ACEView NA : Daework ACEView NA : Documen ACEView NA : ACev ACEView NA : Dragdro

Answered by Assaf Mendelson

If you really do want all the data in a single string, you can get it with collect:

val rows = df.select("defectDescription").collect().map(_.getString(0)).mkString(" ")

First select the relevant column so that the result contains only that column, then collect it, which returns an array of Row objects. The map call turns each Row into a String by reading column 0, the only column. Finally, mkString joins the strings into one, using a space as the separator.
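The three steps above can be sketched end to end as follows. This is a minimal illustration, assuming a local SparkSession and sample data copied from the question; the names `CollectToString` and `joinColumn` are hypothetical helpers, not part of Spark.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CollectToString {
  // Hypothetical helper: collects one string column to the driver
  // and joins its values with `sep`.
  def joinColumn(df: DataFrame, column: String, sep: String = " "): String =
    df.select(column)        // keep only the column of interest
      .collect()             // Array[Row], materialized on the driver
      .map(_.getString(0))   // the single remaining column has index 0
      .mkString(sep)         // join all values with the separator

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("collect-to-string")
      .getOrCreate()
    import spark.implicits._

    // Sample rows mirroring the table in the question.
    val df = Seq(
      "ACEView NA : Daework",
      "ACEView NA : Documen",
      "ACEView NA : ACev",
      "ACEView NA : Dragdro"
    ).toDF("defectDescription")

    println(joinColumn(df, "defectDescription"))
    spark.stop()
  }
}
```

Wrapping the chain in a small function keeps the separator configurable, so the same code could produce a comma-separated line instead of a space-separated one.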

Note that this brings the entire DataFrame to the driver, which may cause out-of-memory errors. If you need only some of the data, use take(n) instead of collect to limit the number of rows to n.
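As a rough sketch of the take(n) variant, the helper below (a hypothetical `firstN`, with made-up sample values) sends at most n rows to the driver. Without an explicit ordering, which rows take(n) returns is generally not something to rely on.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object TakeToString {
  // Hypothetical helper: joins at most `n` values of a string column.
  // Only those `n` rows are brought to the driver.
  def firstN(df: DataFrame, column: String, n: Int, sep: String = " "): String =
    df.select(column)
      .take(n)               // Array[Row] with at most n elements
      .map(_.getString(0))
      .mkString(sep)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("take-to-string")
      .getOrCreate()
    import spark.implicits._

    // Illustrative values, not taken from the original question.
    val df = Seq("a", "b", "c", "d").toDF("defectDescription")

    println(firstN(df, "defectDescription", 2))
    spark.stop()
  }
}
```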

Answered by swapnil shashank

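Another way to do this is as follows:

// Each collected Row prints as "[value]", so str1 looks like "[a],[b],[c]".
val str1 = df.select("defectDescription").collect.mkString(",")
// Backslashes are doubled because this is an ordinary (not triple-quoted) string literal.
val str = str1.replaceAll("[\\[\\]]", "")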

The first line selects the column and then collects the result. collect is an action: it returns all the elements of the dataset as an array at the driver program. This is usually useful after a filter or other operation that returns a sufficiently small subset of the data.

mkString: this method has overloads, one of which takes a delimiter to place between the elements of the collection.
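For reference, the three mkString overloads on Scala collections can be compared on a plain Seq, with no Spark involved. The object name `MkStringDemo` and the two sample values are just for illustration.

```scala
object MkStringDemo {
  val parts: Seq[String] = Seq("ACEView NA : Daework", "ACEView NA : Documen")

  // No arguments: elements are concatenated with nothing in between.
  val plain: String = parts.mkString

  // One argument: the separator placed between elements.
  val spaced: String = parts.mkString(" ")

  // Three arguments: start, separator, end.
  val wrapped: String = parts.mkString("<", " | ", ">")

  def main(args: Array[String]): Unit = {
    println(plain)
    println(spaced)
    println(wrapped)
  }
}
```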

The second line removes the square brackets that Row.toString adds around each value.
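The bracket removal can be tried without Spark by starting from a string shaped like the collected output. In this sketch the `collected` value is a hand-written stand-in (an assumption) for what `collect.mkString(",")` would produce. It also shows two equivalent ways to write the regex: in an ordinary string literal the backslashes need doubling, while a triple-quoted literal passes them through unchanged.

```scala
object StripBrackets {
  // Stand-in for collect.mkString(","): each Row prints as "[value]".
  val collected: String = "[ACEView NA : Daework],[ACEView NA : Documen]"

  // Ordinary string literal: each regex backslash is written twice.
  val escaped: String = collected.replaceAll("[\\[\\]]", "")

  // Triple-quoted literal: the regex is written as-is.
  val raw: String = collected.replaceAll("""[\[\]]""", "")

  def main(args: Array[String]): Unit = {
    println(escaped)
    println(raw)
  }
}
```

Note that this pattern removes every square bracket in the string, including any that are part of the data itself.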

Answered by riddhi

df.createTempView(viewName="table")
val res=spark.sqlContext.sql(sqlText="select defectDescription from table").collectAsList.toString.replace("[", "").replace("]", "")

First create a temporary view of the DataFrame, then query it and collect the result as a list, convert the list to a string, and finally remove the brackets to get the required output.
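To see what the list-to-string step yields, the sketch below imitates it with a java.util.List of hand-written strings standing in for Row objects (an assumption; a real Row prints as "[value]"). It suggests that the values end up separated by a comma and a space rather than by a single space as in the expected output.

```scala
import java.util.{Arrays => JArrays, List => JList}

object ListToStringDemo {
  // Stand-ins for Row.toString output.
  val rows: JList[String] =
    JArrays.asList("[ACEView NA : Daework]", "[ACEView NA : Documen]")

  // java.util.List.toString wraps the elements in brackets and joins them with ", ".
  val listString: String = rows.toString

  // Same cleanup as in the answer: drop every "[" and "]".
  val cleaned: String = listString.replace("[", "").replace("]", "")

  def main(args: Array[String]): Unit = {
    println(listString)
    println(cleaned)
  }
}
```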