Convert a DataFrame to a string and save the output to CSV using Scala

Question

Asked by Art

I want to concatenate all the rows of a DataFrame column into a single string.

+--------------------+
|   defectDescription|
+--------------------+
|ACEView NA : Daework|
|ACEView NA : Documen|
|ACEView NA : ACev   |
|ACEView NA : Dragdro|
+--------------------+

Expected output: ACEView NA : Daework ACEView NA : Documen ACEView NA : ACev ACEView NA : Dragdro

Answer 1

Answered by Assaf Mendelson

If you really do want all the data in a single string, you can do it with collect:

val rows = df.select("defectDescription").collect().map(_.getString(0)).mkString(" ")

You first select the relevant column (so that it is the only one left) and collect it, which gives you an array of rows. The map turns each row into a string (there is just one column, at index 0). Then mkString joins them into one string with a space as the separator.

Note that this brings the entire DataFrame to the driver, which might cause out-of-memory errors. If you need only some of the data, you can use take(n) instead of collect to limit the number of rows to n.
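The take(n) variant mentioned above can be sketched as follows. This is a minimal sketch, assuming Spark is on the classpath and a local session is acceptable; the object name `ColumnToString`, the helper `joinColumn`, and its `limit` and `sep` parameters are hypothetical names introduced for illustration and are not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ColumnToString {
  // Hypothetical helper: joins at most `limit` values of a string column
  // into one string on the driver. Only `limit` rows are brought back.
  def joinColumn(df: DataFrame, column: String, limit: Int, sep: String = " "): String =
    df.select(column).take(limit).map(_.getString(0)).mkString(sep)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for demonstration only
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("column-to-string")
      .getOrCreate()
    import spark.implicits._

    // Sample data taken from the question
    val df = Seq(
      "ACEView NA : Daework",
      "ACEView NA : Documen",
      "ACEView NA : ACev",
      "ACEView NA : Dragdro"
    ).toDF("defectDescription")

    // Joins only the first two rows instead of collecting everything
    println(joinColumn(df, "defectDescription", limit = 2))

    spark.stop()
  }
}
```

Capping the row count keeps the driver-side array small. Note that getString(0) returns null for null cells, so such rows would appear as the text "null" in the result.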

Answer 2

Answered by swapnil shashank

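Another way to do this is as follows:

// Collect the column; each Row is rendered as "[value]" when joined
val str1 = df.select("defectDescription").collect.mkString(",")
// Strip the square brackets (backslashes must be doubled in a Scala string literal)
val str = str1.replaceAll("[\\[\\]]", "")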

The first line selects the column and then collects the result. collect is an action: it returns all the elements of the dataset as an array to the driver program. This is usually useful after a filter or other operation that returns a sufficiently small subset of the data.

mkString has an overload that takes a delimiter to place between the elements of the collection. The code above uses a comma; pass " " instead to get the space-separated output the question asks for.
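To make the overloads concrete, here is a small sketch using a plain Scala Seq in place of the collected rows, so it runs without Spark. The object name `MkStringDemo` and the value names are hypothetical.

```scala
object MkStringDemo {
  // Plain strings standing in for the values of the collected column
  val parts: Seq[String] = Seq("ACEView NA : Daework", "ACEView NA : Documen")

  // No argument: elements are concatenated with nothing in between
  val noSeparator: String = parts.mkString

  // One argument: the separator placed between elements
  val withSpace: String = parts.mkString(" ")

  // Three arguments: prefix, separator, suffix
  val wrapped: String = parts.mkString("[", ",", "]")

  def main(args: Array[String]): Unit = {
    println(noSeparator)
    println(withSpace)
    println(wrapped)
  }
}
```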

The second line just removes the extra square brackets, which appear because each collected element is a Row and Row.toString wraps its values in [ and ].
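The bracket removal can be tried without Spark by using strings shaped like Row.toString output. This is only a sketch: the object name `StripBrackets` and the sample strings are made up, and the second sample deliberately contains brackets inside the data to show a side effect of the approach.

```scala
object StripBrackets {
  // Hypothetical stand-ins for what Row.toString yields on single-column rows
  val rowStrings: Seq[String] = Seq(
    "[ACEView NA : Daework]",
    "[ACEView NA : [beta] Documen]" // brackets inside the value itself
  )

  val joined: String = rowStrings.mkString(",")

  // The character class [\[\]] matches either bracket anywhere in the string
  val cleaned: String = joined.replaceAll("[\\[\\]]", "")

  def main(args: Array[String]): Unit =
    println(cleaned)
}
```

Because the regex removes every bracket, brackets that are part of the data disappear as well. Mapping each row with getString(0) before joining, as in Answer 1, avoids that.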

Answer 3

Answered by riddhi

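// Register the DataFrame as a temporary view, then query it, stringify the list, and strip the brackets
df.createTempView(viewName = "table")
val res = spark.sqlContext.sql(sqlText = "select defectDescription from table").collectAsList.toString.replace("[", "").replace("]", "")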

First create a temporary view of the DataFrame, then query it and collect the result as a list, convert the list to a string, and finally remove the brackets to match the required output.
