scala - Convert Spark DataFrame to Array[String]
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, link to the original, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/46134943/
Convert spark dataframe to Array[String]
Asked by Bharath
Can anyone tell me how to convert a Spark DataFrame into Array[String] in Scala?
I have used the following.
x = df.select(columns.head, columns.tail: _*).collect()
The above snippet gives me an Array[Row], not an Array[String].
Answered by Sohum Sachdev
This should do the trick:
df.select(columns: _*).collect.map(_.toSeq)
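Note that map(_.toSeq) produces an Array[Seq[Any]], not yet an Array[String]; flattening and stringifying the cells finishes the conversion. A minimal sketch of that last step, mocking the collected rows as Seq[Any] values (a real run would collect them from a live SparkSession):

```scala
// Mocked result of df.select(columns: _*).collect.map(_.toSeq);
// in Spark these Seqs would come from real Row objects
val collected: Array[Seq[Any]] = Array(Seq("a", 1), Seq("b", 2))

// Flatten every cell and render it as a String
val strings: Array[String] = collected.flatten.map(_.toString)

println(strings.mkString(","))  // a,1,b,2
```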
Answered by loneStar
DataFrame to Array[String]
data.collect.map(_.toSeq).flatten
You can also use the following:
data.collect.map(row => row.getString(0))
If you have more columns, then it is better to use the last one:
data.rdd.map(row => row.getString(0)).collect
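Both snippets above extract only the first column of each row. A small sketch of what row.getString(0) effectively does, with plain Seq[Any] values standing in for Spark Rows (a live SparkSession is assumed by the original code):

```scala
// Mocked rows; in Spark each would be an org.apache.spark.sql.Row
val rows: Array[Seq[Any]] = Array(Seq("alice", 30), Seq("bob", 25))

// Take the first cell of every row, as row.getString(0) does
val firstColumn: Array[String] = rows.map(_.head.toString)

println(firstColumn.mkString(","))  // alice,bob
```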
Answered by Areeha
If you are planning to read the dataset line by line, you can use an iterator over the dataset:
Dataset<Row> csv = session.read().format("csv").option("sep", ",").option("inferSchema", true).option("escape", "\"").option("header", true).option("multiline", true).load("users/abc/....");
for (Iterator<Row> iter = csv.toLocalIterator(); iter.hasNext();) {
    String[] item = iter.next().toString().split(",");
}
Answered by Bharath
The answer was provided by a user named cricket_007. You can use the following to convert Array[Row] to Array[String]:
x = df.select(columns.head, columns.tail: _*).collect().map { row => row.toString() }
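Keep in mind that Row.toString renders a whole row in bracketed, comma-separated form (for example [Alice,30]), so each element of the resulting Array[String] is one entire row rather than one cell. A sketch of the distinction, mocking collected rows as Seq[Any] (Spark itself is assumed to supply the real Array[Row]):

```scala
// Mocked collected rows; real ones would be org.apache.spark.sql.Row values
val rows: Array[Seq[Any]] = Array(Seq("Alice", 30), Seq("Bob", 25))

// One String per row, analogous to .collect().map(_.toString) on Array[Row]
val perRow: Array[String] = rows.map(_.mkString("[", ",", "]"))

// One String per cell, if individual values are what you actually need
val perCell: Array[String] = rows.flatten.map(_.toString)

println(perRow.mkString(" "))   // [Alice,30] [Bob,25]
println(perCell.mkString(" "))  // Alice 30 Bob 25
```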
Thanks, Bharath

