Scala: Converting an array into a DataFrame with columns and an index

Question

Asked by PRIYA M

Initially I have a matrix

 0.0  0.4  0.4  0.0 
 0.1  0.0  0.0  0.7 
 0.0  0.2  0.0  0.3 
 0.3  0.0  0.0  0.0

The matrix `matrix` is converted into `normal_array` by

`val normal_array = matrix.toArray`

and I have an array of strings

inputCols : Array[String] = Array(p1, p2, p3, p4)

I need to convert this matrix into the following DataFrame. (Note: the number of rows and columns in the matrix is the same as the length of `inputCols`.)

index  p1   p2   p3   p4
 p1    0.0  0.4  0.4  0.0 
 p2    0.1  0.0  0.0  0.7 
 p3    0.0  0.2  0.0  0.3 
 p4    0.3  0.0  0.0  0.0

In Python, this is easy to do with the pandas library:

arrayToDataframe = pandas.DataFrame(normal_array,columns = inputCols, index = inputCols)

But how can I do this in Scala?

Answer 1

Accepted answer by Manoj Kumar Dhakad

You can do something like the following:

    import org.apache.spark.sql.functions.{col, monotonically_increasing_id, udf}

    // Convert your data to a Scala Seq/List/Array of rows
    val list = Seq((0.0,0.4,0.4,0.0),(0.1,0.0,0.0,0.7),(0.0,0.2,0.0,0.3),(0.3,0.0,0.0,0.0))

    // Define the array of desired column names
    val inputCols: Array[String] = Array("p1", "p2", "p3", "p4")

    // Create a DataFrame from the data; Spark names tuple columns _1, _2, etc.
    val df = sparkSession.createDataFrame(list)

    // Get the generated column names from the DataFrame
    val dfColumns = df.columns

    // Build one "old as new" expression per column to rename it
    val query = inputCols.zipWithIndex.map { case (name, i) => dfColumns(i) + " as " + name }

    // Apply the renaming expressions
    val newDf = df.selectExpr(query: _*)

    // UDF that takes a row number (0, 1, 2, 3) and returns the matching name from inputCols
    val getIndexUDF = udf((rowNo: Long) => inputCols(rowNo.toInt))

    // Add a temporary row_no column, derive the index column from it, then drop row_no.
    // Caveat: monotonically_increasing_id() is only guaranteed to be consecutive
    // within a single partition, so this lookup assumes the rows sit in one partition.
    val dfWithRow = newDf
      .withColumn("row_no", monotonically_increasing_id())
      .withColumn("index", getIndexUDF(col("row_no")))
      .drop("row_no")

    dfWithRow.show()

Sample Output:

+---+---+---+---+-----+
| p1| p2| p3| p4|index|
+---+---+---+---+-----+
|0.0|0.4|0.4|0.0|   p1|
|0.1|0.0|0.0|0.7|   p2|
|0.0|0.2|0.0|0.3|   p3|
|0.3|0.0|0.0|0.0|   p4|
+---+---+---+---+-----+
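
The first step of the answer above (getting the data into a Seq of rows) is not shown. Here is a minimal sketch of that step in plain Scala, assuming `normal_array` came from a Spark `Matrix.toArray`, which returns the values in column-major order; under that assumption a plain `grouped(n)` would produce columns, not rows. The object `ColumnMajorToRows` and the helper `toRows` are hypothetical names used only for illustration.

```scala
// Rebuild row-wise data from a flat array that is ASSUMED to be column-major.
object ColumnMajorToRows {
  /** Returns the n x n matrix as rows, given its values in column-major order. */
  def toRows(flat: Array[Double], n: Int): Seq[Seq[Double]] = {
    require(flat.length == n * n, s"expected ${n * n} values, got ${flat.length}")
    // Element (i, j) of a column-major n x n matrix lives at index j * n + i.
    (0 until n).map(i => (0 until n).map(j => flat(j * n + i)))
  }

  def main(args: Array[String]): Unit = {
    val inputCols = Array("p1", "p2", "p3", "p4")
    // The question's matrix, flattened column by column.
    val normalArray = Array(
      0.0, 0.1, 0.0, 0.3, // column p1
      0.4, 0.0, 0.2, 0.0, // column p2
      0.4, 0.0, 0.0, 0.0, // column p3
      0.0, 0.7, 0.3, 0.0  // column p4
    )
    val rows = toRows(normalArray, inputCols.length)
    // Pair each row with its label, as in the desired "index" column.
    inputCols.zip(rows).foreach { case (label, values) =>
      println(s"$label ${values.mkString(" ")}")
    }
  }
}
```

If your array is already row-major, `normal_array.grouped(n)` gives the rows directly and the index arithmetic is unnecessary.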

Answer 2

Answered by 1pluszara

Here is another way:

val data = Seq((0.0,0.4,0.4,0.0),(0.1,0.0,0.0,0.7),(0.0,0.2,0.0,0.3),(0.3,0.0,0.0,0.0))
val cols = Array("p1", "p2", "p3", "p4","index")

Zip the rows with the labels and convert the result into a DataFrame. (`toDF` requires the session's implicits to be in scope, e.g. `import sparkSession.implicits._`.)

data.zip(cols).map { 
  case (col,index) => (col._1,col._2,col._3,col._4,index)
}.toDF(cols: _*)

Output:

+---+---+---+---+-----+
|p1 |p2 |p3 |p4 |index|
+---+---+---+---+-----+
|0.0|0.4|0.4|0.0|p1   |
|0.1|0.0|0.0|0.7|p2   |
|0.0|0.2|0.0|0.3|p3   |
|0.3|0.0|0.0|0.0|p4   |
+---+---+---+---+-----+
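
The zip approach above hardcodes four tuple fields. Since the question says the matrix size follows the length of `inputCols`, here is a sketch that works for any size by building `Row` objects and an explicit schema. It is one possible variant, not the answerer's code: the object `LabeledMatrix`, the helper `toDataFrame`, and the local `SparkSession` setup are assumptions made for illustration. It also puts the `index` column first, as in the question's target layout.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

object LabeledMatrix {
  /** Builds a DataFrame with an "index" column followed by one Double column per label. */
  def toDataFrame(spark: SparkSession, rows: Seq[Seq[Double]], labels: Seq[String]): DataFrame = {
    require(rows.length == labels.length && rows.forall(_.length == labels.length),
      "expected a square matrix whose size matches the number of labels")

    // Schema: index (String), then one Double column per label.
    val schema = StructType(
      StructField("index", StringType, nullable = false) +:
        labels.map(name => StructField(name, DoubleType, nullable = false))
    )

    // Prepend each row's label to its values.
    val labeled: Seq[Row] = labels.zip(rows).map { case (label, values) =>
      Row.fromSeq(label +: values)
    }

    spark.createDataFrame(spark.sparkContext.parallelize(labeled), schema)
  }

  def main(args: Array[String]): Unit = {
    // Local session for demonstration only; reuse your existing session in practice.
    val spark = SparkSession.builder().master("local[*]").appName("labeled-matrix").getOrCreate()

    val inputCols = Array("p1", "p2", "p3", "p4")
    val rows = Seq(
      Seq(0.0, 0.4, 0.4, 0.0),
      Seq(0.1, 0.0, 0.0, 0.7),
      Seq(0.0, 0.2, 0.0, 0.3),
      Seq(0.3, 0.0, 0.0, 0.0)
    )

    toDataFrame(spark, rows, inputCols.toSeq).show()
    spark.stop()
  }
}
```

Because the labels are attached before the DataFrame is created, this variant does not depend on row IDs generated by Spark.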
