Convert Row to map in spark scala
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/46155500/
Asked by Sorin Bolos
I have a row from a data frame and I want to convert it to a Map[String, Any] that maps column names to the values in the row for that column.
Is there an easy way to do it?
I did it for string values like
import org.apache.spark.sql.Row

def rowToMap(row: Row): Map[String, String] = {
  row.schema.fieldNames.map(field => field -> row.getAs[String](field)).toMap
}
val myRowMap = rowToMap(myRow)
If the row contains values of other types, not specific ones like String, then the code gets messier because the row does not have a method .get(field)
Any ideas?
Answered by Psidom
You can use getValuesMap:
val df = Seq((1, 2.0, "a")).toDF("A", "B", "C")
val row = df.first
To get Map[String, Any]:
row.getValuesMap[Any](row.schema.fieldNames)
// res19: Map[String,Any] = Map(A -> 1, B -> 2.0, C -> a)
Or you can get Map[String, AnyVal] for this simple case, since the values are not complex objects
row.getValuesMap[AnyVal](row.schema.fieldNames)
// res20: Map[String,AnyVal] = Map(A -> 1, B -> 2.0, C -> a)
Note: the returned value type of getValuesMap can be labelled as any type, so you cannot rely on it to figure out what data types you have; you need to keep in mind what the columns contain from the beginning instead.
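This is ordinary JVM type erasure rather than anything Spark-specific. A minimal plain-Scala sketch (the getAs helper below is a hypothetical stand-in for Row.getAs, not Spark code) shows why a wrong type label only fails once a value is actually used:

```scala
// Hypothetical stand-in for Row.getAs: the cast to T is erased at runtime.
def getAs[T](values: Map[String, Any], key: String): T =
  values(key).asInstanceOf[T]

val values = Map("A" -> 1, "B" -> 2.0, "C" -> "a")

// Labelling every value as String succeeds, because inside a generic
// method the cast is erased: the Int under "A" is not converted or checked.
val labeled: Map[String, String] =
  values.keys.map(k => k -> getAs[String](values, k)).toMap

labeled("C").toUpperCase   // fine: the value really is a String
// labeled("A").toUpperCase   // would throw ClassCastException at the use site
```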
Answered by Ramesh Maharjan
You can convert your dataframe to an rdd, use a simple map function that builds the Map from the header names inside it, and finally use collect
val fn = df.schema.fieldNames
val maps = df.rdd.map(row => fn.map(field => field -> row.getAs[Any](field)).toMap).collect()
Answered by Naman Agarwal
Let's say you have a data frame with these columns:
[time(TimeStampType), col1(DoubleType), col2(DoubleType)]
You can do something like this:
import java.sql.Timestamp

val modifiedDf = df.map { row =>
  val doubleObject = row.getValuesMap[Double](Seq("col1", "col2"))
  val timeObject = Map("time" -> row.getAs[Timestamp]("time"))
  doubleObject ++ timeObject
}
Answered by Schmitzi
Let's say you have a row without structure information and the column header as an array.
import org.apache.spark.sql.Row

val rdd = sc.parallelize(Seq(Row("test1", "val1"), Row("test2", "val2"), Row("test3", "val3"), Row("test4", "val4")))
rdd.collect.foreach(println)

val sparkFieldNames = Array("col1", "col2")
val mapRDD = rdd.map(r => sparkFieldNames.zip(r.toSeq).toMap)
mapRDD.collect.foreach(println)
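The zipping step itself does not need Spark at all; a minimal plain-Scala sketch of the same idea, where the inner Seqs are made-up stand-ins for what Row.toSeq would return:

```scala
// Zip the column headers with each row's values to build one Map per row.
val sparkFieldNames = Array("col1", "col2")
val rows: Seq[Seq[Any]] = Seq(Seq("test1", "val1"), Seq("test2", "val2"))

val maps = rows.map(r => sparkFieldNames.zip(r).toMap)
maps.foreach(println)
// Map(col1 -> test1, col2 -> val1)
// Map(col1 -> test2, col2 -> val2)
```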

