Scala Spark - How to iterate over fields in a DataFrame

Question

Asked by Alg_D

My DataFrame has several columns of different types (string, double, map, array, etc.).

I need to perform some operations on columns of certain types, and I am looking for a clean way to identify each field's type and then take the appropriate action.

types: String|Double|Map<String,Int>|...

|---------------------------------------------------------------
|myString1 |myDouble1|myMap1                                   |...otherTypes
|---------------------------------------------------------------
|"string_1"|  123.0  |{"str1Map":1,"str2":2, "str31inmap": 31} |...
|"string_2"|  456.0  |{"str2Map":2,"str22":2, "str32inmap": 32}|...
|"string_3"|  789.0  |{"str3Map":3,"str23":2, "str33inmap": 33}|...
|---------------------------------------------------------------

Iterating over the DataFrame's fields and printing them (df.schema.fields.foreach { println })

outputs:

StructField(myString1,StringType,true)
StructField(myDouble1,DoubleType,false)
StructField(myMap1,MapType(StringType,IntegerType,false),true)
...
StructField(myStringList,ArrayType(StringType,true),true)

So my idea is to iterate through the fields and, whenever a field has one of the types I need to operate on (e.g. the Map type), I will know the field name/column and the action to take.

 df.schema.fields.foreach { f =>
     val fName = ?get the name
     val fType = ?get the Type
     print("Name{} Type:{}".format(fName , fType))

      // case type is Map do action X
      // case type is String do action Y
      // ...

    }

Does this approach make sense for detecting the field types in my DataFrame and then performing different operations on the fields depending on their type? How can I get it to work?

Answer 1

Answered by Lou_Ds

Note that Scala's format method needs %s placeholders; the {} placeholders are Python syntax.
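
A quick way to see the difference in plain Scala, with no Spark involved: format uses Java-style %s placeholders, the s"..." interpolator needs no placeholders at all, and Python-style {} is just literal text, so the question's print would output "Name{} Type:{}" and ignore its arguments. The object and method names below are hypothetical, chosen only for this sketch.

```scala
object FormatDemo {
  // Java-style %s placeholders, as used by Scala's StringOps.format
  def withFormat(name: String, tpe: String): String =
    "Name %s Type:%s".format(name, tpe)

  // Scala's s-interpolator embeds the values directly; no placeholders needed
  def withInterpolation(name: String, tpe: String): String =
    s"Name $name Type:$tpe"

  // {} is not a format specifier here: the text comes back unchanged
  // and the extra arguments are silently ignored by java.util.Formatter
  def withBraces(name: String, tpe: String): String =
    "Name{} Type:{}".format(name, tpe)

  def main(args: Array[String]): Unit = {
    println(withFormat("myMap1", "MapType"))
    println(withInterpolation("myMap1", "MapType"))
    println(withBraces("myMap1", "MapType"))
  }
}
```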

This should work:

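 df.dtypes.foreach { f =>
      val fName = f._1
      val fType = f._2
      // dtypes returns each type as the DataType's toString, e.g. "StringType"
      // or "MapType(StringType,IntegerType,false)", so match MapType by prefix
      if (fType == "StringType") { println("STRING_TYPE") }
      else if (fType.startsWith("MapType")) { println("MAP_TYPE") }
      //else {println("....")}
      println("Name %s Type:%s - all:%s".format(fName, fType, f))

    }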
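
An alternative sketch, not part of the original answer: instead of comparing the strings returned by df.dtypes, iterate over df.schema.fields and pattern-match on each StructField's dataType, so the checks use Spark's own type objects (StringType, MapType, ...) rather than their string form. This assumes spark-sql (which provides org.apache.spark.sql.types) is on the classpath; no SparkSession is needed because only a schema is inspected. The schema is hand-built to mirror the question's columns, and the labels returned by the hypothetical kindOf function stand in for the real per-type actions.

```scala
import org.apache.spark.sql.types._

object FieldTypeDispatch {
  // Hand-built schema mirroring the question's DataFrame (assumed for illustration).
  // With a real DataFrame you would use df.schema instead.
  val schema: StructType = StructType(Seq(
    StructField("myString1", StringType, true),
    StructField("myDouble1", DoubleType, false),
    StructField("myMap1", MapType(StringType, IntegerType, false), true),
    StructField("myStringList", ArrayType(StringType, true), true)
  ))

  // Match on the DataType object instead of its string representation.
  // The returned labels are hypothetical placeholders for the real actions.
  def kindOf(field: StructField): String = field.dataType match {
    case StringType   => "STRING" // e.g. action Y
    case DoubleType   => "DOUBLE"
    case _: MapType   => "MAP"    // e.g. action X; matches any key/value types
    case _: ArrayType => "ARRAY"
    case _            => "OTHER"
  }

  def main(args: Array[String]): Unit = {
    schema.fields.foreach { f =>
      val fName = f.name     // fills the question's "?get the name"
      val fType = f.dataType // fills the question's "?get the Type"
      println(s"Name $fName Type:$fType -> ${kindOf(f)}")
    }
  }
}
```

Because MapType is a case class, a narrower pattern such as case MapType(StringType, IntegerType, _) could be used to act only on Map<String,Int> columns.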
