Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/42854075/

Time: 2020-10-22 09:08:00  Source: igfitidea

Scala Spark - how to iterate fields in a Dataframe

Tags: scala, apache-spark, dataframe

Asked by Alg_D

My Dataframe has several columns with different types (string, double, Map, array, etc).

I need to perform some operation on columns of certain types, and I am looking for a nice way to identify the field type and then take the proper action.

types: String|Double|Map<String,Int>|...

|-----------------------------------------------------------------------------
|myString1 |myDouble1|myMap1                                    |...otherTypes
|-----------------------------------------------------------------------------
|"string_1"|  123.0  |{"str1Map":1,"str2":2, "str31inmap": 31}  |...
|"string_2"|  456.0  |{"str2Map":2,"str22":2, "str32inmap": 32} |...
|"string_3"|  789.0  |{"str3Map":3,"str23":2, "str33inmap": 33} |...
|-----------------------------------------------------------------------------

Iterating the dataframe fields and printing: df.schema.fields.foreach { println }

outputs:

StructField(myString1,StringType,true)
StructField(myDouble1,DoubleType,false)
StructField(myMap1,MapType(StringType,IntegerType,false),true)
...
StructField(myStringList,ArrayType(StringType,true),true)

So, my idea is to iterate through the fields and, if a field is one of the types on which I need to perform an operation (e.g. the Map type), then I know the field name/column and the action to take.

 df.schema.fields.foreach { f =>
     val fName = ?get the name
     val fType = ?get the Type
     print("Name{} Type:{}".format(fName , fType))

     // case type is Map: do action X
     // case type is String: do action Y
     // ...

 }

Does this approach make sense for detecting the field types in my dataframe and then performing different operations on the df fields depending on their type? How can I get it to work?

Answered by Lou_Ds

Note that string formatting in Scala needs %s; in Python you can use {}.
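A minimal sketch of that difference, using the question's column name as a sample value (the object name `FormatDemo` and the sample values are only for illustration). Java's formatter ignores arguments that have no matching specifier, so the `{}` version returns the template unchanged:

```scala
object FormatDemo {
  // Sample values standing in for a field name and its type (illustrative only).
  val fName = "myMap1"
  val fType = "MapType"

  // "{}" is not a format specifier in Scala/Java, so the arguments are ignored.
  val unchanged: String = "Name{} Type:{}".format(fName, fType)

  // "%s" is the specifier that substitutes each argument in order.
  val formatted: String = "Name %s Type:%s".format(fName, fType)

  // The s-interpolator is an idiomatic alternative with the same result.
  val interpolated: String = s"Name $fName Type:$fType"

  def main(args: Array[String]): Unit = {
    println(unchanged)
    println(formatted)
    println(interpolated)
  }
}
```

The `%s` form and the `s"..."` interpolator produce the same string; which one to use is mostly a matter of taste.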

This should work:

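 df.dtypes.foreach {  f =>
      val fName = f._1  // column name
      val fType = f._2  // type rendered as a string, e.g. "StringType" or "MapType(StringType,IntegerType,false)"
      if (fType == "StringType") { println("STRING_TYPE") }
      // Complex types include their parameters in the string, so compare on the prefix
      if (fType.startsWith("MapType")) { println("MAP_TYPE") }
      //else {println("....")}
      println("Name %s Type:%s - all:%s".format(fName , fType, f))

    }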
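
The code above compares type names as strings taken from `df.dtypes`. As an alternative sketch (not part of the original answer), you can pattern match on the `dataType` of each `StructField` from `df.schema.fields`, which avoids string comparisons. The schema below is rebuilt by hand to mirror the question's output so the example runs without a SparkSession; it assumes Spark SQL is on the classpath, and the object name `FieldTypeDispatch`, the `describe` helper, and the labels are hypothetical placeholders.

```scala
import org.apache.spark.sql.types._

object FieldTypeDispatch {
  // Hypothetical schema mirroring the fields printed in the question.
  // With a real DataFrame you would use df.schema instead.
  val schema: StructType = StructType(Seq(
    StructField("myString1", StringType, nullable = true),
    StructField("myDouble1", DoubleType, nullable = false),
    StructField("myMap1", MapType(StringType, IntegerType, valueContainsNull = false), nullable = true),
    StructField("myStringList", ArrayType(StringType, containsNull = true), nullable = true)
  ))

  // Pick a label per field type; replace the labels with the real actions.
  def describe(field: StructField): String = field.dataType match {
    case StringType   => "STRING_TYPE"
    case DoubleType   => "DOUBLE_TYPE"
    case _: MapType   => "MAP_TYPE"   // matches any key/value types
    case _: ArrayType => "ARRAY_TYPE" // matches any element type
    case _            => "OTHER"
  }

  // Names of all map-typed columns, e.g. to select them later.
  val mapColumns: Seq[String] = schema.fields.collect {
    case StructField(name, _: MapType, _, _) => name
  }.toSeq

  def main(args: Array[String]): Unit =
    schema.fields.foreach { f =>
      println(s"Name ${f.name} Type:${f.dataType} -> ${describe(f)}")
    }
}
```

Matching on `_: MapType` covers every map column regardless of its key and value types, which the exact string comparison cannot do.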