na.fill in Spark DataFrame Scala

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same license, link the original, and attribute it to the original authors (not me): http://stackoverflow.com/questions/39225528/

Date: 2020-10-22 08:35:14  Source: igfitidea


scala, apache-spark, dataframe

Asked by Vijeth Hegde

I am using Spark/Scala and I want to fill the nulls in my DataFrame with default values based on the type of the columns.


i.e. String columns -> "string", numeric columns -> 111, Boolean columns -> false, etc.


Currently the DF.na functions API provides na.fill with the signature fill(valueMap: Map[String, Any]), like

df.na.fill(Map(
    "A" -> "unknown",
    "B" -> 1.0
))

This requires knowing the column names and also the type of the columns.


OR


fill(value: String, cols: Seq[String])

This overload only accepts String/Double values — not even Boolean.


Is there a smart way to do this?


Answered by Chris Dove

Take a look at dtypes: Array[(String, String)]. You can use the output of this method to generate a Map for fill, e.g.:


val typeMap = df.dtypes.collect {
    // Pick a default value per column type. Using collect (a partial
    // function) rather than map with a non-exhaustive match means columns
    // whose type has no default are simply skipped, instead of throwing
    // a MatchError at runtime.
    case (name, "IntegerType") => name -> 0
    case (name, "StringType")  => name -> ""
    case (name, "DoubleType")  => name -> 0.0
    case (name, "BooleanType") => name -> false // Boolean fills require a newer Spark version
}.toMap

df.na.fill(typeMap)