na.fill in Spark DataFrame Scala

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same license, link the original, and attribute it to the original authors (not me): http://stackoverflow.com/questions/39225528/

Date: 2020-10-22 08:35:14  Source: igfitidea


scala, apache-spark, dataframe

Asked by Vijeth Hegde

I am using Spark/Scala and I want to fill the nulls in my DataFrame with default values based on the type of the columns.


i.e. String columns -> "string", numeric columns -> 111, Boolean columns -> false, etc.


Currently the DF.na functions API provides na.fill with the signature fill(valueMap: Map[String, Any]), like

df.na.fill(Map(
    "A" -> "unknown",
    "B" -> 1.0
))

This requires knowing the column names and also the type of the columns.


OR


fill(value: String, cols: Seq[String])

This overload only accepts String/Double values — not even Boolean.


Is there a smart way to do this?


Answered by Chris Dove

Take a look at dtypes: Array[(String, String)]. You can use the output of this method to generate a Map for fill, e.g.:


val typeMap = df.dtypes.collect {
    // Pick a default value per column type. Using collect (a partial
    // function) rather than map with a non-exhaustive match means columns
    // whose type has no default are simply skipped, instead of throwing
    // a MatchError at runtime.
    case (name, "IntegerType") => name -> 0
    case (name, "StringType")  => name -> ""
    case (name, "DoubleType")  => name -> 0.0
    case (name, "BooleanType") => name -> false // Boolean fills require a newer Spark version
}.toMap

df.na.fill(typeMap)