na.fill in Spark DataFrame Scala
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/39225528/
Asked by Vijeth Hegde
I am using Spark/Scala and I want to fill the nulls in my DataFrame with default values based on the type of the columns.
e.g. string columns -> "string", numeric columns -> 111, boolean columns -> false, etc.
Currently the df.na (DataFrameNaFunctions) API provides fill(valueMap: Map[String, Any]), like:
df.na.fill(Map(
"A" -> "unknown",
"B" -> 1.0
))
This requires knowing the column names and also the type of the columns.
OR
fill(value: String, cols: Seq[String])
This overload only accepts String or Double values, not even Boolean.
Is there a smart way to do this?
Answered by Chris Dove
Take a look at dtypes: Array[(String, String)]. You can use the output of this method to generate a Map for fill, e.g.:
// dtypes returns (columnName, typeName) pairs; using collect instead of map
// avoids a MatchError on columns whose type is not handled below
val typeMap = df.dtypes.collect {
  case (name, "IntegerType") => name -> 0
  case (name, "StringType")  => name -> ""
  case (name, "DoubleType")  => name -> 0.0
}.toMap
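For an end-to-end sketch of the approach, here is how the type-based default map could be built from a DataFrame's schema and passed to na.fill. The column names (A, B, C), the sample data, and the app/master settings are illustrative assumptions, not part of the original question:

```scala
import org.apache.spark.sql.SparkSession

object FillDefaultsByType {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in a real job the session
    // configuration would come from your environment
    val spark = SparkSession.builder()
      .appName("na-fill-by-type")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data: one row with values, one row with nulls in every column
    val df = Seq(
      (Some(1), Some("x"), Some(1.5)),
      (None: Option[Int], None: Option[String], None: Option[Double])
    ).toDF("A", "B", "C")

    // Build a per-column default from the schema's type names;
    // collect silently skips any type not listed here
    val typeMap = df.dtypes.collect {
      case (name, "IntegerType") => name -> 0
      case (name, "StringType")  => name -> ""
      case (name, "DoubleType")  => name -> 0.0
    }.toMap

    // Nulls in A, B, C are replaced by 0, "", and 0.0 respectively
    df.na.fill(typeMap).show()

    spark.stop()
  }
}
```

Note that na.fill leaves columns of unmatched types (e.g. BooleanType, TimestampType) untouched, since they never enter the map.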

