Note: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use/share it, but you must likewise follow CC BY-SA 4.0, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/46334705/

Date: 2020-10-22 09:27:14  Source: igfitidea

subtract two columns with null in spark dataframe

Tags: scala, apache-spark, apache-spark-sql

Question by warner

I'm new to Spark. I have a DataFrame df:

+----------+------------+-----------+
| Column1  | Column2    | Sub       |
+----------+------------+-----------+
| 1        | 2          | 1         |
+----------+------------+-----------+
| 4        | null       | null      |
+----------+------------+-----------+
| 5        | null       | null      |
+----------+------------+-----------+
| 6        | 8          | 2         |
+----------+------------+-----------+

When subtracting two columns, if one of them is null, the resulting column is also null:

import org.apache.spark.sql.functions.col
df.withColumn("Sub", col("Column1") - col("Column2"))
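Spark evaluates column arithmetic with SQL null semantics, so the minus operator returns null whenever either operand is null. As a rough illustration that needs no Spark session, the sketch below models that rule in plain Scala, assuming Option[Int] as a stand-in for a nullable integer column. The names NullPropagation, minus, rows and sub are hypothetical, and this is an analogy, not Spark's implementation.

```scala
// A plain-Scala model of SQL null semantics, not Spark code: None stands in for null.
object NullPropagation {
  // Like col("Column1") - col("Column2") in Spark: the result is null if either operand is null.
  def minus(a: Option[Int], b: Option[Int]): Option[Int] =
    for { x <- a; y <- b } yield x - y

  // The four (Column1, Column2) rows from the question's DataFrame.
  val rows: Seq[(Option[Int], Option[Int])] =
    Seq((Some(1), Some(2)), (Some(4), None), (Some(5), None), (Some(6), Some(8)))

  // Column1 - Column2 for every row.
  val sub: Seq[Option[Int]] = rows.map { case (a, b) => minus(a, b) }
}
```

Here NullPropagation.sub evaluates to Seq(Some(-1), None, None, Some(-2)): the two rows with a null Column2 come out as None, mirroring the null Sub values in the question. The model computes plain Column1 - Column2, hence -1 and -2 where the question's table shows 1 and 2.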

The expected output is:

+----------+------------+-----------+
| Column1  | Column2    | Sub       |
+----------+------------+-----------+
| 1        | 2          | 1         |
+----------+------------+-----------+
| 4        | null       | 4         |
+----------+------------+-----------+
| 5        | null       | 5         |
+----------+------------+-----------+
| 6        | 8          | 2         |
+----------+------------+-----------+

I don't want to replace the nulls in Column2 with 0; Column2 itself should stay null. Can someone help me with this?

Answer by Ramesh Maharjan

You can use the when function as follows:

import org.apache.spark.sql.functions._
// Use 0 in place of null only inside the subtraction; Column1 and Column2 keep their nulls
df.withColumn("Sub", when(col("Column1").isNull, lit(0)).otherwise(col("Column1")) -
  when(col("Column2").isNull, lit(0)).otherwise(col("Column2")))

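You should get the following result. Note that this computes Column1 - Column2 without taking the absolute value, so the first and last rows are negative: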

+-------+-------+----+
|Column1|Column2| Sub|
+-------+-------+----+
|      1|      2|-1.0|
|      4|   null| 4.0|
|      5|   null| 5.0|
|      6|      8|-2.0|
+-------+-------+----+
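The when/otherwise approach substitutes 0 for a null only inside the subtraction expression, so Column2 itself is never modified. A minimal plain-Scala model of that idea, again assuming Option[Int] stands in for a nullable column (WhenOtherwise, zeroIfNull, rows and withSub are hypothetical names, not Spark APIs), might look like this:

```scala
// A plain-Scala model of the when/otherwise answer, not Spark code: None stands in for null.
object WhenOtherwise {
  // Like when(c.isNull, lit(0)).otherwise(c): 0 for a null, otherwise the value itself.
  def zeroIfNull(v: Option[Int]): Int = v match {
    case None    => 0
    case Some(x) => x
  }

  // The four (Column1, Column2) rows from the question.
  val rows: Seq[(Option[Int], Option[Int])] =
    Seq((Some(1), Some(2)), (Some(4), None), (Some(5), None), (Some(6), Some(8)))

  // (Column1, Column2, Sub): the 0 is used only in the subtraction, so Column2 keeps its nulls.
  val withSub: Seq[(Option[Int], Option[Int], Int)] =
    rows.map { case (a, b) => (a, b, zeroIfNull(a) - zeroIfNull(b)) }
}
```

WhenOtherwise.withSub yields Sub values -1, 4, 5 and -2 (the same numbers as the output above, as integers), while the Column2 entries remain Some(2), None, None, Some(8).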

Answer by Psidom

You can coalesce the nulls to zero in both columns, do the subtraction, and take the absolute value:

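import org.apache.spark.sql.functions.{abs, coalesce, lit}
import spark.implicits._ // toDF and $"..." syntax; assumes a SparkSession named spark, as in spark-shell

val df = Seq[(Option[Int], Option[Int])](
  (Some(1), Some(2)),
  (Some(4), None),
  (Some(5), None),
  (Some(6), Some(8))
).toDF("A", "B")

// coalesce(x, 0) is 0 only where x is null, so columns A and B keep their nulls
df.withColumn("Sub", abs(coalesce($"A", lit(0)) - coalesce($"B", lit(0)))).show()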
+---+----+---+
|  A|   B|Sub|
+---+----+---+
|  1|   2|  1|
|  4|null|  4|
|  5|null|  5|
|  6|   8|  2|
+---+----+---+
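coalesce returns its first non-null argument, so coalesce($"B", lit(0)) is 0 exactly where B is null, and abs turns the differences into the positive values the question expects. The plain-Scala sketch below models that expression under the same assumption as before (Option[Int] for a nullable column); CoalesceAbs, coalesce0, rows and sub are hypothetical names, and this is an analogy rather than Spark code.

```scala
// A plain-Scala model of abs(coalesce(A, 0) - coalesce(B, 0)), not Spark code: None stands in for null.
object CoalesceAbs {
  // Like coalesce(c, lit(0)): the value if present, otherwise 0.
  def coalesce0(v: Option[Int]): Int = v.getOrElse(0)

  // The four (A, B) rows from the answer's DataFrame.
  val rows: Seq[(Option[Int], Option[Int])] =
    Seq((Some(1), Some(2)), (Some(4), None), (Some(5), None), (Some(6), Some(8)))

  // Like abs(coalesce($"A", lit(0)) - coalesce($"B", lit(0))).
  val sub: Seq[Int] = rows.map { case (a, b) => math.abs(coalesce0(a) - coalesce0(b)) }
}
```

CoalesceAbs.sub evaluates to Seq(1, 4, 5, 2), matching the Sub column above. Option.getOrElse plays the role of coalesce with a single fallback value.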