Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/44361332/

Date: 2020-10-22 09:17:02  Source: igfitidea

Add Number of days column to Date Column in same dataframe for Spark Scala App

Tags: scala, apache-spark, dataframe, dateadd

Asked by qubiter

I have a dataframe df with columns ("id", "current_date", "days"), and I am trying to add "days" to "current_date" and create a new dataframe with a new column called "new_date", using the Spark Scala function date_add():

val newDF = df.withColumn("new_Date", date_add(df("current_date"), df("days").cast("Int")))

But it looks like the function date_add only accepts Int values and not columns. How can I get the desired output in this case? Are there any alternative functions I can use?

Spark version: 1.6.0, Scala version: 2.10.6

Accepted answer by rogue-one

A small custom UDF makes this date arithmetic possible.

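import org.apache.spark.sql.functions.udf
import java.text.SimpleDateFormat
import java.util.Calendar

// UDF: parse a "yyyy-MM-dd" string, add y calendar days, and format it back.
// Calendar.add is used instead of adding y * 24h in milliseconds, which can
// land on the wrong day across a daylight-saving transition.
// Note: this val shadows the built-in functions.date_add if that is imported.
val date_add = udf((x: String, y: Int) => {
  val sdf = new SimpleDateFormat("yyyy-MM-dd")
  val cal = Calendar.getInstance()
  cal.setTime(sdf.parse(x))
  cal.add(Calendar.DATE, y)
  sdf.format(cal.getTime)
})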

Usage:

scala> val df = Seq((1, "2017-01-01", 10), (2, "2017-01-01", 20)).toDF("id", "current_date", "days")
df: org.apache.spark.sql.DataFrame = [id: int, current_date: string, days: int]

scala> df.withColumn("new_Date", date_add($"current_date", $"days")).show()
+---+------------+----+----------+
| id|current_date|days|  new_Date|
+---+------------+----+----------+
|  1|  2017-01-01|  10|2017-01-11|
|  2|  2017-01-01|  20|2017-01-21|
+---+------------+----+----------+
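
As a variation on the UDF body above, the same date arithmetic can be sketched in plain Scala with java.time, which makes it easy to check outside Spark. This assumes a Java 8+ runtime, which the question does not state. The object name `DateAddSketch` and the helper `addDays` are hypothetical names introduced here for illustration, and the input is assumed to be an ISO "yyyy-MM-dd" string.

```scala
import java.time.LocalDate

object DateAddSketch {
  // Hypothetical helper (not from the original answer): adds `days` calendar
  // days to an ISO "yyyy-MM-dd" date string and returns the same format.
  // LocalDate has no time-of-day or time zone, so daylight-saving changes
  // cannot shift the result.
  def addDays(date: String, days: Int): String =
    LocalDate.parse(date).plusDays(days.toLong).toString
}

// In a Spark shell this could be wrapped like the UDF above, e.g.
//   val addDaysUdf = org.apache.spark.sql.functions.udf(DateAddSketch.addDays _)
```

For the sample rows above, `DateAddSketch.addDays("2017-01-01", 10)` gives "2017-01-11", matching the new_Date column.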

Answer by Raphael Roth

There is no need for a UDF; you can do it with a SQL expression:

import org.apache.spark.sql.functions.expr
val newDF = df.withColumn("new_date", expr("date_add(current_date,days)"))