scala 如何在 Spark SQL 中按日期范围过滤

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/33938806/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-10-22 07:49:33  来源:igfitidea点击:

How to filter by date range in Spark SQL

scalaapache-sparkapache-spark-sql

提问by prit4fun

I'm trying to filter the date range from the following data using Data bricks, which returns null as response. My csv data looks like:

我正在尝试使用数据块从以下数据中过滤日期范围,该数据块返回 null 作为响应。我的 csv 数据如下所示:

ID, Desc, Week_Ending_Date
100, AAA, 13-06-2015
101, BBB, 11-07-2015
102, CCC, 15-08-2015
103, DDD, 05-09-2015
100, AAA, 29-08-2015
100, AAA, 22-08-2015

My query is:

我的查询是:

df.select(df("ID"), date_format(df("Week_Ending_Date"), "yyyy-MM-dd"))
.filter(date_format(df("Week_Ending_Date"), "yyyy-MM-  dd").between("2015-07-05", "2015-09-02"))

Any help is much appreciated.

任何帮助深表感谢。

回答by eliasah

From the top of my head, I would have done the following by converting the date column while reading it and then apply the filter using an alias :

在我的脑海中,我会通过在阅读日期列时转换日期列来完成以下操作,然后使用别名应用过滤器:

import java.text.SimpleDateFormat

val format = new SimpleDateFormat("dd-MM-yyyy")
val data = sc.parallelize(
  List((100, "AAA", "13-06-2015"), (101, "BBB", "11-07-2015"), (102, "CCC", "15-08-2015"), (103, "DDD", "05-09-2015"), (100, "AAA", "29-08-2015"), (100, "AAA", "22-08-2015")).toSeq).map {
  r =>
    val date: java.sql.Date = new java.sql.Date(format.parse(r._3).getTime);
    (r._1, r._2, date)
}.toDF("ID", "Desc", "Week_Ending_Date")

data.show

//+---+----+----------------+
//| ID|Desc|Week_Ending_Date|
//+---+----+----------------+
//|100| AAA|      2015-06-13|
//|101| BBB|      2015-07-11|
//|102| CCC|      2015-08-15|
//|103| DDD|      2015-09-05|
//|100| AAA|      2015-08-29|
//|100| AAA|      2015-08-22|
//+---+----+----------------+

val filteredData = data
           .select(data("ID"), date_format(data("Week_Ending_Date"), "yyyy-MM-dd").alias("date"))
           .filter($"date".between("2015-07-05", "2015-09-02"))

//+---+----------+
//| ID|      date|
//+---+----------+
//|101|2015-07-11|
//|102|2015-08-15|
//|100|2015-08-29|
//|100|2015-08-22|
//+---+----------+