Scala: How to filter by date range in Spark SQL
Original source: http://stackoverflow.com/questions/33938806/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to filter by date range in Spark SQL
Asked by prit4fun
I'm trying to filter the following data by date range using Databricks, but the query returns null. My CSV data looks like this:
ID, Desc, Week_Ending_Date
100, AAA, 13-06-2015
101, BBB, 11-07-2015
102, CCC, 15-08-2015
103, DDD, 05-09-2015
100, AAA, 29-08-2015
100, AAA, 22-08-2015
My query is:
df.select(df("ID"), date_format(df("Week_Ending_Date"), "yyyy-MM-dd"))
.filter(date_format(df("Week_Ending_Date"), "yyyy-MM- dd").between("2015-07-05", "2015-09-02"))
Any help is much appreciated.
Answered by eliasah
Off the top of my head, I would convert the date column while reading the data and then apply the filter using an alias:
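import java.text.SimpleDateFormat
import org.apache.spark.sql.functions.date_format
// toDF and $"..." rely on the SQL implicits, which spark-shell and Databricks notebooks already provide
// The raw strings use the dd-MM-yyyy pattern
val format = new SimpleDateFormat("dd-MM-yyyy")
val data = sc.parallelize(
  List((100, "AAA", "13-06-2015"), (101, "BBB", "11-07-2015"), (102, "CCC", "15-08-2015"), (103, "DDD", "05-09-2015"), (100, "AAA", "29-08-2015"), (100, "AAA", "22-08-2015")))
  .map { r =>
    // Parse each string into a java.sql.Date so the column is stored as a date, not a string
    val date = new java.sql.Date(format.parse(r._3).getTime)
    (r._1, r._2, date)
  }
  .toDF("ID", "Desc", "Week_Ending_Date")
data.show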
//+---+----+----------------+
//| ID|Desc|Week_Ending_Date|
//+---+----+----------------+
//|100| AAA|      2015-06-13|
//|101| BBB|      2015-07-11|
//|102| CCC|      2015-08-15|
//|103| DDD|      2015-09-05|
//|100| AAA|      2015-08-29|
//|100| AAA|      2015-08-22|
//+---+----+----------------+
// Format the date column as a yyyy-MM-dd string, alias it, then filter on the alias
val filteredData = data
  .select(data("ID"), date_format(data("Week_Ending_Date"), "yyyy-MM-dd").alias("date"))
  .filter($"date".between("2015-07-05", "2015-09-02"))
filteredData.show
//+---+----------+
//| ID|      date|
//+---+----------+
//|101|2015-07-11|
//|102|2015-08-15|
//|100|2015-08-29|
//|100|2015-08-22|
//+---+----------+
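The same idea can also be sketched without the RDD-level parsing step, by converting the string column inside the DataFrame API and comparing dates with dates. This is only a sketch under stated assumptions: it assumes Spark 2.2 or later (where `to_date` accepts a format argument) and a local `SparkSession`. The object name `DateRangeFilterSketch` and the helper `filterByWeekEnding` are made up for illustration and are not part of the original answer.

```scala
import java.sql.Date

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, to_date}

// Hypothetical helper object, for illustration only.
object DateRangeFilterSketch {

  // Assumption: Spark 2.2+, where to_date(Column, String) is available.
  // `from` and `to` are expected in yyyy-MM-dd form; both bounds are inclusive.
  def filterByWeekEnding(df: DataFrame, from: String, to: String): DataFrame =
    df
      // Parse the dd-MM-yyyy strings into a real date column.
      .withColumn("Week_Ending_Date", to_date(col("Week_Ending_Date"), "dd-MM-yyyy"))
      // Compare against date literals rather than strings.
      .filter(col("Week_Ending_Date").between(lit(Date.valueOf(from)), lit(Date.valueOf(to))))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("date-range-filter-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same sample rows as in the question, still as raw strings.
    val raw = Seq(
      (100, "AAA", "13-06-2015"),
      (101, "BBB", "11-07-2015"),
      (102, "CCC", "15-08-2015"),
      (103, "DDD", "05-09-2015"),
      (100, "AAA", "29-08-2015"),
      (100, "AAA", "22-08-2015")
    ).toDF("ID", "Desc", "Week_Ending_Date")

    filterByWeekEnding(raw, "2015-07-05", "2015-09-02").show()

    spark.stop()
  }
}
```

With the sample data, this should keep the same four rows as the answer above (IDs 101, 102, 100 and 100), while leaving Week_Ending_Date as a date column instead of a formatted string.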

