Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original URL: http://stackoverflow.com/questions/43988333/

Time: 2020-09-01 05:13:33  Source: igfitidea

How to perform a BETWEEN operator in Hive SQL for date column

Tags: sql, hadoop, hive, cloudera, bigdata

Asked by MarioC

I'll try to explain my problem as clearly as possible. I would like to filter a table by date (selecting only the records whose date falls within the current month), and in Oracle SQL I'm using the following query to achieve this:

select * from table t1 
where t1.DATE_COLUMN between TRUNC(SYSDATE, 'mm') and SYSDATE

How can I replicate the same filter in Hive SQL? The column I need to filter on is a TIMESTAMP type column (e.g. 2017-05-15 00:00:00).

I'm using CDH 5.7.6-1.

Any advice?

Accepted answer by David ???? Markovitz

Be aware that unix_timestamp() is not fixed: its value can change during the query.
For that reason it cannot be used for partition elimination.
For newer Hive versions use current_date / current_timestamp instead.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

select  * 
from    table t1 
where   t1.DATE_COLUMN  
          between  cast(from_unixtime(unix_timestamp(),'yyyy-MM-01 00:00:00') as timestamp)
          and      cast(from_unixtime(unix_timestamp()) as timestamp)
;


select  cast (from_unixtime(unix_timestamp(),'yyyy-MM-01 00:00:00') as timestamp)
       ,cast (from_unixtime(unix_timestamp()) as timestamp)
;


+---------------------+---------------------+
|         _c0         |         _c1         |
+---------------------+---------------------+
| 2017-05-01 00:00:00 | 2017-05-16 01:04:55 |
+---------------------+---------------------+
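
The answer above recommends current_date / current_timestamp on newer Hive versions. A minimal sketch of what that might look like follows, assuming a Hive release where current_timestamp and date_format exist (the LanguageManual lists both as added in Hive 1.2.0; whether they are available on the asker's CDH 5.7.6 cluster is not confirmed here). The table name my_table is a hypothetical placeholder.

```sql
-- Sketch only: assumes Hive 1.2.0 or later; my_table is a placeholder name.
-- The lower bound is the first day of the current month at midnight,
-- built with the same literal '-01 00:00:00' suffix as the query above.
select  *
from    my_table t1
where   t1.DATE_COLUMN
          between cast(date_format(current_timestamp, 'yyyy-MM-01 00:00:00') as timestamp)
          and     current_timestamp
;
```

Unlike unix_timestamp(), current_timestamp is documented as returning the same value for every call within a single query, which is the property the answer relies on for partition elimination.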

Answer by Gordon Linoff

You can format as strings:

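where date_format(t1.DATE_COLUMN, 'yyyy-MM') = date_format(current_timestamp, 'yyyy-MM')

I realize that I don't have Hive accessible right now. The Hive documentation's example uses 'y', but date_format takes Java date-pattern letters, where the month is uppercase 'M' and lowercase 'm' means minutes, so the pattern should be 'yyyy-MM' rather than 'yyyy-mm'.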
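
Since this answer was written without access to Hive, a quick way to settle which pattern letters select the month is to format a known literal. The sketch below assumes a Hive version that provides date_format (listed in the LanguageManual as added in Hive 1.2.0) and reuses the sample value from the question.

```sql
-- Sketch only: compare uppercase MM (month) with lowercase mm (minutes)
-- on a fixed timestamp string, so the result does not depend on the clock.
select  date_format('2017-05-15 00:00:00', 'yyyy-MM')   -- year and month
       ,date_format('2017-05-15 00:00:00', 'yyyy-mm')   -- year and minutes
;
```

With Java date-pattern semantics, the first column should show the month (05) and the second the minutes (00), so only the uppercase form is suitable for a month comparison.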
