Oracle: What is the difference between TRUNC and TO_DATE in Hive?

Question

Asked by mowen10

Hi, I am trying to find out what the difference is between using TRUNC and TO_DATE in Hive.

Currently, in Oracle, I wrote the following CASE statement against the data shown below:

ORDER_NO | NAME | DATE_ | TASK_NO
ABC123 | Humpty | 07-OCT-16 12:30:54 | 1
ABC123 | Humpty | 07-OCT-16 12:30:54 | 2
ABC123 | Humpty | 07-OCT-16 12:32:20 | 6

SELECT ORDER_NO, NAME, DATE_, TASK_NO
    (CASE WHEN DATE_ - LAG(DATE_) OVER (PARTITION BY ORDER_NO, NAME, TRUNC(DATE_) ORDER BY DATE_) <= 1/48  
    THEN 0 ELSE 1 END) AS COUNT1

and this gives me the result:

ORDER_NO | NAME | DATE_ | TASK_NO | COUNT1
ABC123 | Humpty | 07-OCT-16 12:30:54 | 1 | 1
ABC123 | Humpty | 07-OCT-16 12:30:54 | 2 | 0
ABC123 | Humpty | 07-OCT-16 12:32:20 | 6 | 1

which is correct. Now, if I use the same query in Hive against my full data set, I get an error message:

Error while compiling statement: FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns.

So I changed TRUNC to TO_DATE. This runs, and I used the following query:

SELECT ORDER_NO, NAME, DATE_, TASK_NO
(CASE WHEN DATE_ - LAG(DATE_) OVER (PARTITION BY ORDER_NO, NAME, TO_DATE(DATE_) ORDER BY DATE_) <= 1/48  
    THEN 0 ELSE 1 END) AS COUNT1

and this gives me the result:

ORDER_NO | NAME | DATE_ | TASK_NO | COUNT1
ABC123 | Humpty | 07-OCT-16 12:30:54 | 1 | 1
ABC123 | Humpty | 07-OCT-16 12:32:20 | 6 | 1        
ABC123 | Humpty | 07-OCT-16 12:30:54 | 2 | 1

which is different from what I get in Oracle. From what I can gather, the date value is stored as a string, since the results aren't ordered by date/time. I think this is where the problem lies, but I'm not sure what changes I need to make to fix it.

Would really appreciate some advice.

UPDATED CODE:

SELECT
ORDER_NO
,NAME
,DATE_FIXED
,TASK_NO
,CASE WHEN DATE_UTS - LAG(DATE_UTS) OVER (PARTITION BY ORDER_NO, NAME, TO_DATE(DATE_FIXED) ORDER BY DATE_FIXED) <= 60*30
THEN    0
ELSE    1
END AS COUNT1
FROM
(
SELECT
ORDER_NO
,NAME
,TASK_NO
-- Pattern letters are case-sensitive: dd = day of month, mm = minutes, ss = seconds
,FROM_UNIXTIME(UNIX_TIMESTAMP(DATE_, 'dd-MMM-yy HH:mm:ss')) AS DATE_FIXED
,UNIX_TIMESTAMP(DATE_, 'dd-MMM-yy HH:mm:ss') AS DATE_UTS
FROM TABLE1
) T

Answer 1

Answered by David Markovitz

1

Hive Operators and User-Defined Functions (UDFs)

to_date
Returns the date part of a timestamp string (pre-Hive 2.1.0):
to_date("1970-01-01 00:00:00") = "1970-01-01".
As of Hive 2.1.0, returns a date object.
Prior to Hive 2.1.0 (HIVE-13248) the return type was a String because no Date type existed when the method was created.

trunc
Returns date truncated to the unit specified by the format (as of Hive 1.2.0).
Supported formats: MONTH/MON/MM, YEAR/YYYY/YY.
Example: trunc('2015-03-17', 'MM') = 2015-03-01.
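
As a rough illustration of the two definitions quoted above (a sketch only; the literal values are chosen here for the example, and the exact return type depends on the Hive version as noted):

```sql
-- to_date keeps the calendar day of a timestamp string and drops the time of day
SELECT to_date('2016-10-07 12:30:54');   -- 2016-10-07

-- trunc requires a format argument; the formats listed above cover month and year only
SELECT trunc('2016-10-07', 'MM');        -- 2016-10-01 (first day of the month)
SELECT trunc('2016-10-07', 'YYYY');      -- 2016-01-01 (first day of the year)
```

Oracle's TRUNC(DATE_) with no format truncates to the day, so in the Hive versions described above the closest equivalent is to_date rather than trunc.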

2

You have errors in your original query

There was no comma between TASK_NO and (CASE WHEN
trunc in Hive must take a format parameter, and there is no format for day.
There is no minus operator for dates (and definitely not for strings). This results in a NULL.
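
Since the date strings cannot be subtracted directly, one way around it is to compare epoch seconds instead. A minimal sketch, assuming timestamps already in the yyyy-MM-dd HH:mm:ss layout (the literals are taken from the sample rows, and the alias gap_seconds is only illustrative):

```sql
-- unix_timestamp returns seconds since the epoch,
-- so the difference is a plain number of seconds
SELECT unix_timestamp('2016-10-07 12:32:20')
     - unix_timestamp('2016-10-07 12:30:54') AS gap_seconds;   -- 86

-- Oracle's 1/48 of a day (30 minutes) corresponds to 60*30 = 1800 seconds
```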

3

The only recognized date format in Hive is YYYY-MM-DD, which does not match your data.
Applying date functions to an invalid string results in NULL.

This is how you convert your data format to dates:

hive> select from_unixtime(unix_timestamp('07-OCT-16 12:30:54','dd-MMM-yy HH:mm:ss'));
OK
2016-10-07 12:30:54

and the whole query:

select  ORDER_NO
       ,NAME
       ,DATE_fixed
       ,TASK_NO

       ,case 
            when    DATE_uts 
                -   LAG(DATE_uts) OVER 
                    (
                        PARTITION BY    ORDER_NO,NAME,to_date(DATE_fixed) 
                        ORDER BY        DATE_fixed
                    )
                <= 60*30
            then    0
            else    1
        end             AS COUNT1

from   (select  ORDER_NO
               ,NAME
               ,TASK_NO
               ,from_unixtime(unix_timestamp(DATE_,'dd-MMM-yy HH:mm:ss'))   as DATE_fixed
               ,unix_timestamp(DATE_,'dd-MMM-yy HH:mm:ss')                  as DATE_uts

        from    t
        ) t
;

ABC123  Humpty  2016-10-07 12:30:54 2   1
ABC123  Humpty  2016-10-07 12:30:54 1   0
ABC123  Humpty  2016-10-07 12:32:20 6   0

These were also the results when I tested it on Oracle.
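
One detail worth noting in the output above: the two rows stamped 12:30:54 tie on the ORDER BY column, so the query does not determine which of them counts as the first row of the partition. A possible variation, offered as a sketch rather than as part of the original answer, adds TASK_NO as a tie-breaker (that TASK_NO reflects the intended order is an assumption):

```sql
select  ORDER_NO
       ,NAME
       ,DATE_fixed
       ,TASK_NO
       ,case
            when    DATE_uts
                -   LAG(DATE_uts) OVER
                    (
                        PARTITION BY    ORDER_NO,NAME,to_date(DATE_fixed)
                        -- TASK_NO breaks ties between identical timestamps (assumed ordering)
                        ORDER BY        DATE_fixed,TASK_NO
                    )
                <= 60*30
            then    0
            else    1
        end             AS COUNT1
from   (select  ORDER_NO
               ,NAME
               ,TASK_NO
               ,from_unixtime(unix_timestamp(DATE_,'dd-MMM-yy HH:mm:ss'))   as DATE_fixed
               ,unix_timestamp(DATE_,'dd-MMM-yy HH:mm:ss')                  as DATE_uts
        from    t
        ) t
;
```

The first row of each partition has no previous row, so LAG returns NULL, the comparison is not true, and the CASE falls through to ELSE 1. With the tie-breaker, task 1 is flagged 1 and tasks 2 and 6 are flagged 0 for the sample data.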
