SQL: How to get an array/bag of elements from the Hive GROUP BY operator?

Question

Asked by Anuroop

I want to group by a given field and get the grouped values back as a collection per group. Below is an example of what I am trying to achieve:

Imagine a table named 'sample_table' with two columns as below:-

I want to write a Hive query that will give the output below:

001 [111, 222, 123]
002 [222, 333]
003 [555]

In Pig, this can be very easily achieved by something like this:-

grouped_relation = GROUP sample_table BY F1;

Can somebody please suggest whether there is a simple way to do this in Hive? All I can think of is writing a User Defined Function (UDF) for it, but that may be a very time-consuming option.

Answer 1

Answered by Daniel Koverman

The built-in aggregate function collect_set (documented here) gets you almost what you want. It would actually work on your example input:

SELECT F1, collect_set(F2)
FROM sample_table
GROUP BY F1

Unfortunately, it also removes duplicate elements, and I imagine this isn't your desired behavior. I find it odd that collect_set exists but there is no version that keeps duplicates. Someone else apparently thought the same thing. It looks like the top and second answers there will give you the UDAF you need.
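
For readers on newer Hive releases: collect_list was added as a built-in aggregate in Hive 0.13.0, and it keeps duplicates, so a custom UDAF should only be needed on older versions. A minimal sketch, assuming the table and column names from the question (the alias f2_values is just an illustrative name):

```sql
-- Assumes Hive 0.13.0 or later, where collect_list is a built-in aggregate.
-- Unlike collect_set, collect_list keeps duplicate F2 values within each group.
SELECT F1, collect_list(F2) AS f2_values
FROM sample_table
GROUP BY F1;
```

Note that Hive does not promise any particular order for the elements inside the resulting array, so the values may not come back in the order shown in the question.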

Answer 2

Answered by ellaqezi

collect_set actually works as expected, since a set is by definition a collection of well-defined and distinct objects, i.e. an object occurs exactly once or not at all within a set.
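
One way to see the set semantics in practice is to compare the size of the collected set with the number of rows in each group. This is only a sketch against the question's sample_table; the aliases distinct_values, distinct_count and row_count are illustrative names, not anything from the original post:

```sql
-- For each F1 group, compare how many distinct F2 values collect_set kept
-- with how many non-NULL F2 rows the group actually contains.
SELECT F1,
       collect_set(F2)       AS distinct_values,
       size(collect_set(F2)) AS distinct_count,
       count(F2)             AS row_count
FROM sample_table
GROUP BY F1;
```

Whenever row_count is larger than distinct_count for a group, that group contained duplicate F2 values that collect_set collapsed.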
