SQL Hive LEFT SEMI JOIN 表示“不存在”
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/25041026/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Hive LEFT SEMI JOIN for 'NOT EXISTS'
提问by mel
I have two tables with a single key column. Keys in table a are subset of all keys in table b. I need to select keys from table b that are NOT in table a.
我有两个表,只有一个键列。表 a 中的键是表 b 中所有键的子集。我需要从表 b 中选择不在表 a 中的键。
Here is a citation from Hive manual: "LEFT SEMI JOIN implements the uncorrelated IN/EXISTS subquery semantics in an efficient way. As of Hive 0.13 the IN/NOT IN/EXISTS/NOT EXISTS operators are supported using subqueries so most of these JOINs don't have to be performed manually anymore. The restrictions of using LEFT SEMI JOIN is that the right-hand-side table should only be referenced in the join condition (ON-clause), but not in WHERE- or SELECT-clauses etc."
这是 Hive 手册中的引用:“LEFT SEMI JOIN 以一种有效的方式实现了不相关的 IN/EXISTS 子查询语义。从 Hive 0.13 开始,使用子查询支持 IN/NOT IN/EXISTS/NOT EXISTS 运算符,因此大多数这些 JOIN 不支持'不必再手动执行了。使用 LEFT SEMI JOIN 的限制是右侧表只能在连接条件(ON-子句)中引用,而不能在 WHERE- 或 SELECT-子句等中引用。”
They use this example for illustration:
他们使用这个例子来说明:
SELECT a.key, a.value FROM a WHERE a.key IN (SELECT b.key FROM B);
Is equivalent to
相当于
SELECT a.key, a.val FROM a LEFT SEMI JOIN b ON (a.key = b.key);
However, what I need to do is first example with 'NOT IN;. Unfortunately this syntax is not supported in Hive 0.13. It's for illustration only:
但是,我需要做的是第一个带有“NOT IN;”的示例。不幸的是,Hive 0.13 不支持此语法。它仅用于说明:
SELECT a.key, a.value FROM a WHERE a.key NOT IN (SELECT b.key FROM B);
I searched this site for recommendations, and saw this example:
我在这个网站上搜索了推荐,看到了这个例子:
SELECT a.key FROM a LEFT OUTER JOIN b ON a.key = b.key WHERE b.key IS NULL;
It does not work as expected. When I join a.key NOT in b and a.key IN b, I don't get the original a this way. Maybe that is because this query cannot do the trick, note bold text - b.key should not appear in WHERE.
它没有按预期工作。当我加入 a.key NOT in b 和 a.key IN b 时,我不会以这种方式获得原始 a。也许这是因为此查询无法解决问题,请注意粗体文本 - b.key 不应出现在 WHERE 中。
What should I do then? Any other trick? Thanks!
那我该怎么办?还有什么技巧吗?谢谢!
P.S. I cannot share any real data; it's a pretty simple example, where keys in a are all included in b and a is a subset of b.
PS 我不能分享任何真实数据;这是一个非常简单的示例,其中 a 中的键都包含在 b 中,而 a 是 b 的子集。
回答by okkar
If you want results from table b, perhaps you can do the following instead?
如果您想要表 b 的结果,也许您可以执行以下操作?
SELECT b.key FROM b LEFT OUTER JOIN a ON b.key = a.key WHERE a.key IS NULL;
回答by Gyanendra Dwivedi
The answer to your issue should be
你的问题的答案应该是
SELECT a.key FROM a LEFT OUTER JOIN b ON a.key = b.key WHERE b.key IS NULL;
This means, bring all the keys from a, irrespective of whether there is a match in b or not. The where cause will filter those records, which are not available in b.
这意味着,无论 b 中是否存在匹配项,都从 a 中获取所有密钥。where 原因将过滤那些在 b 中不可用的记录。
回答by user3512680
Or you can try
或者你可以试试
SELECT a.key FROM a LEFT ANTI JOIN b ON a.key = b.key
回答by j pavan kumar
I tried left semi join for IN function in cdh 5.7.0 with spark 1.6 version.
我在 cdh 5.7.0 和 spark 1.6 版本中尝试使用 left semi join for IN 函数。
The semi left join gives wrong results, which is not similar to IN functionin sub queries.
半左连接给出了错误的结果,这与子查询中的IN 函数不同。