Hive SQL 中的嵌套查询

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/13631259/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 12:22:20  来源:igfitidea点击:

Nested queries in Hive SQL

sqlhive

提问by S4M

I have a database, and I use a query to produce an intermediate table like this:

我有一个数据库,我使用查询来生成这样的中间表:

id    a      b    
xx    1      2    
yy    7      11    

and I would like to calculate the standard deviations of b for the users who have a < avg(a)

我想为具有 < avg(a) 的用户计算 b 的标准偏差

I calculate avg(a) that way and it works fine:

我以这种方式计算 avg(a) 并且它工作正常:

select avg(select a from (query to produce intermediate table)) from table;

But the query:

但是查询:

select stddev_pop(b) 
from (query to produce intermediate table) 
where a < (select avg(select a 
                     from (query to produce intermediate table))
          from table);

Returns me an error, and more precisely, I am told that the "a" from avg(select a from...) is not recognised. This makes me really confused, as it works in the previous query.

返回一个错误,更准确地说,我被告知无法识别来自 avg(select a from...) 的“a”。这让我真的很困惑,因为它适用于上一个查询。

I would be grateful if somebody could help.

如果有人可以提供帮助,我将不胜感激。

EDIT:

编辑:

I stored the result of my query to generate the intermediary table into a temporary table, but still run into the same problem. The non working query becomes:

我将查询的结果存储到临时表中以生成中间表,但仍然遇到相同的问题。非工作查询变为:

select stddev_pop(b) from temp where a < (select avg(a) from temp);

while this works:

虽然这有效:

select avg(a) from temp;

回答by S4M

OK, a colleague helped me to do it. I'll post the answer in case someone runs into the same problem:

好的,同事帮我做的。如果有人遇到同样的问题,我会发布答案:

select stddev_pop(b)
from temp x
join (select avg(a) as average from temp) y
where x.a < y.average;

Basically hive doesn't do caching of a table as a variable.

基本上 hive 不会将表缓存为变量。

回答by BellevueBob

You likely need to move your parentheses in your WHEREclause. Try this:

您可能需要移动WHERE子句中的括号。尝试这个:

select stddev_pop(b) 
from (query to produce intermediate table) 
where c < ( select avg(a) 
            from (query to produce intermediate table)
          );

And, your question refers to a column c; did you mean a?

而且,您的问题涉及一列c;你的意思是a

UPDATE: I saw a similar questionwith MySQLtoday; sorry I don't know Hive. See if this works:

更新:我看到了一个类似的问题MySQL今天; 抱歉我不知道Hive。看看这是否有效:

select stddev_pop(b) 
from   temp 
where  a < ( select *
             from (select avg(a) from temp) x
          );

回答by Saurabh Saxena

ok , first of all hive doesnt support sub queries anywhere only than the from clause. so you can't use subquery in where clause you have to create a temp table in from clause and you can use that table. Now if you create a temp table and than you are using it in your where clause than to refer that temp table it has to again run the fetching query so again it will not support .

好的,首先,除了 from 子句之外,hive 不支持任何地方的子查询。所以你不能在 where 子句中使用子查询,你必须在 from 子句中创建一个临时表,你可以使用该表。现在,如果您创建一个临时表,然后在 where 子句中使用它而不是引用该临时表,则它必须再次运行获取查询,因此它将不再支持 .

Bob I think hive will not support this select stddev_pop(b) from temp where a < ( select * from (select avg(a) from temp) x );

Bob 我认为 hive 不会支持这个 select stddev_pop(b) from temp where a < ( select * from (select avg(a) from temp) x );

but yes select stddev_pop(b) from temp x join (select avg(a) as average from temp) y where x.a < y.average;

但是是的,从 temp x join 中选择 stddev_pop(b)(从 temp 中选择 avg(a) 作为平均值)y where xa < y.average;

if we can create a temp table physically and put the data select avg(a) as average from temp into that then we can refer this .

如果我们可以在物理上创建一个临时表并将数据 select avg(a) as average from temp 放入其中,那么我们可以参考这个。