Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and keep the same license. Original: http://stackoverflow.com/questions/33113325/

Time: 2020-09-01 04:06:00  Source: igfitidea

How to do select based on a substring of a particular column in HiveQL and SQL?

Tags: sql, hive, substring, hiveql

Asked by DJ_Stuffy_K

I have a table in Hive:

S.no  Age  minutes  code
1     10   20       75081
2     11   114      75080
3     21   104      75180
4     31   124      75108
5     10   20       75083
6     11   114      75180
7     21   104      75180

I want to write a HiveQL/SQL query that gives a ranked list of total minutes spoken per region, where the region is the first 4 digits of code. How should I go about it? I know that SUBSTRING() gives me the required cut, but I have not been able to get any further. My attempt so far:

Select code, minutes as total  
from TableT   
where S.no > 1
group by code 
order by total

Edit: The ranked results, based on the first 4 digits of the zip code, should look something like this:

total                     code
322 (i.e. 104+114+104)    7518
154 (i.e. 20+114+20)      7508
124                       7510

Answered by Gordon Linoff

Hmmm. I'm thinking you want this:

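-- The region is the first 4 characters of the code column from the question.
-- Total the minutes per region, then rank regions by that total (highest first).
select substr(code, 1, 4) as region, sum(minutes) as minutes,
       rank() over (order by sum(minutes) desc) as therank
from tableT
where s.no > 1
group by substr(code, 1, 4)
order by minutes desc;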
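
To check the approach against the sample data, here is a minimal sketch that runs the same kind of query in an in-memory SQLite database through Python's sqlite3 module. It rests on several assumptions: SQLite stands in for Hive (it needs SQLite 3.25 or later for window functions), the S.no column is renamed to the hypothetical sno to avoid quoting issues, and the s.no > 1 filter is left out so that the totals match the expected output in the question (with the filter, row 1 is excluded and region 7508 would total 134 instead of 154).

```python
import sqlite3

# Sample rows from the question: (sno, age, minutes, code).
# "sno" is a hypothetical stand-in for the original "S.no" column.
rows = [
    (1, 10, 20, 75081),
    (2, 11, 114, 75080),
    (3, 21, 104, 75180),
    (4, 31, 124, 75108),
    (5, 10, 20, 75083),
    (6, 11, 114, 75180),
    (7, 21, 104, 75180),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableT (sno INTEGER, age INTEGER, minutes INTEGER, code INTEGER)"
)
conn.executemany("INSERT INTO tableT VALUES (?, ?, ?, ?)", rows)

# Group by the first 4 characters of code, sum the minutes per region,
# and rank the regions by that sum in descending order.
query = """
SELECT substr(CAST(code AS TEXT), 1, 4) AS region,
       SUM(minutes) AS total,
       RANK() OVER (ORDER BY SUM(minutes) DESC) AS therank
FROM tableT
GROUP BY substr(CAST(code AS TEXT), 1, 4)
ORDER BY total DESC
"""
result = conn.execute(query).fetchall()
conn.close()

for region, total, therank in result:
    print(region, total, therank)
# prints:
# 7518 322 1
# 7508 154 2
# 7510 124 3
```

The explicit CAST makes the string conversion visible; whether a given Hive version needs it depends on the column type, so treat it as a precaution rather than a requirement.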