How to select based on a substring of a particular column in HiveQL and SQL?
Original question: http://stackoverflow.com/questions/33113325/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How do I select based on a substring of a particular column in HiveQL and SQL?
Asked by DJ_Stuffy_K
I have a table in Hive:
S.no  Age  minutes  code
1     10   20       75081
2     11   114      75080
3     21   104      75180
4     31   124      75108
5     10   20       75083
6     11   114      75180
7     21   104      75180
I want to write a HiveQL/SQL query that gives a ranked list of total minutes spoken per region, where the region is the first 4 digits of code. How should I go about it? I know that SUBSTRING() gives me the required cut, but I have not been able to get any further than that.
Select code, minutes as total
from TableT
where S.no > 1
group by code
order by total
Edit: The ranked results, based on the first 4 digits of the zip code, should look something like this:
total code
322 (i.e. 104+114+104) 7518
154 (i.e. 20+114+20) 7508
124 7510
Answered by Gordon Linoff
Hmmm. I'm thinking you want this:
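-- the question's table names this column "code" (it holds the zip code)
select substr(code, 1, 4) as region, sum(minutes) as minutes,
       rank() over (order by sum(minutes) desc) as therank
from tableT
-- filter carried over from the question; a column literally named S.no needs quoting (backticks in Hive)
where s.no > 1
group by substr(code, 1, 4)
order by minutes desc;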
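To check the grouping against the sample rows, here is a minimal sketch that loads the question's data into an in-memory SQLite database from Python and groups by the first 4 digits of code. Several things here are assumptions rather than part of the answer: SQLite stands in for Hive, the S.no column is renamed to sno to avoid quoting, code is stored as an integer and cast to text before substr, and the rank is computed in Python (with the same tie semantics as rank()) instead of with a window function. The helper name ranked_totals is hypothetical.

```python
import sqlite3

# Sample rows from the question: (sno, age, minutes, code).
# "sno" is an assumed rename of the "S.no" column.
ROWS = [
    (1, 10, 20, 75081),
    (2, 11, 114, 75080),
    (3, 21, 104, 75180),
    (4, 31, 124, 75108),
    (5, 10, 20, 75083),
    (6, 11, 114, 75180),
    (7, 21, 104, 75180),
]

# Group by the first 4 digits of code and sum the minutes per region.
QUERY = """
    SELECT substr(CAST(code AS TEXT), 1, 4) AS region,
           SUM(minutes) AS total_minutes
    FROM TableT
    {where}
    GROUP BY substr(CAST(code AS TEXT), 1, 4)
    ORDER BY total_minutes DESC
"""


def ranked_totals(min_sno=None):
    """Return (region, total_minutes, rank) tuples, largest total first.

    If min_sno is given, only rows with sno > min_sno are counted,
    mirroring the WHERE clause used in the question and the answer.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE TableT (sno INTEGER, age INTEGER, minutes INTEGER, code INTEGER)"
    )
    conn.executemany("INSERT INTO TableT VALUES (?, ?, ?, ?)", ROWS)
    where = "" if min_sno is None else "WHERE sno > ?"
    params = () if min_sno is None else (min_sno,)
    rows = conn.execute(QUERY.format(where=where), params).fetchall()
    conn.close()
    totals = [total for _, total in rows]
    # rank() semantics: 1 + number of regions with a strictly larger total
    return [(region, total, 1 + sum(t > total for t in totals)) for region, total in rows]


if __name__ == "__main__":
    print(ranked_totals())           # all rows
    print(ranked_totals(min_sno=1))  # with the sno > 1 filter
```

Without the filter, the totals match the expected output in the question (322, 154, 124). With the sno > 1 filter, row 1 is excluded and region 7508 comes out as 134 instead of 154, so the WHERE clause may need to be dropped if the edited expected output is what is wanted.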