bash: Using hive commands in shell script
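Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/24035316/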
Asked by chhaya vishwakarma
I have a problem where I need to pass columns to a shell script that will be utilized inside another shell script. Then I need to iterate over the columns and do some processing; I want to store the output in a Hive table.
But I don't understand how I should store the output of each iteration in the same Hive table. Can anyone suggest how it can be done?
#!/bin/bash
./hive -S -e "use telecom;select case when $1/2>avg($1) over (partition by 1) then $1 end from telecom_tr1;"
Here I am passing only one column name, but is it possible to pass multiple column names and save the output in a single Hive table?
Edits:
Example: if I give three columns as input to my shell script, my Hive output table should look like the one below. The script will iterate over the parameters passed, the query will do some processing for each iteration (column), and the result of every iteration should be stored in one Hive table.
Script input: sh test.sh col1 col2 col3
Expected output: iteration one | iteration two | iteration three
Accepted answer by Jordan Young
You can create a partitioned table in Hive, which solves this problem easily. The basic format would look something like this:
create table my_table (field string) partitioned by (iter int);
INSERT OVERWRITE TABLE my_table partition (iter=${iter})
select case when $1/2>avg($1) over (partition by 1) then $1 end from telecom_tr1;
Each run writes into its own partition, so querying my_table returns the results of all runs together, with the partition column iter indicating which iteration each row came from.
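A minimal sketch of this approach as a complete script, assuming the database telecom and source table telecom_tr1 from the question, and that my_table was created once with the create table statement above. HIVE_CMD, build_query, and run_all are hypothetical names introduced for illustration; each column name passed on the command line becomes one partition (iter=1, iter=2, ...).

```shell
#!/bin/bash
# Sketch: run the question's query once per column name given on the command
# line (e.g. sh test.sh col1 col2 col3) and store each run in its own
# partition of one Hive table.
# Assumes my_table already exists:
#   create table my_table (field string) partitioned by (iter int);
# HIVE_CMD is a hypothetical override so the hive binary (or ./hive, as in
# the question) can be swapped out, e.g. for a dry run.
HIVE_CMD=${HIVE_CMD:-hive}

# Print the HiveQL for one iteration: $1 = column name, $2 = iteration number.
build_query() {
  local col=$1 iter=$2
  printf 'use telecom; INSERT OVERWRITE TABLE my_table PARTITION (iter=%s) select case when %s/2>avg(%s) over (partition by 1) then %s end from telecom_tr1;' \
    "$iter" "$col" "$col" "$col"
}

# Run one Hive job per column; iteration numbers start at 1.
run_all() {
  local iter=1 col
  for col in "$@"; do
    # -S: silent mode, -e: run the quoted HiveQL string; stop on failure
    "$HIVE_CMD" -S -e "$(build_query "$col" "$iter")" || return 1
    iter=$((iter + 1))
  done
}

run_all "$@"
```

For example, sh test.sh col1 col2 col3 runs three Hive jobs, and afterwards select iter, field from my_table returns all three result sets, told apart by iter. Because INSERT OVERWRITE with a partition spec replaces only that partition, rerunning the script overwrites earlier results for the same iteration instead of appending duplicates.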
Answer by dpsdce
The following is a crude way of doing it:
--myQuery.hql
-- ${iterationNo}: current column name, ${old_iterationNo}: previous one (both set with hive -d)
use telecom;
Create table my_temp_table_${iterationNo} as
select my_temp_table_${old_iterationNo}.*, (select case when ${iterationNo}/2>avg(${iterationNo}) over (partition by 1) then ${iterationNo} end from telecom_tr1) as result_${iterationNo}
from my_temp_table_${old_iterationNo};
drop table my_temp_table_${old_iterationNo};
In Bash, just iterate over the parameters you received and call the HQL file once per parameter:
hive -d iterationNo="$current" -d old_iterationNo="$prev" -f myQuery.hql
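The loop described in this answer might be sketched as follows, assuming (as the HQL implies) that each parameter is a column name, that $current is the column for this iteration and $prev the one before it. The seed table my_temp_table_seed, HIVE_CMD, and run_iterations are hypothetical names; the seed table must exist before the first run because myQuery.hql selects from it and then drops it.

```shell
#!/bin/bash
# Sketch of the driver loop for myQuery.hql.
# Hypothetical names: HIVE_CMD, run_iterations, and the seed table
# my_temp_table_seed, which must exist before the first iteration.
HIVE_CMD=${HIVE_CMD:-hive}

run_iterations() {
  local prev=seed current
  for current in "$@"; do
    # -d (--define) sets a variable that myQuery.hql reads as ${name};
    # there must be no spaces around "=".
    "$HIVE_CMD" -d iterationNo="$current" -d old_iterationNo="$prev" \
      -f myQuery.hql || return 1
    # The table built in this run becomes the input of the next one.
    prev=$current
  done
}

run_iterations "$@"
```

Run with col1 col2 col3, this builds my_temp_table_col1, then my_temp_table_col2, then my_temp_table_col3, dropping each predecessor, so the accumulated result ends up in the table named after the last column.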