bash: using hive commands in a shell script

Question

Asked by chhaya vishwakarma

I have a problem where I need to pass columns to a shell script that will be utilized inside another shell script. Then I need to iterate over the columns and do some processing; I want to store the output in a Hive table.

But I don't understand how I should store the output of each iteration in the same Hive table. Can anyone suggest how this can be done?

#!/bin/bash
./hive -S -e "use telecom;select case when $1/2>avg($1) over (partition by 1) then $1 end from telecom_tr1;"

I am passing only one column name here, but is it possible to pass multiple column names and save the output in a single Hive table?

Edits:

Example output: if I give three columns as input to my shell script, my Hive output table should look like the example below. The script will iterate over the parameters passed; the query will do some processing for each iteration (column), and the result of every iteration should be stored in one Hive table only.

Script input: sh test.sh col1 col2 col3

Expected output: iteration one | iteration two | iteration three

Answer 1

Accepted answer by Jordan Young

You can create partitioned tables in Hive which will easily solve this problem. The basic format would look something like this:

create table my_table (field string) partitioned by (iter int);

INSERT OVERWRITE TABLE my_table partition (iter=${iter})
select case when $1/2>avg($1) over (partition by 1) then $1 end from telecom_tr1;

Selecting from my_table then returns the results of every run of the query appended to one another, with a column called iter specifying which iteration each row comes from.
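
As a rough sketch of how the question's script could drive this, the loop below runs the insert once per column passed on the command line and gives each run its own iter partition. The function names build_insert_query and run_all and the HIVE_BIN override are hypothetical, introduced only for illustration; the sketch also assumes my_table has already been created as shown above and that hive is on the PATH (the question calls it as ./hive).

```shell
#!/bin/bash
# Sketch only: one partition of my_table per column given on the command line.
# build_insert_query, run_all and HIVE_BIN are illustrative names, not part of Hive.

# Build the HiveQL for one column ($1) and one iteration number ($2).
build_insert_query() {
  local col="$1" iter="$2"
  printf 'use telecom; INSERT OVERWRITE TABLE my_table PARTITION (iter=%s) SELECT CASE WHEN %s/2 > avg(%s) OVER (PARTITION BY 1) THEN %s END FROM telecom_tr1;' \
    "$iter" "$col" "$col" "$col"
}

# Run the query once per column; iteration numbers start at 1.
run_all() {
  local iter=1 col
  for col in "$@"; do
    # Assumption: hive is on the PATH; set HIVE_BIN=./hive to match the question.
    "${HIVE_BIN:-hive}" -S -e "$(build_insert_query "$col" "$iter")"
    iter=$((iter + 1))
  done
}

# Only run when columns are passed in, e.g. sh test.sh col1 col2 col3
if [ "$#" -gt 0 ]; then
  run_all "$@"
fi
```

Called as sh test.sh col1 col2 col3, this would write partitions iter=1, iter=2 and iter=3, so a single select on my_table shows all three iterations side by side with their iter value.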

Answer 2

Answer by dpsdce

The following is a crude way of doing it:

--myQuery.hql

  use telecom;

  Create table my_temp_table_${iterationNo} as
  select my_temp_table_${old_iterationNo}.* ,(select case when ${iterationNo}/2>avg(${iterationNo}) over (partition by 1) then ${iterationNo} end from telecom_tr1) as Iteration_2
  from my_temp_table_${old_iterationNo};

  drop table my_temp_table_${old_iterationNo};

In Bash, just iterate over the parameters you received and call the HQL file for each one as:

  hive -d iterationNo=$current -d old_iterationNo=$prev -f myQuery.hql
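
A minimal sketch of that loop follows, tracking the previous value so each call can refer to the table built by the one before it. The function name run_steps, the HIVE_BIN override and the starting value 0 (which presumes a seed table my_temp_table_0 already exists) are assumptions made for illustration and are not part of the answer.

```shell
#!/bin/bash
# Sketch only: call myQuery.hql once per parameter, chaining prev -> current.
# run_steps, HIVE_BIN and the initial prev=0 are hypothetical choices.

run_steps() {
  local prev=0 current
  for current in "$@"; do
    # -d defines a variable that the HQL file reads as ${iterationNo} etc.
    "${HIVE_BIN:-hive}" -d "iterationNo=${current}" -d "old_iterationNo=${prev}" -f myQuery.hql
    prev="$current"
  done
}

# Only run when parameters are passed in.
if [ "$#" -gt 0 ]; then
  run_steps "$@"
fi
```

Note that no spaces are allowed around the = in each -d definition, otherwise the shell splits it into separate arguments.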
