使用 Sqoop 将数据从 MySQL 导入 Hive
声明:本页面是 StackOverFlow 热门问题的中英对照翻译,遵循 CC BY-SA 4.0 协议。如果您需要使用它,必须同样遵循 CC BY-SA 许可,注明原文地址和作者信息,并将内容归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/22404641/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me):
StackOverFlow
Using Sqoop to import data from MySQL to Hive
提问by Nayan
I am using Sqoop (version 1.4.4) to import data from MySQL to Hive. The data will be a subset of one of the tables, i.e. a few columns from a table. Is it necessary to create the table in Hive beforehand? Or will importing the data create the table specified in the command if it is not already in Hive?
我正在使用 Sqoop(版本 1.4.4)将数据从 MySQL 导入 Hive。数据是某张表的子集,即表中的几列。是否有必要事先在 Hive 中创建表?还是说,如果 Hive 中尚不存在该表,导入数据时会自动创建命令中指定的表?
回答by Manu Eidenberger
As mentioned in the sqoop documentation, you will not have to create any Hive tables if you use the --hive-import argument in your command.
如 sqoop 文档中所述,如果在命令中使用 --hive-import 参数,则不必创建任何 Hive 表
example:
例子:
sqoop import --connect jdbc:mysql://mysql_server:3306/db_name --username mysql_user --password mysql_pass --table table_name --hive-import
Also... consider the --hive-overwrite argument if you want to schedule a full data import, on a daily basis for example.
另外,如果你想定期(例如每天)执行一次全量数据导入,可以考虑使用 --hive-overwrite 参数
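Since the original question is about importing only a few columns from a table, one possible sketch is to add --columns to the same kind of command (the host, database, table, and column names below are placeholders, not taken from the answer):
sqoop import \
--connect jdbc:mysql://mysql_server:3306/db_name \
--username mysql_user --password mysql_pass \
--table table_name \
--columns "col1,col2,col3" \
--hive-import --hive-overwrite \
--hive-table hive_db.hive_table_name
Here --hive-overwrite replaces the existing Hive data on each run, which fits the scheduled full-import scenario mentioned above.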
回答by Nayan
I finally resolved the issue. It would involve two steps.
我终于解决了这个问题。这将涉及两个步骤。
- Create an external hive table.
- Import data using Sqoop.
- 创建外部配置单元表。
- 使用 Sqoop 导入数据。
Creation of the external table: external tables in Hive are a kind of permanent table and stay there even if Hive is stopped or the server goes down. The "EXTERNAL" keyword is used to specify the table type.
外部表的创建:Hive 中的外部表是一种永久表,即使 Hive 停止或服务器宕机,表也仍然存在。"EXTERNAL"关键字用于指定表类型。
CREATE EXTERNAL TABLE IF NOT EXISTS HIVEDB.HIVE_TABLE1 (DATE_COL DATE,
BIG_INT_COL BIGINT, INT_COL INT, VARCHAR_COL VARCHAR(221), FLOAT_COL FLOAT);
Import the data using Sqoop: specify the created table name while importing the data, instead of using the "--create-hive-table" option.
使用 Sqoop 导入数据:在导入数据时指定已创建的表名,而不是使用"--create-hive-table"选项。
sqoop import \
--connect jdbc:mysql://mysqlhost/mysqldb \
--username user --password passwd \
--query "SELECT table1.date_col, table1.big_int_col, table1.int_col, table1.varchar_col, table1.float_col FROM MYSQL_TABLE1 AS table1 WHERE \$CONDITIONS" \
--split-by table1.date_col \
--hive-import \
--hive-table hivedb.hive_table1 \
--target-dir hive_table1_data
Data was stored permanently in Hive.
数据永久存储在 Hive 中。
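A quick way to confirm the rows actually landed in Hive (this query is only an illustration and was not part of the original answer):
hive -e "SELECT COUNT(*) FROM hivedb.hive_table1;"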
回答by Subhashini
Even if there is no table in Hive, sqoop import will create it. The following worked for me:
即使 Hive 中没有表,sqoop import 也会创建它。以下命令对我有用:
sqoop import --connect jdbc:mysql://localhost/<<dbname>> --username <<YourMySqlUsername>> --password <<YourMySqlpwd>> --table employee --hive-import --hive-table employee_1 -m 1
回答by Dr.Rizz
Nayan, you probably would have figured it out by now.
Nayan,你现在可能已经想通了。
Whether EXTERNAL or not, Hive tables are stored on HDFS.
无论是否为 EXTERNAL,hive 表都存储在 HDFS 上。
The keyword EXTERNAL only loosely ties the table to its data. For example, deleting the EXTERNAL table from within Hive only deletes the schema and leaves the data untouched on HDFS.
关键字 EXTERNAL 只是将表与其数据松散地联系在一起。例如,从 Hive 中删除 EXTERNAL 表只会删除架构,而不会影响 HDFS 上的数据。
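A rough HiveQL sketch of that difference (the table names and LOCATION below are made up for illustration):
-- External table: DROP TABLE removes only the metadata; files under /data/ext_emp stay on HDFS.
CREATE EXTERNAL TABLE ext_emp (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/ext_emp';
DROP TABLE ext_emp;      -- schema gone, /data/ext_emp untouched
-- Managed (non-EXTERNAL) table: DROP TABLE also deletes the data under the warehouse directory.
CREATE TABLE managed_emp (id INT, name STRING);
DROP TABLE managed_emp;  -- schema and data files are both removed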
回答by Somum
Firstly, one doesn't have to create an EXTERNAL table; it works without one. Secondly, the solutions given above are a bit complex.
首先,并不需要创建 EXTERNAL 表,不创建也能正常工作;其次,上面给出的解决方案有点复杂。
Suppose the MySQL schema looks like this:
假设 mysql 架构看起来像这样
mysql> describe emp;
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| name | varchar(20) | YES | | NULL | |
| deg | varchar(20) | YES | | NULL | |
| salary | int(11) | YES | | NULL | |
| dept | varchar(20) | YES | | NULL | |
+--------+-------------+------+-----+---------+-------+
Then one needs to create a Hive table as I did, with the DATABASE as userdb and the TABLE as emp:
然后需要像我一样创建 hive 表,DATABASE 为 userdb,TABLE 为 emp
hive>
CREATE TABLE userdb.emp (
id INT,
name VARCHAR(20),
deg VARCHAR(20),
salary INT,
dept VARCHAR(20))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
Now it is a matter of running the sqoop script (I had to quit the hive prompt first, though). Since I am not using hive2, I had to run the script below from the location where metastore_db exists (i.e. from the same working directory where I used hive). Some workaround can probably mitigate this problem (I guess). The sqoop script is:
现在只需运行 sqoop 脚本(不过我必须先退出 hive 提示符)。由于我没有使用 hive2,我必须在 metastore_db 所在的位置运行下面的脚本(即在我使用 hive 的同一个工作目录中)。某些变通方法或许可以缓解这个问题(我猜)。sqoop 脚本如下:
sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root --password root \
--table emp --fields-terminated-by ',' \
--split-by id \
--hive-import --hive-table userdb.emp \
--target-dir /emp
The target directory, i.e. /emp, gets deleted once the command succeeds. I explicitly specified the Hive table using userdb.emp.
一旦命令成功,目标目录即 /emp 将被删除。我使用 userdb.emp 显式指定了 hive 表
My hdfs directory structure
我的hdfs目录结构
drwxr-xr-x - ubuntu supergroup 0 2016-12-18 13:20 /user/hive/warehouse/userdb.db/emp
-rwxr-xr-x 3 ubuntu supergroup 28 2016-12-18 13:19 /user/hive/warehouse/userdb.db/emp/part-m-00000
-rwxr-xr-x 3 ubuntu supergroup 35 2016-12-18 13:20 /user/hive/warehouse/userdb.db/emp/part-m-00001
-rwxr-xr-x 3 ubuntu supergroup 29 2016-12-18 13:20 /user/hive/warehouse/userdb.db/emp/part-m-00002
-rwxr-xr-x 3 ubuntu supergroup 31 2016-12-18 13:20 /user/hive/warehouse/userdb.db/emp/part-m-00003
-rwxr-xr-x 3 ubuntu supergroup 28 2016-12-18 13:20 /user/hive/warehouse/userdb.db/emp/part-m-00004
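To peek at what Sqoop wrote into those part files, the comma-delimited rows can be read directly (this command is only an illustration, not part of the original answer):
hdfs dfs -cat /user/hive/warehouse/userdb.db/emp/part-m-00000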
回答by Sravan Kumar
No need to create a table; it's not necessary. The import itself can create it. Please look at the command below.
无需事先创建表,这不是必需的。导入过程本身就可以创建表。请看下面的命令。
sqoop import --connect jdbc:mysql://mysql_server:3306/db_name \
--username mysql_user \
--password mysql_pass \
--table table_name \
--hive-import
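Note that without --hive-table, Sqoop uses the source table's name for the Hive table it creates (table_name in this example); a simple check of the result (illustrative only) is:
hive -e "DESCRIBE table_name;"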