使用 Sqoop 将数据从 MySQL 导入 Hive
声明:本页面是 StackOverFlow 热门问题的中英对照翻译,遵循 CC BY-SA 4.0 协议。如果您需要使用它,必须同样遵循 CC BY-SA 许可,注明原文地址和作者信息,并将内容归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/22404641/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me):
StackOverFlow
Using Sqoop to import data from MySQL to Hive
提问by Nayan
I am using Sqoop (version 1.4.4) to import data from MySQL to Hive. The data will be a subset of one of the tables, i.e. a few columns from a table. Is it necessary to create the table in Hive beforehand? Or will importing the data create the table specified in the command if it is not already in Hive?
我正在使用 Sqoop(版本 1.4.4)将数据从 MySQL 导入 Hive。数据是某张表的子集,即表中的几列。是否有必要事先在 Hive 中创建表?还是说,如果 Hive 中尚不存在该表,导入数据时会自动创建命令中指定的表?
回答by Manu Eidenberger
As mentioned in the sqoop documentation, you will not have to create any Hive tables if you use the --hive-import argument in your command.
如 sqoop 文档中所述,如果在命令中使用 --hive-import 参数,则不必创建任何 Hive 表
example:
例子:
sqoop import --connect jdbc:mysql://mysql_server:3306/db_name --username mysql_user --password mysql_pass --table table_name --hive-import
Also... consider the --hive-overwrite argument if you want to schedule a full data import, on a daily basis for example.
另外,如果你想定期(例如每天)执行一次全量数据导入,可以考虑使用 --hive-overwrite 参数
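Since the original question is about importing only a few columns from a table, one possible sketch is to add --columns to the same kind of command (the host, database, table, and column names below are placeholders, not taken from the answer):
sqoop import \
--connect jdbc:mysql://mysql_server:3306/db_name \
--username mysql_user --password mysql_pass \
--table table_name \
--columns "col1,col2,col3" \
--hive-import --hive-overwrite \
--hive-table hive_db.hive_table_name
Here --hive-overwrite replaces the existing Hive data on each run, which fits the scheduled full-import scenario mentioned above.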
回答by Nayan
I finally resolved the issue. It would involve two steps.
我终于解决了这个问题。这将涉及两个步骤。
- Create an external hive table.
- Import data using Sqoop.
- 创建外部配置单元表。
- 使用 Sqoop 导入数据。
Creation of the external table: external tables in Hive are a kind of permanent table and stay there even if Hive is stopped or the server goes down. The "EXTERNAL" keyword is used to specify the table type.
外部表的创建:Hive 中的外部表是一种永久表,即使 Hive 停止或服务器宕机,表也仍然存在。"EXTERNAL"关键字用于指定表类型。
CREATE EXTERNAL TABLE IF NOT EXISTS HIVEDB.HIVE_TABLE1 (DATE_COL DATE,
BIG_INT_COL BIGINT, INT_COL INT, VARCHAR_COL VARCHAR(221), FLOAT_COL FLOAT);
Import the data using Sqoop: specify the created table name while importing the data, instead of using the "--create-hive-table" option.
使用 Sqoop 导入数据:在导入数据时指定已创建的表名,而不是使用"--create-hive-table"选项。
sqoop import \
--connect jdbc:mysql://mysqlhost/mysqldb \
--username user --password passwd \
--query "SELECT table1.date_col, table1.big_int_col, table1.int_col, table1.varchar_col, table1.float_col FROM MYSQL_TABLE1 AS table1 WHERE \$CONDITIONS" \
--split-by table1.date_col \
--hive-import \
--hive-table hivedb.hive_table1 \
--target-dir hive_table1_data
Data was stored permanently in Hive.
数据永久存储在 Hive 中。
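A quick way to confirm the rows actually landed in Hive (this query is only an illustration and was not part of the original answer):
hive -e "SELECT COUNT(*) FROM hivedb.hive_table1;"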
回答by Subhashini
Even if there is no table in Hive, sqoop import will create it. The following worked for me:
即使 Hive 中没有表,sqoop import 也会创建它。以下命令对我有用:
sqoop import --connect jdbc:mysql://localhost/<<dbname>> --username <<YourMySqlUsername>> --password <<YourMySqlpwd>> --table employee --hive-import --hive-table employee_1 -m 1
回答by Dr.Rizz
Nayan, you probably would have figured it out by now.
Nayan,你现在可能已经想通了。
Whether EXTERNAL or not, Hive tables are stored on HDFS.
无论是否为 EXTERNAL,hive 表都存储在 HDFS 上。
The keyword EXTERNAL only loosely ties the table to its data. For example, deleting the EXTERNAL table from within Hive only deletes the schema and leaves the data untouched on HDFS.
关键字 EXTERNAL 只是将表与其数据松散地联系在一起。例如,从 Hive 中删除 EXTERNAL 表只会删除架构,而不会影响 HDFS 上的数据。
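A rough HiveQL sketch of that difference (the table names and LOCATION below are made up for illustration):
-- External table: DROP TABLE removes only the metadata; files under /data/ext_emp stay on HDFS.
CREATE EXTERNAL TABLE ext_emp (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/ext_emp';
DROP TABLE ext_emp;      -- schema gone, /data/ext_emp untouched
-- Managed (non-EXTERNAL) table: DROP TABLE also deletes the data under the warehouse directory.
CREATE TABLE managed_emp (id INT, name STRING);
DROP TABLE managed_emp;  -- schema and data files are both removed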
回答by Somum
Firstly, one doesn't have to create an EXTERNAL table; it works without one. Secondly, the solutions given above are a bit complex.
首先,并不需要创建 EXTERNAL 表,不创建也能正常工作;其次,上面给出的解决方案有点复杂。
Suppose the MySQL schema looks like this:
假设 mysql 架构看起来像这样
mysql> describe emp;
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| name | varchar(20) | YES | | NULL | |
| deg | varchar(20) | YES | | NULL | |
| salary | int(11) | YES | | NULL | |
| dept | varchar(20) | YES | | NULL | |
+--------+-------------+------+-----+---------+-------+
Then one needs to create a Hive table as I did, with the DATABASE as userdb and the TABLE as emp:
然后需要像我一样创建 hive 表,DATABASE 为 userdb,TABLE 为 emp
hive>
CREATE TABLE userdb.emp (
id INT,
name VARCHAR(20),
deg VARCHAR(20),
salary INT,
dept VARCHAR(20))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
Now it is a matter of running the sqoop script (I had to quit the hive prompt first, though). Since I am not using hive2, I had to run the script below from the location where metastore_db exists (i.e. from the same working directory where I used hive). Some workaround can probably mitigate this problem (I guess). The sqoop script is:
现在只需运行 sqoop 脚本(不过我必须先退出 hive 提示符)。由于我没有使用 hive2,我必须在 metastore_db 所在的位置运行下面的脚本(即在我使用 hive 的同一个工作目录中)。某些变通方法或许可以缓解这个问题(我猜)。sqoop 脚本如下:
sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root --password root \
--table emp --fields-terminated-by ',' \
--split-by id \
--hive-import --hive-table userdb.emp \
--target-dir /emp
The target directory, i.e. /emp, gets deleted once the command succeeds. I explicitly specified the Hive table using userdb.emp.
一旦命令成功,目标目录即 /emp 将被删除。我使用 userdb.emp 显式指定了 hive 表
My hdfs directory structure
我的hdfs目录结构
drwxr-xr-x - ubuntu supergroup 0 2016-12-18 13:20 /user/hive/warehouse/userdb.db/emp
-rwxr-xr-x 3 ubuntu supergroup 28 2016-12-18 13:19 /user/hive/warehouse/userdb.db/emp/part-m-00000
-rwxr-xr-x 3 ubuntu supergroup 35 2016-12-18 13:20 /user/hive/warehouse/userdb.db/emp/part-m-00001
-rwxr-xr-x 3 ubuntu supergroup 29 2016-12-18 13:20 /user/hive/warehouse/userdb.db/emp/part-m-00002
-rwxr-xr-x 3 ubuntu supergroup 31 2016-12-18 13:20 /user/hive/warehouse/userdb.db/emp/part-m-00003
-rwxr-xr-x 3 ubuntu supergroup 28 2016-12-18 13:20 /user/hive/warehouse/userdb.db/emp/part-m-00004
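To peek at what Sqoop wrote into those part files, the comma-delimited rows can be read directly (this command is only an illustration, not part of the original answer):
hdfs dfs -cat /user/hive/warehouse/userdb.db/emp/part-m-00000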
回答by Sravan Kumar
No need to create a table; it's not necessary. The import itself can create it. Please look at the command below.
无需事先创建表,这不是必需的。导入过程本身就可以创建表。请看下面的命令。
sqoop import --connect jdbc:mysql://mysql_server:3306/db_name \
--username mysql_user \
--password mysql_pass \
--table table_name \
--hive-import
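Note that without --hive-table, Sqoop uses the source table's name for the Hive table it creates (table_name in this example); a simple check of the result (illustrative only) is:
hive -e "DESCRIBE table_name;"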