SQL hive 中`load data inpath` 和`location` 的区别？

Question

提问by Hunle

At my firm, I see these two commands used frequently, and I'd like to be aware of the differences, because their functionality seems the same to me:

在我的公司，我看到这两个命令经常使用，我想知道它们的区别，因为它们的功能在我看来是一样的：

1

create table <mytable> 
(name string,
number double);

load data inpath '/directory-path/file.csv' into <mytable>;

2

create table <mytable>
(name string,
number double);

location '/directory-path/file.csv';

They both copy the data from the directory on HDFS into the directory for the table on HIVE. Are there differences that one should be aware of when using these? Thank you.

他们都将数据从 HDFS 上的目录复制到 HIVE 上表的目录中。在使用这些时，是否有应该注意的差异？谢谢你。

Answer 1

回答by Sachin Gaikwad

Yes, they are used for different purpose at all.

是的，它们完全用于不同的目的。

load data inpathcommand is use to load data into hive table. 'LOCAL' signifies that the input file is on the local file system. If 'LOCAL' is omitted then it looks for the file in HDFS.

load data inpath命令用于将数据加载到 hive 表中。'LOCAL' 表示输入文件在本地文件系统上。如果省略“LOCAL”，则它会在 HDFS 中查找文件。

load data inpath '/directory-path/file.csv' into <mytable>; 
load data local inpath '/local-directory-path/file.csv' into <mytable>;

LOCATIONkeyword allow to points to any HDFS location for its storage, rather than being stored in a folder specified by the configuration property hive.metastore.warehouse.dir.

LOCATION关键字允许指向其存储的任何 HDFS 位置，而不是存储在配置属性hive.metastore.warehouse.dir指定的文件夹中。

In other words, with specified LOCATION '/your-path/', Hive does not use a default location for this table. This comes in handy if you already have data generated.

换句话说，对于指定的LOCATION '/your-path/'，Hive 不使用此表的默认位置。如果您已经生成了数据，这会派上用场。

Remember, LOCATION can be specify on EXTERNAL tables only. For regular tables, default location will be used.

请记住，LOCATION 只能在 EXTERNAL 表上指定。对于常规表，将使用默认位置。

To summarize, load data inpathtell hive where to look for input files and LOCATIONkeyword tells hive where to save output files on HDFS.

总而言之， 加载数据 inpath告诉 hive 在哪里查找输入文件，而LOCATION关键字告诉 hive 在 HDFS 上保存输出文件的位置。

References: https://cwiki.apache.org/confluence/display/Hive/GettingStarted https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

参考资料：https: //cwiki.apache.org/confluence/display/Hive/GettingStarted https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

Answer 2

回答by Ravindra babu

Option 1: Internal table

选项 1：内部表

create table <mytable> 
(name string,
number double);

load data inpath '/directory-path/file.csv' into <mytable>;

This command will remove content at source directory and create a internal table

此命令将删除源目录中的内容并创建一个内部表

Option 2: External table

选项 2：外部表

 create table <mytable>
 (name string,
 number double);

location '/directory-path/file.csv';

Create external table and copy the data into table. Now data won't be moved from source. You can drop external table but still source data is available.

创建外部表并将数据复制到 table 中。现在数据不会从源移动。您可以删除外部表，但源数据仍然可用。

When you drop an external table, it only drops the meta data of HIVE table. Data still exists at HDFS file location.

删除外部表时，只会删除 HIVE 表的元数据。数据仍然存在于 HDFS 文件位置。

Have a look at this related SE questions regarding use cases for both internal and external tables

查看有关内部和外部表用例的相关 SE 问题

Difference between Hive internal tables and external tables?

Hive 内部表和外部表的区别？

SQL hive 中`load data inpath` 和`location` 的区别？

提问by Hunle

1

1

2

2

回答by Sachin Gaikwad

回答by Ravindra babu

相关推荐

最近更新

标签

SQL hive 中`load data inpath` 和`location` 的区别？

提问by Hunle

1

1

2

2

回答by Sachin Gaikwad

回答by Ravindra babu

相关推荐

ANSI SQL 手册

将可为空的列添加到 SQL Server，默认值为 null

使用带有 WHERE 条件的单个查询 (SQL Express 2005) 从多个表中删除行

SQL 如何在sql中连接多个具有相同id的行？

相关推荐

最近更新

标签