SQL hive 中`load data inpath` 和`location` 的区别?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/35473933/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Difference between `load data inpath ` and `location` in hive?
提问by Hunle
At my firm, I see these two commands used frequently, and I'd like to be aware of the differences, because their functionality seems the same to me:
在我的公司,我看到这两个命令经常使用,我想知道它们的区别,因为它们的功能在我看来是一样的:
1
1
create table <mytable>
(name string,
number double);
load data inpath '/directory-path/file.csv' into <mytable>;
2
2
create table <mytable>
(name string,
number double);
location '/directory-path/file.csv';
They both copy the data from the directory on HDFS into the directory for the table on HIVE. Are there differences that one should be aware of when using these? Thank you.
他们都将数据从 HDFS 上的目录复制到 HIVE 上表的目录中。在使用这些时,是否有应该注意的差异?谢谢你。
回答by Sachin Gaikwad
Yes, they are used for different purpose at all.
是的,它们完全用于不同的目的。
load data inpathcommand is use to load data into hive table. 'LOCAL' signifies that the input file is on the local file system. If 'LOCAL' is omitted then it looks for the file in HDFS.
load data inpath命令用于将数据加载到 hive 表中。'LOCAL' 表示输入文件在本地文件系统上。如果省略“LOCAL”,则它会在 HDFS 中查找文件。
load data inpath '/directory-path/file.csv' into <mytable>;
load data local inpath '/local-directory-path/file.csv' into <mytable>;
LOCATIONkeyword allow to points to any HDFS location for its storage, rather than being stored in a folder specified by the configuration property hive.metastore.warehouse.dir.
LOCATION关键字允许指向其存储的任何 HDFS 位置,而不是存储在配置属性hive.metastore.warehouse.dir指定的文件夹中。
In other words, with specified LOCATION '/your-path/'
, Hive does not use a default location for this table. This comes in handy if you already have data generated.
换句话说,对于指定的LOCATION '/your-path/'
,Hive 不使用此表的默认位置。如果您已经生成了数据,这会派上用场。
Remember, LOCATION can be specify on EXTERNAL tables only. For regular tables, default location will be used.
请记住,LOCATION 只能在 EXTERNAL 表上指定。对于常规表,将使用默认位置。
To summarize, load data inpathtell hive where to look for input files and LOCATIONkeyword tells hive where to save output files on HDFS.
总而言之, 加载数据 inpath告诉 hive 在哪里查找输入文件,而LOCATION关键字告诉 hive 在 HDFS 上保存输出文件的位置。
References: https://cwiki.apache.org/confluence/display/Hive/GettingStartedhttps://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
参考资料:https: //cwiki.apache.org/confluence/display/Hive/GettingStarted https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
回答by Ravindra babu
Option 1: Internal table
选项 1:内部表
create table <mytable>
(name string,
number double);
load data inpath '/directory-path/file.csv' into <mytable>;
This command will remove content at source directory and create a internal table
此命令将删除源目录中的内容并创建一个内部表
Option 2: External table
选项 2:外部表
create table <mytable>
(name string,
number double);
location '/directory-path/file.csv';
Create external table and copy the data into table. Now data won't be moved from source. You can drop external table but still source data is available.
创建外部表并将数据复制到 table 中。现在数据不会从源移动。您可以删除外部表,但源数据仍然可用。
When you drop an external table, it only drops the meta data of HIVE table. Data still exists at HDFS file location.
删除外部表时,只会删除 HIVE 表的元数据。数据仍然存在于 HDFS 文件位置。
Have a look at this related SE questions regarding use cases for both internal and external tables
查看有关内部和外部表用例的相关 SE 问题
Difference between Hive internal tables and external tables?