Note: this page is an English rendering of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must apply the same license and attribute it to the original authors (not this site). Original question: http://stackoverflow.com/questions/19437550/

What is the HDFS Location on Hadoop?

Tags: java, hadoop

Asked by Nital

I am trying to run the WordCount example in Hadoop after following some online tutorials. However, what is not clear to me is where the file ends up when it is copied from our local file system to HDFS by the following command:

hadoop fs -copyFromLocal /host/tut/python-tutorial.pdf /usr/local/myhadoop-tmp/

When I execute the following command, I don't see my python-tutorial.pdf listed on HDFS:

hadoop fs -ls

This is confusing me. I have already specified the "myhadoop-tmp" directory in core-site.xml, and I thought this directory would become the HDFS directory for storing all the input files.

core-site.xml
=============
<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/myhadoop-tmp</value>
    <description>A base for other temporary directories.</description>
</property>

If this is not the case, where is HDFS located on my machine? What configuration determines the HDFS directory, and where does the input file go when we copy it from the local file system to HDFS?

Accepted answer by cabad

This is set by the dfs.datanode.data.dir property, which defaults to file://${hadoop.tmp.dir}/dfs/data (see the defaults documented in hdfs-default.xml).

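If you want to see what this property resolves to on your own setup, you can read it back through the configuration API. A small Java sketch (the class name is mine; it assumes the hadoop-hdfs client libraries and your configuration files are on the classpath):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class ShowDataDir {
    public static void main(String[] args) {
        // HdfsConfiguration loads hdfs-default.xml and hdfs-site.xml
        // on top of core-site.xml.
        Configuration conf = new HdfsConfiguration();

        // Variables are expanded, so with hadoop.tmp.dir set to
        // /usr/local/myhadoop-tmp this should print:
        //   file:///usr/local/myhadoop-tmp/dfs/data
        System.out.println(conf.get("dfs.datanode.data.dir"));
    }
}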
However, in your case, the problem is that you are not using the full path within HDFS. When you run hadoop fs -ls with no path argument, it lists your HDFS home directory (/user/&lt;your-username&gt;), not the directory you copied the file to, which is why python-tutorial.pdf does not show up. Instead, do:

hadoop fs -ls /usr/local/myhadoop-tmp/
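
Since the question is tagged java, here is the same check from the Java API (a minimal sketch; the class name is mine, and it assumes a hadoop-client dependency with the cluster configuration on the classpath):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListHdfsDir {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS and friends from core-site.xml on the classpath.
        FileSystem fs = FileSystem.get(new Configuration());

        // List the full, logical HDFS path -- not the local datanode path.
        for (FileStatus status : fs.listStatus(new Path("/usr/local/myhadoop-tmp/"))) {
            System.out.println(status.getPath() + " (" + status.getLen() + " bytes)");
        }
    }
}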

Note that you also seem to be confusing the path within HDFS with the path in your local file system. Within HDFS, your file is in /usr/local/myhadoop-tmp/. On your local file system (given your configuration setting), it is under /usr/local/myhadoop-tmp/dfs/data/; in there, HDFS maintains its own directory structure and naming convention, independent of whatever path in HDFS you decide to use. The file won't keep its name there either, since it is split into blocks and each block is assigned a unique ID; a block's name looks like blk_1073741826 (you can list a file's blocks with hdfs fsck /usr/local/myhadoop-tmp/python-tutorial.pdf -files -blocks).

To conclude: the local path used by the datanode is NOT the same as the paths you use in HDFS. You can go into your local directory looking for files, but you should not, since you could mess up the HDFS metadata management. Just use the hadoop command-line tools to copy, move, and read files within HDFS, using whatever logical path (in HDFS) you wish. These paths within HDFS need not be tied to the paths you used for your local datanode storage (there is no reason to do so, and no advantage in it).

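The same rule holds if you use the Java API instead of the command line: you only ever deal with logical HDFS paths. A minimal sketch of the copy from the question (the class name is mine; same classpath assumptions as above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyToHdfs {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Equivalent of:
        // hadoop fs -copyFromLocal /host/tut/python-tutorial.pdf /usr/local/myhadoop-tmp/
        fs.copyFromLocalFile(new Path("/host/tut/python-tutorial.pdf"),
                             new Path("/usr/local/myhadoop-tmp/"));

        // Verify using the logical HDFS path; where the blocks end up on the
        // datanode's local disk is an implementation detail you never touch.
        System.out.println(fs.exists(new Path("/usr/local/myhadoop-tmp/python-tutorial.pdf")));
    }
}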