Java: Hadoop is not showing my job in the job tracker even though it is running

Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must follow the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/21345022/

Date: 2020-08-13 08:10:11 · Source: igfitidea


Tags: java, hadoop, hadoop-streaming, yarn

Asked by Chris Hinshaw

Problem: When I submit a job to my Hadoop 2.2.0 cluster, it doesn't show up in the job tracker, but the job completes successfully. I can see the output, and the job runs correctly, printing output as it goes.


I have tried multiple options, but the job tracker does not see the job. If I run a streaming job using the 2.2.0 Hadoop distribution, it shows up in the task tracker, but when I submit it via the hadoop-client API it does not appear in the job tracker. I am looking at the web UI on port 8088 to verify the job.


Environment: OS X Mavericks, Java 1.6, Hadoop 2.2.0 single-node cluster, Tomcat 7.0.47


Code


    try {
        configuration.set("fs.defaultFS", "hdfs://127.0.0.1:9000");
        configuration.set("mapred.jobtracker.address", "localhost:9001");

        Job job = createJob(configuration);
        job.waitForCompletion(true);
    } catch (Exception e) {
        logger.log(Level.SEVERE, "Unable to execute job", e);
    }

    return null;

etc/hadoop/mapred-site.xml


<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <property>
         <name>mapred.job.tracker</name>
         <value>localhost:9001</value>
    </property> 
</configuration>

etc/hadoop/core-site.xml


<configuration>
     <property>
       <name>hadoop.tmp.dir</name>
       <value>/tmp/hadoop-${user.name}</value>
       <description>A base for other temporary directories.</description>
    </property>

    <property> 
      <name>fs.default.name</name> 
      <value>hdfs://localhost:9000</value> 
    </property>

</configuration>

Accepted answer by Chris Hinshaw

The resolution to the issue was to configure the job with the extra configuration options for YARN. I made the incorrect assumption that the Java hadoop-client API would use the configuration options from the configuration directory. I was able to diagnose the problem by turning on verbose logging using log4j.properties for my unit tests. It showed that the jobs were running locally and not being submitted to the YARN resource manager. With a little bit of trial and error I was able to configure the job and have it submitted to the YARN resource manager.


Code


    try {
        configuration.set("fs.defaultFS", "hdfs://127.0.0.1:9000");
        configuration.set("mapreduce.jobtracker.address", "localhost:54311");
        configuration.set("mapreduce.framework.name", "yarn");
        configuration.set("yarn.resourcemanager.address", "localhost:8032");

        Job job = createJob(configuration);
        job.waitForCompletion(true);
    } catch (Exception e) {
        logger.log(Level.SEVERE, "Unable to execute job", e);
    }
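The verbose logging mentioned in the answer can be turned on with a `log4j.properties` on the test classpath. This is only a minimal sketch; the exact logger names you need to raise may vary by Hadoop version, but debugging the `org.apache.hadoop.mapreduce`/`mapred` packages is what reveals whether the job is submitted locally or to YARN:

```properties
# Root logger: send everything at DEBUG and above to the console
log4j.rootLogger=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p %c - %m%n

# Turn up the Hadoop client packages to see where the job is submitted
log4j.logger.org.apache.hadoop.mapreduce=DEBUG
log4j.logger.org.apache.hadoop.mapred=DEBUG
```

With this in place, the submitter logs make it obvious when the `LocalJobRunner` is used instead of the YARN resource manager.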

Answered by Rohit Menon

I see that you are using Hadoop 2.2.0. Are you using MRv1 or MRv2? The daemons are different for MRv2 (YARN). There is no JobTracker for MRv2, though you may see a placeholder page for the JobTracker UI.


The ResourceManager web UI should display your submitted jobs. The default web URL for the ResourceManager is http://<ResourceManagerHost>:8088


Replace ResourceManagerHost with the IP address of the node where the Resource Manager is running.
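Besides the web UI, Hadoop 2.x ResourceManagers also expose a REST API under `/ws/v1/cluster` on the same port, which is another way to verify that an application was actually submitted to YARN. A minimal sketch in Java (the host and port here assume the single-node setup from the question):

```java
import java.net.URL;

public class ListYarnApps {

    // Build the ResourceManager REST URL that lists submitted applications.
    static String appsUrl(String host, int port) {
        return "http://" + host + ":" + port + "/ws/v1/cluster/apps";
    }

    public static void main(String[] args) throws Exception {
        URL url = new URL(appsUrl("localhost", 8088));
        System.out.println(url);
        // With a live ResourceManager you could read the JSON response, e.g.:
        // try (java.io.InputStream in = url.openStream()) { ... }
    }
}
```

If your job was submitted to YARN rather than the local job runner, it will appear in the JSON returned by this endpoint as well as on the web UI.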


You can read more about the YARN architecture at Apache Hadoop YARN.
