java - Standard practices for logging in MapReduce jobs

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA terms and attribute it to the original authors (not me), citing the original source: StackOverflow, http://stackoverflow.com/questions/28119423/

Standard practices for logging in MapReduce jobs

Tags: java, hadoop, mapreduce, hadoop2, mapr

Asked by Frank

I'm trying to find the best approach for logging in MapReduce jobs. I'm using slf4j with a log4j appender as in my other Java applications, but since a MapReduce job runs in a distributed manner across the cluster, I don't know where I should set the log file location, since it is a shared cluster with limited access privileges.

Are there any standard practices for logging in MapReduce jobs, so that you can easily look at the logs across the cluster after the job completes?

Answered by Ashrith

You could use log4j, which is the default logging framework that Hadoop uses. So, from your MapReduce application you could do something like this:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.log4j.Logger;

public class SampleMapper extends Mapper<LongWritable, Text, Text, Text> {
    private Logger logger = Logger.getLogger(SampleMapper.class);

    @Override
    protected void setup(Context context) {
        logger.info("Initializing NoSQL Connection.");
        try {
            // logic for connecting to NoSQL - omitted
        } catch (Exception ex) {
            logger.error(ex.getMessage());
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // mapper code omitted
    }
}

This sample code uses a log4j logger to log events from the mapper. All log events are written to their respective task logs, which you can view from the JobTracker (MRv1) or ResourceManager (MRv2) web page.
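
As a side note (not from the original answer): if you are on Hadoop 2, you can also control how verbose those task logs are on a per-job basis through the mapreduce.map.log.level and mapreduce.reduce.log.level configuration properties. A minimal driver sketch, assuming the SampleMapper above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SampleDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // raise mapper logging to DEBUG for this job only; reducers stay at INFO
        conf.set("mapreduce.map.log.level", "DEBUG");
        conf.set("mapreduce.reduce.log.level", "INFO");

        Job job = Job.getInstance(conf, "sample-logging-job");
        job.setJarByClass(SampleDriver.class);
        job.setMapperClass(SampleMapper.class);
        // input/output formats and paths omitted
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}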

If you are using YARN, you can access the application logs from the command line using the following command:

yarn logs -applicationId <application_id>
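
For example (a sketch, assuming log aggregation is enabled and the yarn CLI is on your PATH; the application id below is made up):

# find the id of a finished application
yarn application -list -appStates FINISHED
# dump the aggregated container logs for that application to a local file
yarn logs -applicationId application_1414530900704_0003 > app_logs.txt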

If you are using MapReduce v1, however, there is no single point of access from the command line; you have to log into each TaskTracker and look in the configured path, generally /var/log/hadoop/userlogs/attempt_<job_id>/syslog (specified by ${hadoop.log.dir}/userlogs), which contains the log4j output.
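
For instance, on one TaskTracker node you might inspect an attempt's log directory directly (a sketch; the exact path depends on your ${hadoop.log.dir} setting):

# on a TaskTracker node; the path below assumes the default layout
ls /var/log/hadoop/userlogs/
cat /var/log/hadoop/userlogs/attempt_<job_id>/syslog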

Answered by javadba

To add to @Ashrith's answer: you can view the individual TaskTracker logs via the JobTracker GUI. The running task attempts are visible in the JT GUI, and you can click on any of the following: stderr, stdout, and system logs (syslog). The syslog is where you find your log4j output.

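In other words, each stream a task writes to maps to one of those links. A minimal illustration (assuming a log4j Logger such as the one in the earlier SampleMapper):

// inside a map() or reduce() method:
System.out.println("captured in the task's stdout log");  // "stdout" link
System.err.println("captured in the task's stderr log");  // "stderr" link
logger.info("routed by log4j to the task's syslog");      // "syslog" link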