Java: run a Hadoop job without using JobConf
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow.
Original URL: http://stackoverflow.com/questions/2115292/
Run Hadoop job without using JobConf
Asked by Greg Cottman
I can't find a single example of submitting a Hadoop job that does not use the deprecated JobConf class. JobClient, which hasn't been deprecated, still only supports methods that take a JobConf parameter.
Can someone please point me at an example of Java code submitting a Hadoop map/reduce job using only the Configuration class (not JobConf), and using the mapreduce.lib.input package instead of mapred.input?
Accepted answer by zjffdu
Hope this is helpful.
import java.io.File;
import org.apache.commons.io.FileUtils;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MapReduceExample extends Configured implements Tool {

    // Identity mapper that also increments a custom counter per record.
    static class MyMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws java.io.IOException, InterruptedException {
            context.getCounter("mygroup", "jeff").increment(1);
            context.write(key, value);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        // Build the job from the Tool's Configuration -- no JobConf needed.
        Job job = new Job(getConf());
        job.setMapperClass(MyMapper.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // Clear the output directory so repeated runs don't fail.
        FileUtils.deleteDirectory(new File("data/output"));
        args = new String[] { "data/input", "data/output" };
        ToolRunner.run(new MapReduceExample(), args);
    }
}
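Note that because MapReduceExample extends Configured and implements Tool, launching it through ToolRunner also lets generic Hadoop options (for example -D property=value or -conf <file>) be parsed off the command line into the Configuration returned by getConf().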
Answered by Binary Nerd
I believe this tutorial illustrates removing the deprecated JobConf class using Hadoop 0.20.1.
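In essence, the migration the tutorial describes replaces JobConf/JobClient with Configuration/Job from org.apache.hadoop.mapreduce. Here is a minimal sketch of a new-API driver (the class name NewApiDriver is a placeholder, not code from the tutorial; with no mapper or reducer set, Hadoop runs identity map/reduce):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NewApiDriver {
    public static void main(String[] args) throws Exception {
        // Old style (deprecated, org.apache.hadoop.mapred):
        //   JobConf conf = new JobConf(NewApiDriver.class);
        //   JobClient.runJob(conf);

        // New style (org.apache.hadoop.mapreduce): Configuration + Job only.
        Configuration conf = new Configuration();
        Job job = new Job(conf, "new-api example");
        job.setJarByClass(NewApiDriver.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}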
Answered by dk.
This is a nice example with downloadable code: http://sonerbalkir.blogspot.com/2010/01/new-hadoop-api-020x.html It's also over two years old, and there is no official documentation discussing the new API. Sad.
Answered by Yatin
In the previous API there were three ways of submitting a job. One of them is to submit the job, get a reference to the RunningJob, and obtain the RunningJob's id.
submitJob(JobConf): only submits the job; you then poll the returned RunningJob handle to query status and make scheduling decisions.
How can one use the new API to get a reference to the RunningJob and obtain its id, since none of the new APIs return a reference to a RunningJob?
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html
Thanks.
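For what it's worth, the new API folds RunningJob's role into Job itself: submit() returns immediately, and the same Job handle exposes the job id and progress. A minimal submit-and-poll sketch (the class name and argument handling are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobID;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubmitAndPoll {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "submit-and-poll");
        job.setJarByClass(SubmitAndPoll.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.submit();                 // returns immediately, like submitJob(JobConf)
        JobID id = job.getJobID();    // the id the old RunningJob used to expose
        System.out.println("Submitted job " + id);

        // Poll the handle, as one would have polled RunningJob.
        while (!job.isComplete()) {
            System.out.printf("map %.0f%% reduce %.0f%%%n",
                    job.mapProgress() * 100, job.reduceProgress() * 100);
            Thread.sleep(5000);
        }
        System.out.println("Job " + id + (job.isSuccessful() ? " succeeded" : " failed"));
    }
}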
Answered by coderz
Try to use Configuration and Job. Here is an example:
(The TokenizerMapper and IntSumReducer below are the standard word-count implementations; replace them, the combiner, and the other configuration with your own.)
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    // Splits each input line into tokens and emits (word, 1).
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Sums the counts for each word; also used as the combiner.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "Word Count");
        // set jar
        job.setJarByClass(WordCount.class);
        // set Mapper, Combiner, Reducer
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        /* Optional, set a custom Partitioner:
         * job.setPartitionerClass(MyPartitioner.class);
         */
        // set output key and value classes
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // set input and output paths
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // by default, Hadoop uses TextInputFormat and TextOutputFormat;
        // any custom input/output class must implement InputFormat/OutputFormat
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
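Assuming the class above is packaged into a jar, the job can be launched with the standard launcher, e.g. hadoop jar wordcount.jar WordCount <input dir> <output dir>. The output directory must not already exist; FileOutputFormat refuses to overwrite it.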