Java: run a Hadoop job without using JobConf
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow.
Original URL: http://stackoverflow.com/questions/2115292/
Run Hadoop job without using JobConf
Asked by Greg Cottman
I can't find a single example of submitting a Hadoop job that does not use the deprecated JobConf class. JobClient, which hasn't been deprecated, still only supports methods that take a JobConf parameter.
Can someone please point me at an example of Java code submitting a Hadoop map/reduce job using only the Configuration class (not JobConf), and using the mapreduce.lib.input package instead of mapred.input?
Accepted answer by zjffdu
Hope this is helpful.
import java.io.File;
import org.apache.commons.io.FileUtils;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MapReduceExample extends Configured implements Tool {

    // Identity mapper that also increments a custom counter per record.
    static class MyMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws java.io.IOException, InterruptedException {
            context.getCounter("mygroup", "jeff").increment(1);
            context.write(key, value);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        // Build the job from the Tool's Configuration -- no JobConf needed.
        Job job = new Job(getConf());
        job.setMapperClass(MyMapper.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // Clear the output directory so repeated runs don't fail.
        FileUtils.deleteDirectory(new File("data/output"));
        args = new String[] { "data/input", "data/output" };
        ToolRunner.run(new MapReduceExample(), args);
    }
}
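Note that because MapReduceExample extends Configured and implements Tool, launching it through ToolRunner also lets generic Hadoop options (for example -D property=value or -conf <file>) be parsed off the command line into the Configuration returned by getConf().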
Answered by Binary Nerd
I believe this tutorial illustrates removing the deprecated JobConf class using Hadoop 0.20.1.
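In essence, the migration the tutorial describes replaces JobConf/JobClient with Configuration/Job from org.apache.hadoop.mapreduce. Here is a minimal sketch of a new-API driver (the class name NewApiDriver is a placeholder, not code from the tutorial; with no mapper or reducer set, Hadoop runs identity map/reduce):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NewApiDriver {
    public static void main(String[] args) throws Exception {
        // Old style (deprecated, org.apache.hadoop.mapred):
        //   JobConf conf = new JobConf(NewApiDriver.class);
        //   JobClient.runJob(conf);

        // New style (org.apache.hadoop.mapreduce): Configuration + Job only.
        Configuration conf = new Configuration();
        Job job = new Job(conf, "new-api example");
        job.setJarByClass(NewApiDriver.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}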
Answered by dk.
This is a nice example with downloadable code: http://sonerbalkir.blogspot.com/2010/01/new-hadoop-api-020x.html It's also over two years old, and there is no official documentation discussing the new API. Sad.
Answered by Yatin
In the previous API there were three ways of submitting a job. One of them is to submit the job, get a reference to the RunningJob, and obtain the RunningJob's id.
submitJob(JobConf): only submits the job; you then poll the returned RunningJob handle to query status and make scheduling decisions.
How can one use the new API to get a reference to the RunningJob and obtain its id, since none of the new APIs return a reference to a RunningJob?
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html
Thanks.
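For what it's worth, the new API folds RunningJob's role into Job itself: submit() returns immediately, and the same Job handle exposes the job id and progress. A minimal submit-and-poll sketch (the class name and argument handling are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobID;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubmitAndPoll {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "submit-and-poll");
        job.setJarByClass(SubmitAndPoll.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.submit();                 // returns immediately, like submitJob(JobConf)
        JobID id = job.getJobID();    // the id the old RunningJob used to expose
        System.out.println("Submitted job " + id);

        // Poll the handle, as one would have polled RunningJob.
        while (!job.isComplete()) {
            System.out.printf("map %.0f%% reduce %.0f%%%n",
                    job.mapProgress() * 100, job.reduceProgress() * 100);
            Thread.sleep(5000);
        }
        System.out.println("Job " + id + (job.isSuccessful() ? " succeeded" : " failed"));
    }
}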
Answered by coderz
Try to use Configuration and Job. Here is an example:
(The TokenizerMapper and IntSumReducer below are the standard word-count implementations; replace them, the combiner, and the other configuration with your own.)
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    // Splits each input line into tokens and emits (word, 1).
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Sums the counts for each word; also used as the combiner.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "Word Count");
        // set jar
        job.setJarByClass(WordCount.class);
        // set Mapper, Combiner, Reducer
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        /* Optional, set a custom Partitioner:
         * job.setPartitionerClass(MyPartitioner.class);
         */
        // set output key and value classes
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // set input and output paths
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // by default, Hadoop uses TextInputFormat and TextOutputFormat;
        // any custom input/output class must implement InputFormat/OutputFormat
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
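Assuming the class above is packaged into a jar, the job can be launched with the standard launcher, e.g. hadoop jar wordcount.jar WordCount <input dir> <output dir>. The output directory must not already exist; FileOutputFormat refuses to overwrite it.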