Java: What do job.setOutputKeyClass and job.setOutputValueClass refer to?

Note: this page reproduces a popular Stack Overflow question and its answer, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/14225205/

What do job.setOutputKeyClass and job.setOutputValueClass refer to?

Tags: java, hadoop, mapreduce

Asked by nik686

I thought that they referred to the Reducer, but in my program I have

public static class MyMapper extends Mapper< LongWritable, Text, Text, Text >

and

public static class MyReducer extends Reducer< Text, Text, NullWritable, Text >

so if I have

job.setOutputKeyClass( NullWritable.class );

job.setOutputValueClass( Text.class );

I get the following Exception

Type mismatch in key from map: expected org.apache.hadoop.io.NullWritable, recieved org.apache.hadoop.io.Text

but if I have

job.setOutputKeyClass( Text.class );

there is no problem.

Is there something wrong with my code, or does this happen because of NullWritable or something else?

Also, do I have to use job.setInputFormatClass and job.setOutputFormatClass? My program runs correctly without them.

Answered by Charles Menguy

Calling job.setOutputKeyClass( NullWritable.class ); sets the key type expected as output from both the map and reduce phases.

If your Mapper emits different types than the Reducer, you can set the types emitted by the mapper with the JobConf's setMapOutputKeyClass() and setMapOutputValueClass() methods. These implicitly set the input types expected by the Reducer.

(source: Yahoo Developer Tutorial)
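
A minimal plain-Java sketch of the behavior described above, assuming a much simplified model of the job configuration: the map output key class falls back to the job-wide output key class unless it is set separately, and a map key of any other class is rejected. `MapOutputTypeDemo`, `JobModel`, `NullKey`, and `collectFromMap` are hypothetical names used for illustration, not Hadoop classes; `String` stands in for `Text` and `NullKey` for `NullWritable`.

```java
// Simplified model only: none of these classes are part of Hadoop.
class MapOutputTypeDemo {

    // Hypothetical stand-in for NullWritable.
    static final class NullKey { }

    // Hypothetical stand-in for the job configuration.
    static class JobModel {
        private Class<?> outputKeyClass = Object.class;
        private Class<?> mapOutputKeyClass; // stays null until set explicitly

        void setOutputKeyClass(Class<?> c) { outputKeyClass = c; }

        void setMapOutputKeyClass(Class<?> c) { mapOutputKeyClass = c; }

        // Falls back to the job-wide output key class when no
        // map-specific class has been configured.
        Class<?> getMapOutputKeyClass() {
            return mapOutputKeyClass != null ? mapOutputKeyClass : outputKeyClass;
        }

        // Rejects a map output key whose class differs from the expected one.
        void collectFromMap(Object key) {
            Class<?> expected = getMapOutputKeyClass();
            if (key.getClass() != expected) {
                throw new IllegalStateException(
                    "Type mismatch in key from map: expected " + expected.getName()
                    + ", received " + key.getClass().getName());
            }
        }
    }

    public static void main(String[] args) {
        JobModel job = new JobModel();
        job.setOutputKeyClass(NullKey.class);   // the reducer's output key type

        try {
            job.collectFromMap("some key");     // the mapper emits String keys
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // the mismatch is reported here
        }

        job.setMapOutputKeyClass(String.class); // declare the mapper's key type
        job.collectFromMap("some key");         // accepted after the extra call
        System.out.println("map key accepted");
    }
}
```

In the asker's driver, the corresponding change would be to keep job.setOutputKeyClass( NullWritable.class ) and additionally call job.setMapOutputKeyClass( Text.class ) and job.setMapOutputValueClass( Text.class ).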

Regarding your second question, the default InputFormat is TextInputFormat. It treats each line of each input file as a separate record and performs no parsing. You only need to call these methods if you want to process your input in a different format. Here are some examples:

InputFormat             | Description                                      | Key                                      | Value
--------------------------------------------------------------------------------------------------------------------------------------------------------
TextInputFormat         | Default format; reads lines of text files        | The byte offset of the line              | The line contents
KeyValueTextInputFormat | Parses lines into key, val pairs                 | Everything up to the first tab character | The remainder of the line
SequenceFileInputFormat | A Hadoop-specific high-performance binary format | user-defined                             | user-defined

The default OutputFormat is TextOutputFormat, which writes (key, value) pairs on individual lines of a text file. Some examples:

OutputFormat             | Description
---------------------------------------------------------------------------------------------------------
TextOutputFormat         | Default; writes lines in "key \t value" form
SequenceFileOutputFormat | Writes binary files suitable for reading into subsequent MapReduce jobs
NullOutputFormat         | Disregards its inputs

(source: Other Yahoo Developer Tutorial)
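
To make the Key and Value columns in the tables above concrete, here is a plain-Java sketch of how a single line could become a record under the two text-based input formats, and how the default output format renders a pair. It does not use Hadoop; `RecordFormatDemo` and its helper methods are hypothetical names, and a `null` key is used here as a stand-in for a NullWritable key, for which the text output writes only the value.

```java
// Simplified model only: these helpers are not part of Hadoop.
class RecordFormatDemo {

    // TextInputFormat style: the key is the line's byte offset,
    // the value is the unparsed line.
    static String[] asTextRecord(long byteOffset, String line) {
        return new String[] { Long.toString(byteOffset), line };
    }

    // Key/value text style: split at the first tab character.
    static String[] asKeyValueRecord(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            // No tab: the whole line is the key and the value is empty.
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }

    // TextOutputFormat style: "key<TAB>value" on one line.
    // A null key (modelling NullWritable) yields the value alone.
    static String asOutputLine(String key, String value) {
        return key == null ? value : key + "\t" + value;
    }

    public static void main(String[] args) {
        String line = "user42\tlogged in\tfrom web";

        String[] text = asTextRecord(0L, line);
        System.out.println("text key=" + text[0] + " value=" + text[1]);

        String[] kv = asKeyValueRecord(line);
        System.out.println("kv key=" + kv[0] + " value=" + kv[1]);

        System.out.println(asOutputLine(kv[0], kv[1]));
        System.out.println(asOutputLine(null, kv[1]));
    }
}
```

In a real driver, a non-default format is selected with job.setInputFormatClass(...) and job.setOutputFormatClass(...); leaving both calls out keeps the text defaults, which is why the asker's program runs without them.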
