Original question: http://stackoverflow.com/questions/16614029/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
Hadoop - output key/value separator
Asked by JustTheAverageGirl
I want to change the output separator to ; instead of tab. I already tried the approach from "Hadoop: key and value are tab separated in the output file. how to do it semicolon-separated?", but my output is still
key (tab) value
I'm using the Cloudera Demo (CDH 4.1.3). Here is my Code:
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
    System.err.println("Usage: Driver <in> <out>");
    System.exit(2);
}
conf.set("mapreduce.textoutputformat.separator", ";");
Path in = new Path(otherArgs[0]);
Path out = new Path(otherArgs[1]);
Job job= new Job(getConf());
job.setJobName("MapReduce");
job.setMapperClass(Mapper.class);
job.setReducerClass(Reducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, in);
FileOutputFormat.setOutputPath(job, out);
job.setJarByClass(Driver.class);
job.waitForCompletion(true) ? 0 : 1;
I want
key;value
as my output.
Answered by Thomas Jungblut
The property is called mapreduce.output.textoutputformat.separator, so you are basically missing the "output" part there.
You can see that in the newest trunk source code found in the Apache SVN.
Answered by Nikita Bosik
As of 2019, it's getConf().set(TextOutputFormat.SEPARATOR, ";"); (thanks @AsheshKumarSingh)
I believe using the built-in constant is more maintainable and less surprising than a hand-typed key string.
Important: this property must be set before Job.getInstance(getConf()) / new Job(getConf()), because the job copies the configuration and ignores any later modifications to conf.
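To make the ordering point concrete, here is a minimal plain-Java analogy with no Hadoop dependency. It is only a sketch: java.util.Properties stands in for Hadoop's Configuration, and FakeJob is a hypothetical class invented for illustration that copies its settings at construction time, which is the behaviour described above. Only the key string comes from the answers in this thread.

```java
import java.util.Properties;

class SeparatorOrderDemo {

    // Key name taken from the answer above.
    static final String SEPARATOR_KEY = "mapreduce.output.textoutputformat.separator";

    /** Hypothetical stand-in for a job: snapshots the settings when constructed. */
    static final class FakeJob {
        private final Properties snapshot = new Properties();

        FakeJob(Properties conf) {
            // Copy the entries, so later changes to conf are not visible here.
            snapshot.putAll(conf);
        }

        /** Joins key and value with the configured separator, tab if none is set. */
        String formatRecord(String key, String value) {
            return key + snapshot.getProperty(SEPARATOR_KEY, "\t") + value;
        }
    }

    /** Sets the separator AFTER the job exists, so the job never sees it. */
    static String setAfterConstruction() {
        Properties conf = new Properties();
        FakeJob job = new FakeJob(conf);
        conf.setProperty(SEPARATOR_KEY, ";");
        return job.formatRecord("key", "value");
    }

    /** Sets the separator BEFORE the job is created, so the copy includes it. */
    static String setBeforeConstruction() {
        Properties conf = new Properties();
        conf.setProperty(SEPARATOR_KEY, ";");
        FakeJob job = new FakeJob(conf);
        return job.formatRecord("key", "value");
    }

    public static void main(String[] args) {
        // Make the tab visible when printing the first result.
        System.out.println(setAfterConstruction().replace("\t", "(tab)"));
        System.out.println(setBeforeConstruction());
    }
}
```

In this sketch the first call still produces a tab-separated record and the second produces key;value, which mirrors the symptom in the question: the setting was applied to a configuration object the job had not been built from.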
Answered by Unmesha SreeVeni
You should use conf.set("mapreduce.textoutputformat.separator", ";");
Use of conf.set("mapreduce.textoutputformat.separator", ";"); is deprecated
mapred and mapreduce
Full code (this is working):
// Parse the generic Hadoop options into conf; what remains are the job's own arguments.
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
    System.err.println("Usage: Driver <in> <out>");
    System.exit(2);
}
// Set the separator before the Job is created, because the Job copies the configuration.
conf.set("mapred.textoutputformat.separator", ";");
Path in = new Path(otherArgs[0]);
Path out = new Path(otherArgs[1]);
// Build the job from the same conf that carries the separator setting.
Job job = new Job(conf);
job.setJobName("MapReduce");
job.setMapperClass(Mapper.class);
job.setReducerClass(Reducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, in);
FileOutputFormat.setOutputPath(job, out);
job.setJarByClass(Driver.class);
// Exit with 0 on success and 1 on failure.
System.exit(job.waitForCompletion(true) ? 0 : 1);

