Note: this page reproduces a popular Stack Overflow question and its answers, which are provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original address, and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/16614029/
Hadoop - output key/value separator
Asked by JustTheAverageGirl
I want to change the output separator to ; instead of tab. I already tried the answers to "Hadoop: key and value are tab separated in the output file. how to do it semicolon-separated?", but my output is still
key (tab) value
I'm using the Cloudera Demo (CDH 4.1.3). Here is my Code:
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
    System.err.println("Usage: Driver <in> <out>");
    System.exit(2);
}
conf.set("mapreduce.textoutputformat.separator", ";");
Path in = new Path(otherArgs[0]);
Path out = new Path(otherArgs[1]);
Job job= new Job(getConf());
job.setJobName("MapReduce");
job.setMapperClass(Mapper.class);
job.setReducerClass(Reducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, in);
FileOutputFormat.setOutputPath(job, out);
job.setJarByClass(Driver.class);
job.waitForCompletion(true) ? 0 : 1;
I want
key;value
as my output.
Answered by Thomas Jungblut
The property is called mapreduce.output.textoutputformat.separator, so you are basically missing the "output" part there.
You can see that in the newest trunk source code found in the Apache SVN.
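As a minimal sketch of a driver that uses the full property name, assuming the new org.apache.hadoop.mapreduce API on Hadoop 2 or later with the Hadoop client libraries on the classpath: the class name SeparatorDriver and the job name are hypothetical, and no mapper or reducer is set, so the default identity classes pass the (byte offset, line) pairs from TextInputFormat straight through.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical driver class, for illustration only.
public class SeparatorDriver {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: SeparatorDriver <in> <out>");
            System.exit(2);
        }

        Configuration conf = new Configuration();
        // Full key name, including the "output" segment.
        conf.set("mapreduce.output.textoutputformat.separator", ";");

        // Create the job from the conf that already holds the separator.
        Job job = Job.getInstance(conf, "separator-demo");
        job.setJarByClass(SeparatorDriver.class);

        // Default (identity) mapper and reducer: keys are byte offsets, values are lines.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Older MR1-based distributions may read the separator under a different key, so if the output stays tab-separated it is worth checking which key the TextOutputFormat class in your Hadoop version actually reads.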
Answered by Nikita Bosik
In 2019, it's getConf().set(TextOutputFormat.SEPARATOR, ";");
(thanks @AsheshKumarSingh)
I believe using the built-in constant is more maintainable and less surprising than a hand-typed string.
Important: this property must be set before Job.getInstance(getConf()) / new Job(getConf()), as the job copies the configuration and ignores any later modifications to conf.
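The ordering issue can be illustrated without Hadoop by a toy model. This is only a sketch: ToyJob and ConfSnapshotDemo are hypothetical stand-ins, not Hadoop classes, and a plain HashMap plays the role of the configuration. The only assumption carried over from Hadoop is the behavior described above (the job takes its own copy of the configuration when it is created) and the tab default of TextOutputFormat.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfSnapshotDemo {
    // Real Hadoop property key; everything else here is a hypothetical stand-in.
    static final String SEPARATOR_KEY = "mapreduce.output.textoutputformat.separator";

    /** Toy job that copies its configuration at construction time. */
    static final class ToyJob {
        private final Map<String, String> settings;

        ToyJob(Map<String, String> conf) {
            // Snapshot: later changes to the caller's map are not visible here.
            this.settings = new HashMap<>(conf);
        }

        String separator() {
            // Fall back to a tab when the key was not set in the snapshot.
            return settings.getOrDefault(SEPARATOR_KEY, "\t");
        }
    }

    /** Returns the separator the toy job sees, depending on when the key is set. */
    static String separatorSeenByJob(boolean setBeforeCreatingJob) {
        Map<String, String> conf = new HashMap<>();
        if (setBeforeCreatingJob) {
            conf.put(SEPARATOR_KEY, ";");
        }
        ToyJob job = new ToyJob(conf);
        if (!setBeforeCreatingJob) {
            conf.put(SEPARATOR_KEY, ";"); // too late: the job already took its copy
        }
        return job.separator();
    }

    public static void main(String[] args) {
        System.out.println("set before: [" + separatorSeenByJob(true) + "]");
        System.out.println("set after:  [" + separatorSeenByJob(false) + "]");
    }
}
```

In the toy model, only the "set before" case ends up with the semicolon; the "set after" case silently keeps the tab, which matches the symptom in the question.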
Answered by Unmesha SreeVeni
You should use conf.set("mapred.textoutputformat.separator", ";"); instead of conf.set("mapreduce.textoutputformat.separator", ";");
Note that mapred.textoutputformat.separator is the older name of this property; newer Hadoop releases deprecate it in favor of mapreduce.output.textoutputformat.separator.
Related: mapred and mapreduce (the old and new MapReduce API packages)
Full code (this is working):
// Fragment from the driver's run method; the enclosing class and imports are not shown.
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
    System.err.println("Usage: Driver <in> <out>");
    System.exit(2);
}
// Set the separator before the job is created, because the job copies the configuration.
conf.set("mapred.textoutputformat.separator", ";");
Path in = new Path(otherArgs[0]);
Path out = new Path(otherArgs[1]);
// Build the job from the same conf that holds the separator.
Job job = new Job(conf);
job.setJobName("MapReduce");
job.setMapperClass(Mapper.class);
job.setReducerClass(Reducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, in);
FileOutputFormat.setOutputPath(job, out);
job.setJarByClass(Driver.class);
// 0 on success, 1 on failure.
return job.waitForCompletion(true) ? 0 : 1;