Java Hadoop - Output Key/Value Separator

Question

Asked by JustTheAverageGirl

I want to change the output separator to ; instead of a tab. I already tried the suggestion from "Hadoop: key and value are tab separated in the output file. how to do it semicolon-separated?", but my output is still

key (tab) value

I'm using the Cloudera Demo (CDH 4.1.3). Here is my code:

        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: Driver <in> <out>");
            System.exit(2);
        }
        conf.set("mapreduce.textoutputformat.separator", ";");

        Path in = new Path(otherArgs[0]);
        Path out = new Path(otherArgs[1]);

        Job job= new Job(getConf());
        job.setJobName("MapReduce");

        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);

        job.setJarByClass(Driver.class);
        job.waitForCompletion(true) ? 0 : 1;

I want

key;value

as my output.

Answer 1

Answered by Thomas Jungblut

The property is called mapreduce.output.textoutputformat.separator, so you are simply missing the "output" part of the name.

You can see this in the latest trunk source code in the Apache SVN repository.
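
To see why the misspelled key silently leaves the tab in place, here is a minimal plain-Java sketch of the lookup. It assumes the output format reads the separator by key with a tab as the fallback. java.util.Properties stands in for Hadoop's Configuration, and SeparatorLookup and formatLine are hypothetical names for illustration, not Hadoop classes.

```java
import java.util.Properties;

// Plain-Java model of the separator lookup.
// Assumption: Properties stands in for Hadoop's Configuration.
class SeparatorLookup {
    // The property name given in the answer above.
    static final String SEPARATOR_KEY = "mapreduce.output.textoutputformat.separator";

    // Builds one output line: key, separator, value.
    static String formatLine(Properties conf, String key, String value) {
        // If the key is not set, the lookup falls back to a tab.
        String separator = conf.getProperty(SEPARATOR_KEY, "\t");
        return key + separator + value;
    }

    public static void main(String[] args) {
        // Key as written in the question: never read, so the default applies.
        Properties wrong = new Properties();
        wrong.setProperty("mapreduce.textoutputformat.separator", ";");
        System.out.println(formatLine(wrong, "key", "value")); // prints key, a tab, then value

        // Key with the "output" part included: the semicolon is picked up.
        Properties right = new Properties();
        right.setProperty(SEPARATOR_KEY, ";");
        System.out.println(formatLine(right, "key", "value")); // prints key;value
    }
}
```

No error is raised for the unknown key, which is why the job runs fine and simply keeps writing tabs.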

Answer 2

Answered by Nikita Bosik

In 2019, it's getConf().set(TextOutputFormat.SEPARATOR, ";"); (thanks @AsheshKumarSingh)

I believe using the built-in constant is easier to maintain and less surprising than typing the property name by hand.

Important: this property must be set before Job.getInstance(getConf()) / new Job(getConf()), because the job copies the configuration and ignores any later modifications to conf.
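
The ordering issue can be sketched in plain Java. This is only a model under stated assumptions: JobModel is a hypothetical stand-in for a job that copies its configuration when it is constructed, and java.util.Properties stands in for Hadoop's Configuration. Neither is a Hadoop class.

```java
import java.util.Properties;

class ConfSnapshotDemo {
    static final String KEY = "mapreduce.output.textoutputformat.separator";

    // Hypothetical stand-in for a job: it takes a private copy of conf on construction.
    static final class JobModel {
        private final Properties snapshot = new Properties();

        JobModel(Properties conf) {
            snapshot.putAll(conf); // copy, so later changes to conf are not seen
        }

        String get(String key, String defaultValue) {
            return snapshot.getProperty(key, defaultValue);
        }
    }

    // Property set first, job created second: the job sees the semicolon.
    static String separatorSetBefore() {
        Properties conf = new Properties();
        conf.setProperty(KEY, ";");
        JobModel job = new JobModel(conf);
        return job.get(KEY, "\t");
    }

    // Job created first, property set second: the job still has the tab default.
    static String separatorSetAfter() {
        Properties conf = new Properties();
        JobModel job = new JobModel(conf);
        conf.setProperty(KEY, ";");
        return job.get(KEY, "\t");
    }

    public static void main(String[] args) {
        System.out.println(";".equals(separatorSetBefore()));  // prints true
        System.out.println("\t".equals(separatorSetAfter()));  // prints true
    }
}
```

In a real driver, one way to check the same thing is to read the value back from job.getConfiguration() after creating the job and before submitting it.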

Answer 3

Answered by Unmesha SreeVeni

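You should use conf.set("mapred.textoutputformat.separator", ";"); (note the mapred prefix), as in the full code below.

conf.set("mapreduce.textoutputformat.separator", ";"); as used in the question is not a property name that TextOutputFormat reads, so it has no effect. Note that mapred.textoutputformat.separator is itself the older name: newer Hadoop releases list it as deprecated in favor of mapreduce.output.textoutputformat.separator.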

See also: mapred and mapreduce

Full code (this is working):

    Configuration conf = new Configuration();
    // Strip generic Hadoop options (such as -D key=value) and keep <in> <out>.
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
        System.err.println("Usage: Driver <in> <out>");
        System.exit(2);
    }
    // Set the separator before the Job is created, because the Job copies conf.
    conf.set("mapred.textoutputformat.separator", ";");

    Path in = new Path(otherArgs[0]);
    Path out = new Path(otherArgs[1]);

    // Pass the conf that was modified above, not a different one from getConf().
    Job job = new Job(conf);
    job.setJobName("MapReduce");

    // Mapper and Reducer are kept as in the question; substitute the job's own classes.
    job.setMapperClass(Mapper.class);
    job.setReducerClass(Reducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    FileInputFormat.setInputPaths(job, in);
    FileOutputFormat.setOutputPath(job, out);

    job.setJarByClass(Driver.class);
    // Exit with 0 on success and 1 on failure.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
