Java Hadoop word count: getting the total number of words that start with the letter "c"

Question

Asked by King11

Here's the Hadoop word count Java map and reduce source code:

In the map function, I've gotten to the point where I can output every word that starts with the letter "c" along with the number of times that word appears. What I actually want is a single total: the number of words starting with the letter "c". I'm a little stuck on getting that total. Any help would be greatly appreciated, thank you.

Example

The output I'm getting:

could 2
can 3
cat 5

What I'm trying to get:

c-total 10

public static class MapClass extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output,
                  Reporter reporter) throws IOException {
    String line = value.toString();
    StringTokenizer itr = new StringTokenizer(line);
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      // emit (word, 1) only for words starting with "c"
      if (word.toString().startsWith("c")) {
        output.collect(word, one);
      }
    }
  }
}


public static class Reduce extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

  public void reduce(Text key, Iterator<IntWritable> values,
                     OutputCollector<Text, IntWritable> output,
                     Reporter reporter) throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get(); // add up the counts emitted for this key
    }
    output.collect(key, new IntWritable(sum)); // output the key and its total
  }
}

Answer 1

Answered by Chris Gerken

Instead of

output.collect(word, one);

in your mapper, try:

output.collect(new Text("c-total"), one);

Answer 2

Answered by Unmesha SreeVeni

Chris Gerken's answer is right.

If you output the word itself as the key, the reducer can only give you a separate count for each distinct word starting with "c", not the overall total of words starting with "c".

For the total, the mapper needs to emit one fixed key for every matching word:

while (itr.hasMoreTokens()) {
    String token = itr.nextToken();
    if (token.startsWith("c")) {
        word.set("C_Count");
        output.collect(word, one);
    }
}
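The effect of the key choice can be tried out without a cluster. The following is only a sketch in plain Java, not Hadoop code: the class `KeyChoiceSketch` and its helper `countByKey` are made-up names, and a `TreeMap` stands in for the grouping that Hadoop does between map and reduce. The sample words are chosen to reproduce the counts from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical illustration only: a TreeMap plays the role of shuffle + reduce.
final class KeyChoiceSketch {

    // "Emits" (keyFor(word), 1) for every word starting with "c" and sums the ones per key.
    static Map<String, Integer> countByKey(List<String> words,
                                           Function<String, String> keyFor) {
        Map<String, Integer> sums = new TreeMap<>();
        for (String w : words) {
            if (w.startsWith("c")) {
                sums.merge(keyFor.apply(w), 1, Integer::sum);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "could", "dog", "can", "cat", "can", "cat", "could",
                "cat", "can", "cat", "apple", "cat");

        // Word as key: one count per distinct word.
        System.out.println(countByKey(words, w -> w));         // {can=3, cat=5, could=2}

        // Fixed key: every match lands in the same group, giving the total.
        System.out.println(countByKey(words, w -> "c-total")); // {c-total=10}
    }
}
```

With a fixed key, all matching records go to a single reduce call, which is exactly what produces one total.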

Here is an example using the new API (org.apache.hadoop.mapreduce):

Driver class

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

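        // Job.getInstance replaces the Job constructor, which is deprecated in Hadoop 2.x and later
        Job job = Job.getInstance(conf, "wordcount");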
        FileSystem fs = FileSystem.get(conf);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        if (fs.exists(new Path(args[1])))
            fs.delete(new Path(args[1]), true);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setJarByClass(WordCount.class);     
        job.waitForCompletion(true);
    }

}

Mapper class

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            String token = itr.nextToken();
            if(token.startsWith("c")){
                word.set("C_Count");
                context.write(word, one);
            }

        }
    }
}

Reducer class

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}

Answer 3

Answered by Dravidian

Simpler code for mapper:

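public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> op, Reporter r)
        throws IOException {
    String s = value.toString();
    // split on runs of non-word characters; the backslash must be escaped in a Java string literal
    for (String w : s.split("\\W+")) {
        // skip the empty string that split() returns when the line starts with a delimiter;
        // lowercase "c" is used here to match the question
        if (w.length() > 0 && w.startsWith("c")) {
            op.collect(new Text("C-Count"), new IntWritable(1));
        }
    }
}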
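Note that splitting on "\\W+" does not tokenize the same way as the StringTokenizer used in the question, so the two mappers can report different totals for the same input. The sketch below compares the two approaches in plain Java, outside Hadoop. The class `TokenizingSketch`, its two methods, and the sample line are all made up for illustration.

```java
import java.util.StringTokenizer;

// Hypothetical comparison of the two tokenizing approaches; not Hadoop code.
final class TokenizingSketch {

    // Whitespace-only tokenizing, as in the question: punctuation stays attached to the word.
    static int countWithTokenizer(String line) {
        int n = 0;
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            if (itr.nextToken().startsWith("c")) {
                n++;
            }
        }
        return n;
    }

    // Splitting on runs of non-word characters: punctuation is stripped from the words.
    static int countWithSplit(String line) {
        int n = 0;
        for (String w : line.split("\\W+")) {
            // split() yields a leading "" when the line begins with a delimiter
            if (!w.isEmpty() && w.startsWith("c")) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String line = "(cat) can, \"could\" dog";
        // Tokens are: (cat)  can,  "could"  dog   -> only "can," starts with c
        System.out.println(countWithTokenizer(line)); // 1
        // Pieces are: ""  cat  can  could  dog     -> cat, can, could start with c
        System.out.println(countWithSplit(line));     // 3
    }
}
```

Which behaviour is right depends on whether words wrapped in punctuation should count. For ordinary prose the split-based version is probably closer to what the question intends.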
