Hadoop MapReduce job with HDFS input and HBase output

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse it, you must attribute the original authors (not the translator). Original question: http://stackoverflow.com/questions/4545579/

Tags: java, hadoop, mapreduce, hbase, hdfs

Asked by jmventar

I'm new to Hadoop. I have a MapReduce job which is supposed to get its input from HDFS and write the reducer's output to HBase. I haven't found any good examples.

Here's the code; the error running this example is: Type mismatch in map, expected ImmutableBytesWritable received IntWritable.

Mapper Class

public static class AddValueMapper extends Mapper<LongWritable,
    Text, ImmutableBytesWritable, IntWritable> {

  /* input  <key: line number, value: full line>
   * output <key: log key,     value: log value> */
  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString();
    int pos = line.indexOf("=");

    // Key part (everything before '=')
    String p1 = line.substring(0, pos).trim();
    byte[] outKey = Bytes.toBytes(p1);

    // Value part (everything after '=')
    String p2 = line.substring(pos + 1).trim();
    int outValue = Integer.parseInt(p2);

    context.write(new ImmutableBytesWritable(outKey), new IntWritable(outValue));
  }
}
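Side note: this error typically appears when the map output key/value classes declared in the driver don't match what map() actually emits. A minimal sketch of the driver lines that would match this mapper, assuming the rest of the job setup is already in place:

// Hypothetical driver lines; "job" must be the real Job object
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(IntWritable.class);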

Reducer Class

public static class AddValuesReducer extends TableReducer<
    ImmutableBytesWritable, IntWritable, ImmutableBytesWritable> {

  @Override
  public void reduce(ImmutableBytesWritable key, Iterable<IntWritable> values,
      Context context) throws IOException, InterruptedException {

    // Sum all values for this key
    long total = 0;
    for (IntWritable val : values) {
      total += val.get();
    }

    // Put to HBase: column family "data", qualifier "total"
    Put put = new Put(key.get());
    put.add(Bytes.toBytes("data"), Bytes.toBytes("total"), Bytes.toBytes(total));
    context.write(key, put);
  }
}
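For completeness, a sketch of the imports these two classes rely on (assuming the 0.9x-era Hadoop/HBase APIs used throughout this question):

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;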

I had a similar job with HDFS only, and it works fine.

Edited 18-06-2013. The college project finished successfully two years ago. For the job configuration (driver part), see the accepted answer.

Accepted answer by saurabh shashank

Here is the code that will solve your problem:

Driver

Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "JOB_NAME");
job.setJarByClass(yourclass.class);
job.setMapperClass(yourMapper.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
FileInputFormat.setInputPaths(job, new Path(inputPath));
TableMapReduceUtil.initTableReducerJob(TABLE, yourReducer.class, job);
job.setReducerClass(yourReducer.class);
job.waitForCompletion(true);
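Note that TableMapReduceUtil.initTableReducerJob already sets TableOutputFormat as the output format, points it at TABLE, and registers the reducer class, so the setReducerClass call above is redundant (though harmless).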

Mapper & Reducer

class yourMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  // @Override map()
}

class yourReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
  // @Override reduce()
}
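A minimal sketch of what those overrides could look like with the types declared above (the "data"/"total" column family and qualifier come from the question, and the key=value parsing mirrors the original mapper; everything else is a placeholder):

class yourMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Parse "key=value" lines and emit <Text, IntWritable>
    String[] parts = value.toString().split("=", 2);
    context.write(new Text(parts[0].trim()),
        new IntWritable(Integer.parseInt(parts[1].trim())));
  }
}

class yourReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    // Sum the values and write one HBase row per key
    long total = 0;
    for (IntWritable val : values) {
      total += val.get();
    }
    byte[] row = Bytes.toBytes(key.toString());
    Put put = new Put(row);
    put.add(Bytes.toBytes("data"), Bytes.toBytes("total"), Bytes.toBytes(total));
    context.write(new ImmutableBytesWritable(row), put);
  }
}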


Answer by Prasad D

The best and fastest way to bulk load data into HBase is to use HFileOutputFormat and the completebulkload utility.

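In outline, that flow looks something like this (a rough sketch against the 0.9x-era HBase API; the driver, mapper, table, and path names are all placeholders):

Configuration conf = HBaseConfiguration.create();
Job job = new Job(conf, "bulk-load-prepare");
job.setJarByClass(YourDriver.class);
job.setMapperClass(YourPutMapper.class); // must emit <ImmutableBytesWritable, Put>
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(Put.class);
FileInputFormat.setInputPaths(job, new Path(inputPath));
FileOutputFormat.setOutputPath(job, new Path(hfilePath));

// Sorts and partitions the output so the HFiles line up with the table's regions
HTable table = new HTable(conf, "TABLE");
HFileOutputFormat.configureIncrementalLoad(job, table);
job.waitForCompletion(true);

The HFiles written to hfilePath are then moved into the table with the completebulkload tool (the LoadIncrementalHFiles class).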
You will find sample code here:

Hope this will be useful :)

Answer by David

Not sure why the HDFS version works: normally you have to set the input format for the job, and FileInputFormat is an abstract class. Perhaps you left some lines out, such as:

job.setInputFormatClass(TextInputFormat.class);

Answer by badri

public void map(LongWritable key, Text value, Context context)
    throws IOException, InterruptedException {

Change this to ImmutableBytesWritable, IntWritable.

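Presumably that means a map signature along these lines (a guess at what this answer intends):

public void map(ImmutableBytesWritable key, IntWritable value, Context context)
    throws IOException, InterruptedException {
  // ...
}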
I am not sure... hope it works.
