Java: Advantages of using NullWritable in Hadoop

Question

Asked by Venk K

What are the advantages of using NullWritable for null keys/values over using null texts (i.e. new Text(null))? I see the following in the book "Hadoop: The Definitive Guide":

NullWritable is a special type of Writable, as it has a zero-length serialization. No bytes are written to, or read from, the stream. It is used as a placeholder; for example, in MapReduce, a key or a value can be declared as a NullWritable when you don't need to use that position; it effectively stores a constant empty value. NullWritable can also be useful as a key in SequenceFile when you want to store a list of values, as opposed to key-value pairs. It is an immutable singleton: the instance can be retrieved by calling NullWritable.get().

I do not clearly understand how the output is written out using NullWritable. Will there be a single constant value at the beginning of the output file indicating that the keys or values of this file are null, so that the MapReduce framework can skip reading the null keys/values (whichever is null)? Also, how are null texts actually serialized?

Thanks,

Venkat

Answer 1

Answered by Joe K

The key/value types must be given at runtime, so anything writing or reading NullWritables knows ahead of time that it will be dealing with that type; there is no marker or anything else in the file. Technically, the NullWritables are "read"; it is just that "reading" a NullWritable is a no-op. You can see for yourself that nothing at all is written or read:

NullWritable nw = NullWritable.get();
ByteArrayOutputStream out = new ByteArrayOutputStream();
nw.write(new DataOutputStream(out));
System.out.println(Arrays.toString(out.toByteArray())); // prints "[]"

ByteArrayInputStream in = new ByteArrayInputStream(new byte[0]);
nw.readFields(new DataInputStream(in)); // works just fine
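
To make the "no marker in the file" point concrete, here is a minimal sketch that uses only java.io, so it runs without Hadoop on the classpath. EmptyKey and EmptyKeyDemo are hypothetical stand-ins invented for this illustration (EmptyKey imitates NullWritable's no-op write and readFields); they are not Hadoop classes. The reader knows the key type in advance, so it simply skips straight to each value:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for NullWritable: a singleton whose
// serialization methods deliberately do nothing.
final class EmptyKey {
    static final EmptyKey INSTANCE = new EmptyKey();

    private EmptyKey() {}

    void write(DataOutput out) throws IOException {
        // intentionally empty: zero bytes are written
    }

    void readFields(DataInput in) throws IOException {
        // intentionally empty: zero bytes are consumed
    }
}

class EmptyKeyDemo {
    // Writes (EmptyKey, value) pairs back to back, as a record writer might.
    static byte[] writeRecords(List<String> values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (String value : values) {
                EmptyKey.INSTANCE.write(out); // contributes nothing to the stream
                out.writeUTF(value);          // 2-byte length prefix + encoded chars
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the pairs back; the key type is known up front, so no marker is needed.
    static List<String> readRecords(byte[] data, int count) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            List<String> values = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                EmptyKey.INSTANCE.readFields(in); // consumes nothing
                values.add(in.readUTF());
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "bc");
        byte[] data = writeRecords(values);
        System.out.println(data.length);                       // prints "7"
        System.out.println(readRecords(data, values.size()));  // prints "[a, bc]"
    }
}
```

The 7 bytes are exactly the two values (3 bytes for "a", 4 bytes for "bc"); the keys add nothing, which is the space saving the question asks about.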

As for your question about new Text(null), again, you can try it out:

Text text = new Text((String)null);
ByteArrayOutputStream out = new ByteArrayOutputStream();
text.write(new DataOutputStream(out)); // throws NullPointerException
System.out.println(Arrays.toString(out.toByteArray()));

Text will not work at all with a null String.

Answer 2

Answered by zwj0571

I changed the run method, and it succeeded:

@Override
public int run(String[] strings) throws Exception {
    Configuration config = HBaseConfiguration.create();  
    //set job name
    Job job = new Job(config, "Import from file ");
    job.setJarByClass(LogRun.class);
    //set map class
    job.setMapperClass(LogMapper.class);

    //set output format and output table name
    //job.setOutputFormatClass(TableOutputFormat.class);
    //job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "crm_data");
    //job.setOutputKeyClass(ImmutableBytesWritable.class);
    //job.setOutputValueClass(Put.class);

    TableMapReduceUtil.initTableReducerJob("crm_data", null, job);
    job.setNumReduceTasks(0);
    TableMapReduceUtil.addDependencyJars(job);

    FileInputFormat.addInputPath(job, new Path(strings[0]));

    int ret = job.waitForCompletion(true) ? 0 : 1;
    return ret;
}

Answer 3

Answered by Arthur B

You can always wrap your string in your own Writable class and write a boolean flag indicating whether a non-blank string follows:

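@Override
public void readFields(DataInput in) throws IOException {
    ...
    // The flag written by write() says whether a string follows.
    boolean hasWord = in.readBoolean();
    if (hasWord) {
        word = in.readUTF();
    } else {
        // Hadoop reuses Writable instances, so clear any value left over from the previous record.
        word = null;
    }
    ...
}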

and

@Override
public void write(DataOutput out) throws IOException {
    ...
    boolean hasWord = StringUtils.isNotBlank(word);
    out.writeBoolean(hasWord);
    if(hasWord) {
        out.writeUTF(word);
    }
    ...
}
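
As a rough end-to-end check of this pattern, the sketch below round-trips such a wrapper through a byte array using only java.io. NullableWord and NullableWordDemo are hypothetical names chosen for this illustration. To stay free of external dependencies, the class does not declare implements Writable (in a real job it would), and it tests word != null instead of StringUtils.isNotBlank, so blank strings are kept rather than dropped:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical wrapper: a String that may be null, serialized as
// a one-byte presence flag optionally followed by the string itself.
final class NullableWord {
    private String word; // may be null

    NullableWord() {}

    NullableWord(String word) {
        this.word = word;
    }

    String get() {
        return word;
    }

    public void write(DataOutput out) throws IOException {
        boolean hasWord = word != null; // assumption: only null counts as "absent"
        out.writeBoolean(hasWord);
        if (hasWord) {
            out.writeUTF(word);
        }
    }

    public void readFields(DataInput in) throws IOException {
        boolean hasWord = in.readBoolean();
        // Always assign, so a reused instance never keeps a stale value.
        word = hasWord ? in.readUTF() : null;
    }
}

class NullableWordDemo {
    static byte[] serialize(String word) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new NullableWord(word).write(new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static void deserializeInto(NullableWord target, byte[] data) {
        try {
            target.readFields(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        NullableWord reused = new NullableWord();
        deserializeInto(reused, serialize("hadoop"));
        System.out.println(reused.get());               // prints "hadoop"
        deserializeInto(reused, serialize(null));
        System.out.println(reused.get());               // prints "null"
        System.out.println(serialize(null).length);     // prints "1"
        System.out.println(serialize("hadoop").length); // prints "9"
    }
}
```

Note the cost compared with NullWritable: every absent value still takes one byte for the flag, whereas NullWritable writes none.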
