Advantages of using NullWritable in Hadoop (Java)
Disclaimer: This page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, credit the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/16198752/
Question by Venk K
What are the advantages of using NullWritable for null keys/values over using null texts (i.e. new Text(null))? I see the following in the book "Hadoop: The Definitive Guide":
NullWritable is a special type of Writable, as it has a zero-length serialization. No bytes are written to, or read from, the stream. It is used as a placeholder; for example, in MapReduce, a key or a value can be declared as a NullWritable when you don't need to use that position; it effectively stores a constant empty value. NullWritable can also be useful as a key in SequenceFile when you want to store a list of values, as opposed to key-value pairs. It is an immutable singleton: the instance can be retrieved by calling NullWritable.get().
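To make the book's description concrete without pulling in Hadoop, here is a minimal JDK-only sketch. Rec, NullKey and IntValue are hypothetical stand-ins for Writable, NullWritable and IntWritable, and the "file" is just a flat stream of key/value pairs, not the real SequenceFile format (which also carries a header, record lengths and sync markers). It shows that a singleton placeholder key with a zero-length serialization adds no bytes, so the stream ends up holding only the values.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for Hadoop's Writable (write side only).
interface Rec {
    void write(DataOutput out) throws IOException;
}

// Hypothetical stand-in for NullWritable: an immutable singleton that writes nothing.
final class NullKey implements Rec {
    private static final NullKey INSTANCE = new NullKey();
    private NullKey() {}
    static NullKey get() { return INSTANCE; }
    @Override public void write(DataOutput out) { /* zero-length serialization */ }
}

// Hypothetical stand-in for IntWritable: always 4 bytes.
final class IntValue implements Rec {
    private final int value;
    IntValue(int value) { this.value = value; }
    @Override public void write(DataOutput out) throws IOException { out.writeInt(value); }
}

final class ValueListDemo {
    // Writes each value as a (placeholder key, value) pair and returns the raw bytes.
    static byte[] writeValues(int... values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int v : values) {
                NullKey.get().write(out);    // contributes 0 bytes
                new IntValue(v).write(out);  // contributes 4 bytes
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = writeValues(1, 2, 3);
        System.out.println(data.length); // prints 12: only the three 4-byte values
    }
}
```

Because the key is a shared singleton, emitting it per record also allocates nothing new; the only per-record cost is the value itself.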
I do not clearly understand how the output is written out using NullWritable. Will there be a single constant value at the beginning of the output file indicating that the keys or values of this file are null, so that the MapReduce framework can skip reading the null keys/values (whichever is null)? Also, how are null texts actually serialized?
Thanks,
Venkat
Answer by Joe K
The key/value types must be given at runtime, so anything writing or reading NullWritables knows ahead of time that it will be dealing with that type; there is no marker of any kind in the file. Technically the NullWritables are "read", it's just that "reading" a NullWritable is a no-op. You can see for yourself that nothing at all is written or read:
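import java.io.*;
import java.util.Arrays;
import org.apache.hadoop.io.NullWritable;
NullWritable nw = NullWritable.get();
// Writing a NullWritable produces no bytes at all
ByteArrayOutputStream out = new ByteArrayOutputStream();
nw.write(new DataOutputStream(out));
System.out.println(Arrays.toString(out.toByteArray())); // prints "[]"
// Reading from an empty stream succeeds, because readFields consumes nothing
ByteArrayInputStream in = new ByteArrayInputStream(new byte[0]);
nw.readFields(new DataInputStream(in)); // works just fine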
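The point that the types are fixed before any bytes are read can also be sketched in plain JDK code. Here a hypothetical PairReader is handed a key decoder and a value decoder when it is constructed, standing in for the key/value classes a job or a SequenceFile reader is configured with. The byte stream itself carries no type information and no null marker, and "reading" the placeholder key simply consumes zero bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder type: how to read one object of a type known in advance.
interface Decoder<T> {
    T read(DataInput in) throws IOException;
}

final class PairReader<K, V> {
    private final Decoder<K> keyDecoder;
    private final Decoder<V> valueDecoder;

    // The types are supplied up front; nothing in the stream says what they are.
    PairReader(Decoder<K> keyDecoder, Decoder<V> valueDecoder) {
        this.keyDecoder = keyDecoder;
        this.valueDecoder = valueDecoder;
    }

    // Reads (key, value) pairs until the stream is exhausted and returns the values.
    List<V> readValues(byte[] data) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        List<V> values = new ArrayList<>();
        try {
            while (in.available() > 0) {
                keyDecoder.read(in);               // a no-op for the placeholder key
                values.add(valueDecoder.read(in));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        Object placeholder = new Object();
        Decoder<Object> nullKey = in -> placeholder;   // reads nothing, returns the constant
        Decoder<Integer> intValue = DataInput::readInt;
        byte[] data = {0, 0, 0, 7, 0, 0, 0, 8};        // just two ints, no header or marker
        System.out.println(new PairReader<>(nullKey, intValue).readValues(data)); // prints [7, 8]
    }
}
```

The same eight bytes would be meaningless to a reader configured with different types, which is why the types must be agreed on outside the file.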
And as for your question about new Text(null), again, you can try it out:
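import java.io.*;
import java.util.Arrays;
import org.apache.hadoop.io.Text;
// throws NullPointerException already here: Text cannot encode a null String
Text text = new Text((String)null);
ByteArrayOutputStream out = new ByteArrayOutputStream();
text.write(new DataOutputStream(out)); // never reached
System.out.println(Arrays.toString(out.toByteArray()));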
Text will not work at all with a null String.
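If you need a "no value" marker in a field declared as Text, an empty string is the usual workaround, but it is not free: Text writes a variable-length length prefix before its UTF-8 bytes, so even an empty Text costs one byte per record, while NullWritable costs none. The JDK-only sketch below uses DataOutputStream.writeUTF as a stand-in for Text.write (it uses a fixed 2-byte length prefix rather than Text's vint, so the exact sizes differ) to show the same two facts: an empty string still takes space, and a null string cannot be serialized at all.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class EmptyVsNullDemo {
    // Serializes s with writeUTF (a stand-in for Text.write) and returns the byte count.
    static int serializedSize(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);   // throws NullPointerException if s is null
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println(serializedSize(""));   // prints 2: the length prefix alone
        System.out.println(serializedSize("ab")); // prints 4: prefix plus two bytes
        try {
            serializedSize(null);
        } catch (NullPointerException e) {
            System.out.println("null cannot be serialized");
        }
    }
}
```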
Answer by zwj0571
I changed the run method, and it worked:
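import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

@Override
public int run(String[] strings) throws Exception {
    Configuration config = HBaseConfiguration.create();
    // set job name
    Job job = new Job(config, "Import from file");
    job.setJarByClass(LogRun.class);
    // set map class
    job.setMapperClass(LogMapper.class);
    // The manual output setup below was replaced by initTableReducerJob, which configures
    // TableOutputFormat, the output table and the output key/value classes
    //job.setOutputFormatClass(TableOutputFormat.class);
    //job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "crm_data");
    //job.setOutputKeyClass(ImmutableBytesWritable.class);
    //job.setOutputValueClass(Put.class);
    // null reducer: with zero reduce tasks the mapper output is written straight to the table
    TableMapReduceUtil.initTableReducerJob("crm_data", null, job);
    job.setNumReduceTasks(0);
    TableMapReduceUtil.addDependencyJars(job);
    FileInputFormat.addInputPath(job, new Path(strings[0]));
    int ret = job.waitForCompletion(true) ? 0 : 1;
    return ret;
}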
Answer by Arthur B
You can always wrap your string in your own Writable class and use a boolean to indicate whether it holds a non-blank string:
@Override
public void readFields(DataInput in) throws IOException {
    ...
    boolean hasWord = in.readBoolean();
    if (hasWord) {
        word = in.readUTF();
    } else {
        word = null; // reset: Hadoop may reuse this object for the next record
    }
    ...
}
and
@Override
public void write(DataOutput out) throws IOException {
    ...
    // StringUtils.isNotBlank (Apache Commons Lang): false for null, empty or whitespace-only strings
    boolean hasWord = StringUtils.isNotBlank(word);
    out.writeBoolean(hasWord);
    if (hasWord) {
        out.writeUTF(word);
    }
    ...
}
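Filled out into a self-contained class, the approach above might look like the following sketch. NullableString is a hypothetical name; to stay runnable without Hadoop on the classpath it only declares the same write/readFields methods instead of implementing org.apache.hadoop.io.Writable, and it uses String.isBlank (Java 11+) as a stand-in for StringUtils.isNotBlank. readFields clears the field when no word is present, because Hadoop may reuse one instance across records.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical wrapper; a real one would declare "implements org.apache.hadoop.io.Writable".
final class NullableString {
    private String word;   // null (or blank) means "no word"

    NullableString() {}    // Hadoop instantiates Writables reflectively, so keep a no-arg constructor
    NullableString(String word) { this.word = word; }

    String get() { return word; }

    public void write(DataOutput out) throws IOException {
        boolean hasWord = word != null && !word.isBlank();  // stand-in for StringUtils.isNotBlank
        out.writeBoolean(hasWord);                           // 1-byte flag
        if (hasWord) {
            out.writeUTF(word);                              // 2-byte length + UTF-8 bytes
        }
    }

    public void readFields(DataInput in) throws IOException {
        boolean hasWord = in.readBoolean();
        word = hasWord ? in.readUTF() : null;  // clear stale data from a reused instance
    }

    // Serializes source, deserializes into target, and returns the serialized size.
    static int roundTrip(NullableString source, NullableString target) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            source.write(out);
            out.flush();
            byte[] data = bytes.toByteArray();
            target.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return data.length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        NullableString reused = new NullableString();
        System.out.println(roundTrip(new NullableString("hi"), reused) + " " + reused.get()); // prints "5 hi"
        System.out.println(roundTrip(new NullableString(null), reused) + " " + reused.get()); // prints "1 null"
    }
}
```

Compared with NullWritable, this costs one byte per record even when the value is absent, but it lets a single field hold either a string or "nothing".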