Java: Understanding LongWritable
Original URL: http://stackoverflow.com/questions/11086263/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverFlow
Understanding LongWritable
Asked by Mijatovic
I'm sorry if this is a foolish question, but I couldn't find an answer with a Google search. How can I understand the LongWritable type? What is it? Can anybody link to a schema or another helpful page?
Answered by Gareth Davis
Hadoop needs to be able to serialise data in and out of Java types via DataInput and DataOutput objects (IO streams, usually). The Writable classes do this by implementing two methods: write(DataOutput) and readFields(DataInput).
Specifically, LongWritable is a Writable class that wraps a Java long.
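As a quick illustration (my own sketch, not part of the original answer), the wrapped long can be round-tripped through those same two methods:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;

public class LongWritableRoundTrip {
    public static void main(String[] args) throws IOException {
        LongWritable original = new LongWritable(42L);   // wrap a plain Java long

        // write(DataOutput): serialize to a byte stream
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        original.write(new DataOutputStream(buffer));

        // readFields(DataInput): deserialize from the byte stream
        LongWritable copy = new LongWritable();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));

        System.out.println(copy.get());                  // prints 42
    }
}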
Most of the time (especially when just starting out) you can mentally replace LongWritable -> Long, i.e. it's just a number. If you get to defining your own datatypes, you will start to become very familiar with implementing the Writable interface.
It looks something like this:
public interface Writable {
    public void write(DataOutput out) throws IOException;
    public void readFields(DataInput in) throws IOException;
}
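For instance, a hand-rolled Writable wrapping two longs might look like the sketch below (the class name PairOfLongsWritable is made up for illustration):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class PairOfLongsWritable implements Writable {
    private long first;
    private long second;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(first);   // serialize the fields in a fixed order
        out.writeLong(second);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        first = in.readLong();  // read them back in exactly the same order
        second = in.readLong();
    }
}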
Answered by Ahmed Ahmed
The Mapper class is a generic type, with four formal type parameters that specify the input key, input value, output key, and output value types of the map function.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
    }
}

// The reduce method belongs in a separate Reducer subclass (its own file),
// not inside the Mapper as the original snippet suggested:
public class MaxTemperatureReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
    }
}
For the code example, the input key is a long integer offset, the input value is a line of text, the output key is a text string, and the output value is an integer. Rather than use built-in Java types, Hadoop provides its own set of basic types that are optimized for network serialization. These are found in the org.apache.hadoop.io package.
Here we use LongWritable (which corresponds to a Java Long), Text (like Java String), and IntWritable (like Java Integer).
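To make the type parameters concrete, the map method above might parse each input line and emit a (Text, IntWritable) pair, along these lines (the whitespace-separated record layout is a made-up assumption, not from the answer):

@Override
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    // key is the byte offset of this line in the input file; value is the line itself
    String[] fields = value.toString().split("\\s+"); // hypothetical "station temperature" layout
    context.write(new Text(fields[0]), new IntWritable(Integer.parseInt(fields[1])));
}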
Answered by Ravindra babu
From the Apache documentation page, Writable is described as:
A serializable object which implements a simple, efficient, serialization protocol, based on DataInput and DataOutput.
LongWritable is a WritableComparable for longs.
Need for Writables:
In Hadoop, interprocess communication is built on remote procedure calls (RPC). The RPC protocol uses serialization to render a message into a binary stream at the sender, and the binary stream is deserialized back into the original message at the receiver.
Java serialization has many disadvantages with respect to performance and efficiency: it is much slower than using in-memory stores, it tends to significantly expand the size of the object, and it creates a lot of garbage.
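A small demonstration of the size difference (my own sketch; exact byte counts vary by JVM version), serializing one long value both ways:

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import org.apache.hadoop.io.LongWritable;

public class SerializationSizeDemo {
    public static void main(String[] args) throws IOException {
        // Java serialization of a boxed Long: includes class metadata
        ByteArrayOutputStream javaBytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(javaBytes)) {
            oos.writeObject(Long.valueOf(42L));
        }

        // Writable serialization of the same value: just the payload bytes
        ByteArrayOutputStream writableBytes = new ByteArrayOutputStream();
        new LongWritable(42L).write(new DataOutputStream(writableBytes));

        System.out.println("Java serialization: " + javaBytes.size() + " bytes");     // roughly 80
        System.out.println("LongWritable:       " + writableBytes.size() + " bytes"); // 8
    }
}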
Refer to these two posts:
dzone article
For the effectiveness of Hadoop, the serialization/deserialization process should be optimized because a huge number of remote calls happen between the nodes in the cluster. The serialization format should therefore be fast, compact, extensible, and interoperable. For this reason, the Hadoop framework has come up with its own IO classes to replace the Java primitive data types, e.g. IntWritable for int, LongWritable for long, Text for String, etc.
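The mapping is mechanical: each Writable wraps one Java value and hands it back via get() (or toString() for Text). A quick sketch:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class WrapUnwrapDemo {
    public static void main(String[] args) {
        int i = new IntWritable(7).get();        // IntWritable  <-> int
        long l = new LongWritable(7L).get();     // LongWritable <-> long
        String s = new Text("seven").toString(); // Text         <-> String
        System.out.println(i + " " + l + " " + s);
    }
}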
You can get more details from "Hadoop: The Definitive Guide", fourth edition.