Java: Understanding LongWritable
Original URL: http://stackoverflow.com/questions/11086263/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverFlow
Understanding LongWritable
Asked by Mijatovic
I'm sorry if this is a foolish question, but I couldn't find an answer with a Google search. How can I understand the LongWritable type? What is it? Can anybody link to a schema or another helpful page?
Answered by Gareth Davis
Hadoop needs to be able to serialise data in and out of Java types via DataInput and DataOutput objects (IO streams, usually). The Writable classes do this by implementing two methods: write(DataOutput) and readFields(DataInput).
Specifically, LongWritable is a Writable class that wraps a Java long.
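As a quick illustration (my own sketch, not part of the original answer), the wrapped long can be round-tripped through those same two methods:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;

public class LongWritableRoundTrip {
    public static void main(String[] args) throws IOException {
        LongWritable original = new LongWritable(42L);   // wrap a plain Java long

        // write(DataOutput): serialize to a byte stream
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        original.write(new DataOutputStream(buffer));

        // readFields(DataInput): deserialize from the byte stream
        LongWritable copy = new LongWritable();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));

        System.out.println(copy.get());                  // prints 42
    }
}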
Most of the time (especially when just starting out) you can mentally replace LongWritable -> Long, i.e. it's just a number. If you get to defining your own datatypes, you will start to become very familiar with implementing the Writable interface.
It looks something like this:
public interface Writable {
    public void write(DataOutput out) throws IOException;
    public void readFields(DataInput in) throws IOException;
}
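For instance, a hand-rolled Writable wrapping two longs might look like the sketch below (the class name PairOfLongsWritable is made up for illustration):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class PairOfLongsWritable implements Writable {
    private long first;
    private long second;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(first);   // serialize the fields in a fixed order
        out.writeLong(second);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        first = in.readLong();  // read them back in exactly the same order
        second = in.readLong();
    }
}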
Answered by Ahmed Ahmed
The Mapper class is a generic type, with four formal type parameters that specify the input key, input value, output key, and output value types of the map function.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
    }
}

// The reduce method belongs in a separate Reducer subclass (its own file),
// not inside the Mapper as the original snippet suggested:
public class MaxTemperatureReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
    }
}
For the code example, the input key is a long integer offset, the input value is a line of text, the output key is a text string, and the output value is an integer. Rather than use built-in Java types, Hadoop provides its own set of basic types that are optimized for network serialization. These are found in the org.apache.hadoop.io package.
Here we use LongWritable (which corresponds to a Java Long), Text (like Java String), and IntWritable (like Java Integer).
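To make the type parameters concrete, the map method above might parse each input line and emit a (Text, IntWritable) pair, along these lines (the whitespace-separated record layout is a made-up assumption, not from the answer):

@Override
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    // key is the byte offset of this line in the input file; value is the line itself
    String[] fields = value.toString().split("\\s+"); // hypothetical "station temperature" layout
    context.write(new Text(fields[0]), new IntWritable(Integer.parseInt(fields[1])));
}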
Answered by Ravindra babu
From the Apache documentation page, Writable is described as:
A serializable object which implements a simple, efficient, serialization protocol, based on DataInput and DataOutput.
LongWritable is a WritableComparable for longs.
Need for Writables:
In Hadoop, interprocess communication is built on remote procedure calls (RPC). The RPC protocol uses serialization to render a message into a binary stream at the sender, and the binary stream is deserialized back into the original message at the receiver.
Java serialization has many disadvantages with respect to performance and efficiency: it is much slower than using in-memory stores, it tends to significantly expand the size of the object, and it creates a lot of garbage.
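A small demonstration of the size difference (my own sketch; exact byte counts vary by JVM version), serializing one long value both ways:

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import org.apache.hadoop.io.LongWritable;

public class SerializationSizeDemo {
    public static void main(String[] args) throws IOException {
        // Java serialization of a boxed Long: includes class metadata
        ByteArrayOutputStream javaBytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(javaBytes)) {
            oos.writeObject(Long.valueOf(42L));
        }

        // Writable serialization of the same value: just the payload bytes
        ByteArrayOutputStream writableBytes = new ByteArrayOutputStream();
        new LongWritable(42L).write(new DataOutputStream(writableBytes));

        System.out.println("Java serialization: " + javaBytes.size() + " bytes");     // roughly 80
        System.out.println("LongWritable:       " + writableBytes.size() + " bytes"); // 8
    }
}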
Refer to these two posts:
dzone article
For the effectiveness of Hadoop, the serialization/deserialization process should be optimized because a huge number of remote calls happen between the nodes in the cluster. The serialization format should therefore be fast, compact, extensible, and interoperable. For this reason, the Hadoop framework has come up with its own IO classes to replace the Java primitive data types, e.g. IntWritable for int, LongWritable for long, Text for String, etc.
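The mapping is mechanical: each Writable wraps one Java value and hands it back via get() (or toString() for Text). A quick sketch:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class WrapUnwrapDemo {
    public static void main(String[] args) {
        int i = new IntWritable(7).get();        // IntWritable  <-> int
        long l = new LongWritable(7L).get();     // LongWritable <-> long
        String s = new Text("seven").toString(); // Text         <-> String
        System.out.println(i + " " + l + " " + s);
    }
}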
You can get more details from "Hadoop: The Definitive Guide", fourth edition.