How to Run a Hadoop MapReduce Program on Ubuntu 16.04
In this tutorial, I will show you how to run a MapReduce program.
MapReduce is one of the core components of Apache Hadoop: it is Hadoop's processing layer.
So before I show you how to run a MapReduce program, let me briefly explain what MapReduce is.
MapReduce is a system for processing large data sets in parallel.
It reduces the data down to a result, producing a summary of the input.
A MapReduce program has two parts: a mapper and a reducer.
The reducer starts only after the mapper has finished its work.
Mapper: maps the input key/value pairs to a set of intermediate key/value pairs.
Reducer: reduces the set of intermediate values that share a key to a smaller set of values.
Basically, for the WordCount MapReduce program we supply any text file as input.
When the MapReduce program starts, these are the phases it goes through:
Splitting: each line of the input file is split into words.
Mapping: key/value pairs are formed, where a word is the key and 1 is the value assigned to that key.
Shuffling: key/value pairs with the same key are grouped together.
Reducing: the values of matching keys are added together.
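The four phases above can be imitated in a few lines of plain Java, with no Hadoop dependency. This is only an illustrative sketch of the data flow (the class name WordCountPhases is my own), not the program we will export later:

```java
// A plain-Java sketch of the four WordCount phases:
// splitting, mapping, shuffling (grouping), and reducing.
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WordCountPhases {
    public static Map<String, Integer> wordCount(List<String> lines) {
        // Splitting + Mapping: each line becomes a series of (word, 1) pairs.
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) {
                    pairs.add(Map.entry(word, 1));
                }
            }
        }
        // Shuffling: group the 1s by key, so each word maps to a list of 1s.
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        // Reducing: sum the grouped values for each key.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            counts.put(e.getKey(), e.getValue().stream().mapToInt(Integer::intValue).sum());
        }
        return counts;
    }

    public static void main(String[] args) {
        // The same two lines used as the HDFS input later in this tutorial.
        List<String> input = List.of("This is my first mapreduce test",
                                     "This is wordcount program");
        System.out.println(wordCount(input));
        // prints {This=2, is=2, my=1, first=1, mapreduce=1, test=1, wordcount=1, program=1}
    }
}
```

On the real cluster, shuffling happens between the map and reduce tasks, but the key grouping shown here is the same idea.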
Running the MapReduce program
MapReduce programs are written in Java, and most developers use the Eclipse IDE to write them.
So in this tutorial, I will show you how to export a MapReduce program from the Eclipse IDE into a JAR file and run it on a Hadoop cluster.
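For reference, a WordCount program along these lines looks roughly like the sketch below. It is based on the canonical Hadoop WordCount example (Hadoop 2.x mapreduce API) and needs the Hadoop client libraries on the classpath; your own program may differ in details:

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  // Mapper: emits (word, 1) for every token in the input line.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums the 1s that share a key.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: configures and submits the job; args[0] is the input path,
  // args[1] the output path.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

No combiner is configured here, which matches the "Combine input records=0" counter in the job output later in this tutorial.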
My MapReduce program is already in my Eclipse IDE.
To run this MapReduce program on the Hadoop cluster, we export the project as a JAR file.
In the Eclipse IDE, select the File menu and click Export.
Under the Java option, select JAR file and click Next.
Select the WordCount project and give a path and name for the JAR file; I named mine wordcount.jar. Click Next twice.
Now click Browse, select the main class, and finally click Finish to build the JAR file.
If you get any warnings at this point, just click OK.
Check that the Hadoop cluster is up and running.
Command: jps
hadoop@hadoop-VirtualBox:~$ jps
3008 NodeManager
3924 Jps
2885 ResourceManager
2505 DataNode
3082 JobHistoryServer
2716 SecondaryNameNode
2383 NameNode
hadoop@hadoop-VirtualBox:~$
Now we put the input file for the WordCount program into HDFS.
hadoop@hadoop-VirtualBox:~$ hdfs dfs -put input /
hadoop@hadoop-VirtualBox:~$ hdfs dfs -cat /input
This is my first mapreduce test
This is wordcount program
hadoop@hadoop-VirtualBox:~$
Now run the wordcount.jar file using the following command.
Note: since we selected the main class when exporting wordcount.jar, there is no need to specify the main class in the command.
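This works because the Eclipse export wizard records the main class in the JAR's manifest. A minimal META-INF/MANIFEST.MF would look like the fragment below (the class name WordCount is an assumption; it is whatever main class you selected in the wizard):

```
Manifest-Version: 1.0
Main-Class: WordCount
```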
Command: hadoop jar wordcount.jar /input /output
hadoop@hadoop-VirtualBox:~$hadoop jar wordcount.jar /input /output
16/11/27 22:52:20 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/11/27 22:52:22 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/11/27 22:52:27 INFO input.FileInputFormat: Total input paths to process : 1
16/11/27 22:52:28 INFO mapreduce.JobSubmitter: number of splits:1
16/11/27 22:52:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1480267251741_0001
16/11/27 22:52:32 INFO impl.YarnClientImpl: Submitted application application_1480267251741_0001
16/11/27 22:52:33 INFO mapreduce.Job: The url to track the job: http://hadoop-VirtualBox:8088/proxy/application_1480267251741_0001/
16/11/27 22:52:33 INFO mapreduce.Job: Running job: job_1480267251741_0001
16/11/27 22:53:20 INFO mapreduce.Job: Job job_1480267251741_0001 running in uber mode : false
16/11/27 22:53:20 INFO mapreduce.Job: map 0% reduce 0%
16/11/27 22:53:45 INFO mapreduce.Job: map 100% reduce 0%
16/11/27 22:54:13 INFO mapreduce.Job: map 100% reduce 100%
16/11/27 22:54:15 INFO mapreduce.Job: Job job_1480267251741_0001 completed successfully
16/11/27 22:54:16 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=124
FILE: Number of bytes written=237911
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=150
HDFS: Number of bytes written=66
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=21062
Total time spent by all reduces in occupied slots (ms)=25271
Total time spent by all map tasks (ms)=21062
Total time spent by all reduce tasks (ms)=25271
Total vcore-milliseconds taken by all map tasks=21062
Total vcore-milliseconds taken by all reduce tasks=25271
Total megabyte-milliseconds taken by all map tasks=21567488
Total megabyte-milliseconds taken by all reduce tasks=25877504
Map-Reduce Framework
Map input records=2
Map output records=10
Map output bytes=98
Map output materialized bytes=124
Input split bytes=92
Combine input records=0
Combine output records=0
Reduce input groups=8
Reduce shuffle bytes=124
Reduce input records=10
Reduce output records=8
Spilled Records=20
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=564
CPU time spent (ms)=4300
Physical memory (bytes) snapshot=330784768
Virtual memory (bytes) snapshot=3804205056
Total committed heap usage (bytes)=211812352
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=58
File Output Format Counters
Bytes Written=66
hadoop@hadoop-VirtualBox:~$
Once the program has run successfully, go to HDFS and check the part file in the output directory.
Below is the output of the WordCount program.
hadoop@hadoop-VirtualBox:~$ hdfs dfs -cat /output/part-r-00000
This        2
first       1
is          2
mapreduce   1
my          1
program     1
test        1
wordcount   1
hadoop@hadoop-VirtualBox:~$

