How to Set Up a Single-Node Hadoop Cluster Using Docker
In this article, I will show you how to set up a single-node Hadoop cluster using Docker.
Before I start with the setup, let me briefly remind you what Docker and Hadoop are.
Docker is a software containerization platform: you package your application together with all of its libraries, dependencies, and environment inside a container.
Such a container is called a Docker container.
With Docker, we can build and run applications (software) on the fly.
For example, if you want to test an application on an Ubuntu system, you don't need to set up a complete operating system on your laptop or desktop, or boot up a virtual machine with the Ubuntu OS.
That would take a lot of time and space.
You can simply launch an Ubuntu Docker container, which will have the environment and libraries you need to test the application, on the fly.
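As a minimal sketch of that workflow (assuming the official ubuntu image on Docker Hub; the 16.04 tag is just an illustrative choice):

hadoop@hadoop-VirtualBox:~$ docker run -it ubuntu:16.04 bash

This pulls the ubuntu:16.04 image if it is not present locally and drops you into a bash shell inside a fresh Ubuntu container; type exit to leave it.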
Apache Hadoop is a framework that allows distributed processing of large data sets across clusters of computers.
It is one of the most in-demand technologies in the industry these days.
Now, to store and analyze huge amounts of data using Hadoop, we need to set up a Hadoop cluster.
If we have ever set up a Hadoop cluster before, we know it is not an easy task.
If I tell you that setting up a Hadoop cluster is barely 5-10 minutes of work, will you believe me?
I guess not!
This is where Docker comes into the picture: with Docker, we can set up a Hadoop cluster in no time.
Benefits of setting up a Hadoop cluster using Docker
- Install and run Hadoop in no time.
- It uses resources only as needed, so there is no wastage of resources.
- It is easy to scale and is best suited for test environments for a Hadoop cluster.
- No need to worry about Hadoop dependencies, libraries, and so on; Docker takes care of them.
Setting up a single-node Hadoop cluster using Docker
So, let us now see how to set up a single-node Hadoop cluster using Docker.
I am using an Ubuntu 16.04 system, and Docker is already installed and configured on it.
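If Docker is not set up on your machine yet, one common way to install it on Ubuntu 16.04 is from the distribution's own package (a hedged aside; this step is not part of the walkthrough itself):

hadoop@hadoop-VirtualBox:~$ sudo apt-get update
hadoop@hadoop-VirtualBox:~$ sudo apt-get install -y docker.io    # Ubuntu's packaged Docker engine
hadoop@hadoop-VirtualBox:~$ sudo usermod -aG docker $USER    # optional: run docker without sudo (log out and back in afterwards)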
Before I set up the single-node Hadoop cluster using Docker, let me just run a simple example to verify that Docker is working correctly on my system.
Let me first check whether I have any Docker containers running.
hadoop@hadoop-VirtualBox:~$ docker ps
CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES
I don't have any Docker containers running right now.
Let me run the simple hello-world Docker example.
hadoop@hadoop-VirtualBox:~$ docker run hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker Hub account:
 https://hub.docker.com

For more examples and ideas, visit:
 https://docs.docker.com/engine/userguide/
So now you know that Docker is working correctly.
Let us go ahead and install Hadoop inside a Docker container.
For that, we need a Hadoop Docker image.
The command below will get me the Hadoop 2.7.1 Docker image.
hadoop@hadoop-VirtualBox:~$ sudo docker pull sequenceiq/hadoop-docker:2.7.1
[sudo] password for hadoop:
2.7.1: Pulling from sequenceiq/hadoop-docker
b253335dcf03: Pull complete
a3ed95caeb02: Pull complete
11c8cd810974: Pull complete
49d8575280f2: Pull complete
2240837237fc: Pull complete
e727168a1e18: Pull complete
ede4c89e7b84: Pull complete
a14c58904e3e: Pull complete
8d72113f79e9: Pull complete
44bc7aa001db: Pull complete
f1af80e588d1: Pull complete
54a0f749c9e0: Pull complete
f620e24d35d5: Pull complete
ff68d052eb73: Pull complete
d2f5cd8249bc: Pull complete
5d3c1e2c16b1: Pull complete
6e1d5d78f75c: Pull complete
a0d5160b2efd: Pull complete
b5c5006d9017: Pull complete
6a8c6da42d5b: Pull complete
13d1ee497861: Pull complete
e3be4bdd7a5c: Pull complete
391fb9240903: Pull complete
Digest: sha256:0ae1419989844ca8b655dea261b92554740ec3c133e0826866c49319af7359db
Status: Downloaded newer image for sequenceiq/hadoop-docker:2.7.1
Run the command below to check whether the Hadoop Docker image was downloaded correctly.
hadoop@hadoop-VirtualBox:~$ docker images
REPOSITORY                 TAG      IMAGE ID       CREATED        SIZE
hello-world                latest   c54a2cc56cbb   5 months ago   1.848 kB
sequenceiq/hadoop-docker   2.7.1    e3c6e05ab051   2 years ago    1.516 GB
hadoop@hadoop-VirtualBox:~$
Now run this Docker image; it will create a Docker container in which Hadoop 2.7.1 will be running.
hadoop@hadoop-VirtualBox:~$ docker run -it sequenceiq/hadoop-docker:2.7.1 /etc/bootstrap.sh -bash
/
Starting sshd: [ OK ]
Starting namenodes on [e34a63e1dcf8]
e34a63e1dcf8: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-e34a63e1dcf8.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-e34a63e1dcf8.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-e34a63e1dcf8.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-e34a63e1dcf8.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-e34a63e1dcf8.out
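A quick note on this command: -it gives us an interactive terminal, and /etc/bootstrap.sh -bash is the script shipped with this image that starts the Hadoop daemons and then leaves us in a shell. If you would rather reach the web UIs from the host via localhost instead of the container's IP, a hedged variant using Docker's standard port publishing would be:

hadoop@hadoop-VirtualBox:~$ docker run -it -p 50070:50070 -p 8088:8088 sequenceiq/hadoop-docker:2.7.1 /etc/bootstrap.sh -bash    # maps the NameNode and ResourceManager UI ports to the host

With this, the NameNode UI would also be reachable at localhost:50070 on the host machine.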
Now that the Docker container is up, run the jps command to check whether all the Hadoop services are up and running.
bash-4.1# jps
291 SecondaryNameNode
560 NodeManager
856 Jps
107 NameNode
483 ResourceManager
180 DataNode
bash-4.1#
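All five Hadoop daemons are there (the sixth entry, Jps, is the tool itself). If one of them were missing, the logs under /usr/local/hadoop/logs (the paths shown during startup above) would be the first place to look. As an extra, hedged check, HDFS itself can report the cluster's health with the standard dfsadmin tool:

bash-4.1# cd $HADOOP_PREFIX
bash-4.1# bin/hdfs dfsadmin -report    # prints capacity figures and the list of live datanodes

For our single-node setup, the report should show exactly one live datanode.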
Open a new terminal and run the command below to see the list of running containers and their details.
hadoop@hadoop-VirtualBox:~$ docker ps
CONTAINER ID   IMAGE                            COMMAND                  CREATED          STATUS          PORTS                                                                                                                    NAMES
e34a63e1dcf8   sequenceiq/hadoop-docker:2.7.1   "/etc/bootstrap.sh -b"   44 minutes ago   Up 44 minutes   22/tcp, 8030-8033/tcp, 8040/tcp, 8042/tcp, 8088/tcp, 49707/tcp, 50010/tcp, 50020/tcp, 50070/tcp, 50075/tcp, 50090/tcp   condescending_poincare
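If you ever need a second shell inside the same container (say, to tail the logs while a job runs), docker exec can attach one, using the container ID from the docker ps output above:

hadoop@hadoop-VirtualBox:~$ docker exec -it e34a63e1dcf8 bash    # opens another bash session in the running container

The ID e34a63e1dcf8 is of course mine; substitute the one from your own docker ps output.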
Go back to the Docker container's terminal and run the command below to get the IP address of the Docker container.
bash-4.1# ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:AC:11:00:02
          inet addr:172.17.0.2  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:acff:fe11:2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:56 errors:0 dropped:0 overruns:0 frame:0
          TX packets:31 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:6803 (6.6 KiB)  TX bytes:2298 (2.2 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:28648 errors:0 dropped:0 overruns:0 frame:0
          TX packets:28648 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:4079499 (3.8 MiB)  TX bytes:4079499 (3.8 MiB)

bash-4.1#
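As an alternative, Docker can report a container's IP address from the host, without running ifconfig inside it; a sketch using docker inspect with a Go-template filter:

hadoop@hadoop-VirtualBox:~$ docker inspect -f '{{ .NetworkSettings.IPAddress }}' e34a63e1dcf8
172.17.0.2

This should print the same 172.17.0.2 we saw above (for containers on Docker's default bridge network).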
We already saw with the jps command that all the services are running; now let us check the NameNode UI in a browser.
Go to 172.17.0.2:50070 in your browser, and there you have the NameNode UI of the Hadoop cluster running inside the Docker container.
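If a browser is not handy, a quick sanity check from the host terminal works too (assuming curl is installed on the host):

hadoop@hadoop-VirtualBox:~$ curl -s http://172.17.0.2:50070/ | head    # should print the beginning of the NameNode web page's HTML

The ResourceManager UI is exposed the same way on port 8088 (see the PORTS column in the docker ps output above).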
Just to make sure the Hadoop cluster is working properly, let us run a Hadoop MapReduce example inside the Docker container.
bash-4.1# cd $HADOOP_PREFIX
bash-4.1# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep input output 'dfs[a-z.]+'
16/11/29 13:07:02 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/11/29 13:07:07 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
16/11/29 13:07:08 INFO input.FileInputFormat: Total input paths to process : 27
16/11/29 13:07:10 INFO mapreduce.JobSubmitter: number of splits:27
16/11/29 13:07:12 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1480434980067_0001
16/11/29 13:07:14 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
16/11/29 13:07:15 INFO impl.YarnClientImpl: Submitted application application_1480434980067_0001
16/11/29 13:07:16 INFO mapreduce.Job: The url to track the job: http://e34a63e1dcf8:8088/proxy/application_1480434980067_0001/
16/11/29 13:07:16 INFO mapreduce.Job: Running job: job_1480434980067_0001
16/11/29 13:07:58 INFO mapreduce.Job: Job job_1480434980067_0001 running in uber mode : false
16/11/29 13:07:58 INFO mapreduce.Job:  map 0% reduce 0%
16/11/29 13:10:44 INFO mapreduce.Job:  map 22% reduce 0%
16/11/29 13:13:40 INFO mapreduce.Job:  map 22% reduce 7%
16/11/29 13:13:41 INFO mapreduce.Job:  map 26% reduce 7%
16/11/29 13:20:30 INFO mapreduce.Job:  map 96% reduce 32%
16/11/29 13:21:01 INFO mapreduce.Job:  map 100% reduce 32%
16/11/29 13:21:04 INFO mapreduce.Job:  map 100% reduce 100%
16/11/29 13:21:08 INFO mapreduce.Job: Job job_1480434980067_0001 completed successfully
16/11/29 13:21:10 INFO mapreduce.Job: Counters: 50
    File System Counters
        FILE: Number of bytes read=345
        FILE: Number of bytes written=2621664
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=64780
        HDFS: Number of bytes written=437
        HDFS: Number of read operations=84
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
        Launched map tasks=29
        Launched reduce tasks=1
        Data-local map tasks=29
    Map-Reduce Framework
        Map input records=1586
        Map output records=24
        Bytes Written=437
16/11/29 13:21:10 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/11/29 13:21:10 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
16/11/29 13:21:10 INFO input.FileInputFormat: Total input paths to process : 1
16/11/29 13:21:12 INFO mapreduce.JobSubmitter: number of splits:1
16/11/29 13:21:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1480434980067_0002
16/11/29 13:21:13 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
16/11/29 13:21:14 INFO impl.YarnClientImpl: Submitted application application_1480434980067_0002
16/11/29 13:21:14 INFO mapreduce.Job: The url to track the job: http://e34a63e1dcf8:8088/proxy/application_1480434980067_0002/
16/11/29 13:21:14 INFO mapreduce.Job: Running job: job_1480434980067_0002
16/11/29 13:21:48 INFO mapreduce.Job: Job job_1480434980067_0002 running in uber mode : false
16/11/29 13:21:48 INFO mapreduce.Job:  map 0% reduce 0%
16/11/29 13:22:12 INFO mapreduce.Job:  map 100% reduce 0%
16/11/29 13:22:37 INFO mapreduce.Job:  map 100% reduce 100%
16/11/29 13:22:38 INFO mapreduce.Job: Job job_1480434980067_0002 completed successfully
16/11/29 13:22:38 INFO mapreduce.Job: Counters: 49
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
    Map-Reduce Framework
        Map input records=11
        Map output records=11
        Map output bytes=263
        Map output materialized bytes=291
        Input split bytes=132
        Physical memory (bytes) snapshot=334082048
        Virtual memory (bytes) snapshot=1297162240
        Total committed heap usage (bytes)=209518592
    File Input Format Counters
        Bytes Read=437
    File Output Format Counters
        Bytes Written=197
bash-4.1#
Check the output.
bash-4.1# bin/hdfs dfs -cat output/*
6   dfs.audit.logger
4   dfs.class
3   dfs.server.namenode.
2   dfs.period
2   dfs.audit.log.maxfilesize
2   dfs.audit.log.maxbackupindex
1   dfsmetrics.log
1   dfsadmin
1   dfs.servers
1   dfs.replication
1   dfs.file
bash-4.1#
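That is it: the grep job ran end to end, so the cluster is healthy. Once you are done experimenting, tearing everything down is just as quick; a sketch of the cleanup (container ID again taken from my docker ps output):

bash-4.1# exit    # leaving the bootstrap shell stops the container
hadoop@hadoop-VirtualBox:~$ docker rm e34a63e1dcf8    # remove the stopped container
hadoop@hadoop-VirtualBox:~$ docker rmi sequenceiq/hadoop-docker:2.7.1    # optional: delete the image as well

And that is how, with Docker, a single-node Hadoop cluster goes from nothing to running jobs in a matter of minutes.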