How to Set Up a Single-Node Hadoop Cluster Using Docker

In this article, I will show you how to set up a single-node Hadoop cluster using Docker.
Before we start, let me briefly remind you what Docker and Hadoop are.

Docker is a software containerization platform: we package an application together with all of its libraries, dependencies, and environment inside a container.
Such a container is called a Docker container.
With Docker, we can build and run applications on the fly.

For example, if you want to test an application on an Ubuntu system, you do not need to install a full operating system on your laptop/desktop or boot a virtual machine with Ubuntu.
That would cost a lot of time and disk space.
Instead, we can simply start an Ubuntu Docker container, which provides the environment and libraries we need to test the application on the fly.
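
For instance, a throwaway Ubuntu environment is one command away (a minimal sketch; pick whatever image tag matches the Ubuntu version you want to test against):

hadoop@hadoop-VirtualBox:~$docker run -it ubuntu bash

When you exit the shell, the container stops, and nothing is left behind on your machine except the cached image.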

Apache Hadoop is a framework that allows distributed processing of large data sets across clusters of computers.
It is one of the most important technologies in the industry these days.
To store and analyze large amounts of data with Hadoop, we need to set up a Hadoop cluster.
Anyone who has set up a Hadoop cluster before knows that it is not a simple task.

Would you believe me if I said that setting up a Hadoop cluster is barely 5-10 minutes of work?
I guess not!

This is where Docker comes into the picture: with Docker, we can set up a Hadoop cluster in no time.

Benefits of setting up a Hadoop cluster with Docker

  • Install and run Hadoop in no time.
  • Use only the resources you need, so nothing is wasted.
  • Easy to scale; ideal for test environments of a Hadoop cluster.
  • No need to worry about Hadoop dependencies, libraries, etc.; Docker takes care of them.

Setting up a single-node Hadoop cluster with Docker

So let us now see how to set up a single-node Hadoop cluster using Docker.
I am using an Ubuntu 16.04 system, and Docker is already installed and configured on it.
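
If you want to confirm that Docker is installed and the daemon is reachable before going further, two quick checks are enough (assuming your user is in the docker group; otherwise prefix the commands with sudo):

hadoop@hadoop-VirtualBox:~$docker --version
hadoop@hadoop-VirtualBox:~$docker info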

Before setting up the single-node Hadoop cluster with Docker, let me run a simple example to verify that Docker is working correctly on my system.

Let me first check which Docker containers I have running right now.

hadoop@hadoop-VirtualBox:~$docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

Nothing is running at the moment.
Let me run the simple hello-world Docker example.

hadoop@hadoop-VirtualBox:~$docker run hello-world
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from Docker Hub.
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash
For more examples and ideas, visit:
 https://docs.docker.com/engine/userguide/

So now you know that Docker is working correctly.
Let us move ahead and install Hadoop inside a Docker container.
For that, we need a Hadoop Docker image.
The command below will get me the hadoop-docker:2.7.1 Docker image.

hadoop@hadoop-VirtualBox:~$sudo docker pull sequenceiq/hadoop-docker:2.7.1
[sudo] password for hadoop:
2.7.1: Pulling from sequenceiq/hadoop-docker
b253335dcf03: Pull complete
a3ed95caeb02: Pull complete
11c8cd810974: Pull complete
49d8575280f2: Pull complete
2240837237fc: Pull complete
e727168a1e18: Pull complete
ede4c89e7b84: Pull complete
a14c58904e3e: Pull complete
8d72113f79e9: Pull complete
44bc7aa001db: Pull complete
f1af80e588d1: Pull complete
54a0f749c9e0: Pull complete
f620e24d35d5: Pull complete
ff68d052eb73: Pull complete
d2f5cd8249bc: Pull complete
5d3c1e2c16b1: Pull complete
6e1d5d78f75c: Pull complete
a0d5160b2efd: Pull complete
b5c5006d9017: Pull complete
6a8c6da42d5b: Pull complete
13d1ee497861: Pull complete
e3be4bdd7a5c: Pull complete
391fb9240903: Pull complete
Digest: sha256:0ae1419989844ca8b655dea261b92554740ec3c133e0826866c49319af7359db
Status: Downloaded newer image for sequenceiq/hadoop-docker:2.7.1

Run the command below to check whether the Hadoop Docker image was downloaded correctly.

hadoop@hadoop-VirtualBox:~$docker images
REPOSITORY                 TAG                 IMAGE ID            CREATED             SIZE
hello-world                latest              c54a2cc56cbb        5 months ago        1.848 kB
sequenceiq/hadoop-docker   2.7.1               e3c6e05ab051        2 years ago         1.516 GB
hadoop@hadoop-VirtualBox:~$

Now run this Docker image; it will create a Docker container in which Hadoop 2.7.1 runs.

hadoop@hadoop-VirtualBox:~$docker run -it sequenceiq/hadoop-docker:2.7.1 /etc/bootstrap.sh -bash
Starting sshd:                                             [  OK  ]
Starting namenodes on [e34a63e1dcf8]
e34a63e1dcf8: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-e34a63e1dcf8.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-e34a63e1dcf8.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-e34a63e1dcf8.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-e34a63e1dcf8.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-e34a63e1dcf8.out

Now that the Docker container is up, run the jps command to see whether the Hadoop services are up and running.

bash-4.1# jps
291 SecondaryNameNode
560 NodeManager
856 Jps
107 NameNode
483 ResourceManager
180 DataNode
bash-4.1#
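
Besides jps, you can ask HDFS itself whether the DataNode has registered with the NameNode. A quick check from inside the container (assuming HADOOP_PREFIX is set, as it is in this image):

bash-4.1# $HADOOP_PREFIX/bin/hdfs dfsadmin -report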

Open a new terminal and run the following command to see the list of running containers and their details.

hadoop@hadoop-VirtualBox:~$docker ps
CONTAINER ID        IMAGE                            COMMAND                  CREATED             STATUS              PORTS                                                                                                                   NAMES
e34a63e1dcf8        sequenceiq/hadoop-docker:2.7.1   "/etc/bootstrap.sh -b"   44 minutes ago      Up 44 minutes       22/tcp, 8030-8033/tcp, 8040/tcp, 8042/tcp, 8088/tcp, 49707/tcp, 50010/tcp, 50020/tcp, 50070/tcp, 50075/tcp, 50090/tcp   condescending_poincare
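
If you ever need another shell inside the running container, for example to tail logs while a job runs, docker exec will open one. The container name condescending_poincare comes from the docker ps listing above; yours will differ unless you start the container with --name:

hadoop@hadoop-VirtualBox:~$docker exec -it condescending_poincare bash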

Go back to the Docker container's terminal and run the command below to get the container's IP address.

bash-4.1# ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:AC:11:00:02
          inet addr:172.17.0.2  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:acff:fe11:2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:56 errors:0 dropped:0 overruns:0 frame:0
          TX packets:31 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:6803 (6.6 KiB)  TX bytes:2298 (2.2 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:28648 errors:0 dropped:0 overruns:0 frame:0
          TX packets:28648 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:4079499 (3.8 MiB)  TX bytes:4079499 (3.8 MiB)
bash-4.1#
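
You can also read the same address from the host without entering the container, using docker inspect with a Go template (a convenience sketch; substitute your own container ID or name):

hadoop@hadoop-VirtualBox:~$docker inspect -f '{{ .NetworkSettings.IPAddress }}' e34a63e1dcf8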

We already saw from the jps command that all the services are running; now let us check the NameNode UI in the browser.
Go to 172.17.0.2:50070 in the browser, and there you have the NameNode UI of the Hadoop cluster running inside the Docker container.
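
Note that 172.17.0.2 is reachable only from the Docker host itself. If you want to open the UI from another machine, one option is to publish the ports when starting the container (a sketch, not something the image requires; in Hadoop 2.x, 50070 is the NameNode UI and 8088 is the ResourceManager UI):

hadoop@hadoop-VirtualBox:~$docker run -it -p 50070:50070 -p 8088:8088 sequenceiq/hadoop-docker:2.7.1 /etc/bootstrap.sh -bash

With the ports published, http://localhost:50070 on the host shows the same NameNode UI.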

Just to make sure the Hadoop cluster is working properly, let us run a Hadoop MapReduce example in the Docker container. This grep example actually submits two jobs: the first collects every string in the input that matches the regular expression dfs[a-z.]+, and the second sorts the matches by how often they occur.

bash-4.1# cd $HADOOP_PREFIX
bash-4.1# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep input output 'dfs[a-z.]+'
16/11/29 13:07:02 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/11/29 13:07:07 WARN mapreduce.JobSubmitter: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
16/11/29 13:07:08 INFO input.FileInputFormat: Total input paths to process : 27
16/11/29 13:07:10 INFO mapreduce.JobSubmitter: number of splits:27
16/11/29 13:07:12 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1480434980067_0001
16/11/29 13:07:14 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
16/11/29 13:07:15 INFO impl.YarnClientImpl: Submitted application application_1480434980067_0001
16/11/29 13:07:16 INFO mapreduce.Job: The url to track the job: http://e34a63e1dcf8:8088/proxy/application_1480434980067_0001/
16/11/29 13:07:16 INFO mapreduce.Job: Running job: job_1480434980067_0001
16/11/29 13:07:58 INFO mapreduce.Job: Job job_1480434980067_0001 running in uber mode : false
16/11/29 13:07:58 INFO mapreduce.Job:  map 0% reduce 0%
16/11/29 13:10:44 INFO mapreduce.Job:  map 22% reduce 0%
16/11/29 13:13:40 INFO mapreduce.Job:  map 22% reduce 7%
16/11/29 13:13:41 INFO mapreduce.Job:  map 26% reduce 7%
16/11/29 13:20:30 INFO mapreduce.Job:  map 96% reduce 32%
16/11/29 13:21:01 INFO mapreduce.Job:  map 100% reduce 32%
16/11/29 13:21:04 INFO mapreduce.Job:  map 100% reduce 100%
16/11/29 13:21:08 INFO mapreduce.Job: Job job_1480434980067_0001 completed successfully
16/11/29 13:21:10 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=345
FILE: Number of bytes written=2621664
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=64780
HDFS: Number of bytes written=437
HDFS: Number of read operations=84
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Launched map tasks=29
Launched reduce tasks=1
Data-local map tasks=29
Map-Reduce Framework
Map input records=1586
Map output records=24
Bytes Written=437
16/11/29 13:21:10 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/11/29 13:21:10 WARN mapreduce.JobSubmitter: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
16/11/29 13:21:10 INFO input.FileInputFormat: Total input paths to process : 1
16/11/29 13:21:12 INFO mapreduce.JobSubmitter: number of splits:1
16/11/29 13:21:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1480434980067_0002
16/11/29 13:21:13 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
16/11/29 13:21:14 INFO impl.YarnClientImpl: Submitted application application_1480434980067_0002
16/11/29 13:21:14 INFO mapreduce.Job: The url to track the job: http://e34a63e1dcf8:8088/proxy/application_1480434980067_0002/
16/11/29 13:21:14 INFO mapreduce.Job: Running job: job_1480434980067_0002
16/11/29 13:21:48 INFO mapreduce.Job: Job job_1480434980067_0002 running in uber mode : false
16/11/29 13:21:48 INFO mapreduce.Job:  map 0% reduce 0%
16/11/29 13:22:12 INFO mapreduce.Job:  map 100% reduce 0%
16/11/29 13:22:37 INFO mapreduce.Job:  map 100% reduce 100%
16/11/29 13:22:38 INFO mapreduce.Job: Job job_1480434980067_0002 completed successfully
16/11/29 13:22:38 INFO mapreduce.Job: Counters: 49
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Map-Reduce Framework
Map input records=11
Map output records=11
Map output bytes=263
Map output materialized bytes=291
Input split bytes=132
Physical memory (bytes) snapshot=334082048
Virtual memory (bytes) snapshot=1297162240
Total committed heap usage (bytes)=209518592
File Input Format Counters
Bytes Read=437
File Output Format Counters
Bytes Written=197
bash-4.1#

Check the output.

bash-4.1# bin/hdfs dfs -cat output/*
6             dfs.audit.logger
4             dfs.class
3             dfs.server.namenode.
2             dfs.period
2             dfs.audit.log.maxfilesize
2             dfs.audit.log.maxbackupindex
1             dfsmetrics.log
1             dfsadmin
1             dfs.servers
1             dfs.replication
1             dfs.file
bash-4.1#
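
When you are done experimenting, you can tear everything down from the host. Stopping and removing the container discards the cluster state, while the image stays cached for the next run (again, the container name comes from docker ps and will differ on your machine):

hadoop@hadoop-VirtualBox:~$docker stop condescending_poincare
hadoop@hadoop-VirtualBox:~$docker rm condescending_poincare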