
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/18630019/


Running Apache Hadoop 2.1.0 on Windows

Tags: windows, hadoop

Asked by Hatter

I am new to Hadoop and have run into problems trying to run it on my Windows 7 machine. In particular I am interested in running Hadoop 2.1.0, as its release notes mention that running on Windows is supported. I know that I can try to run 1.x versions on Windows with Cygwin, or even use a prepared VM from, for example, Cloudera, but these options are for various reasons less convenient for me.


Having examined a tarball from http://apache-mirror.rbc.ru/pub/apache/hadoop/common/hadoop-2.1.0-beta/ I found that there really are some *.cmd scripts that can be run without Cygwin. Everything worked fine when I formatted the HDFS partition, but when I tried to run the hdfs namenode daemon I faced two errors: the first, non-fatal, was that winutils.exe could not be found (it really wasn't present in the downloaded tarball). I found the sources of this component in the Apache Hadoop source tree and compiled it with the Microsoft SDK and MSBuild. Thanks to a detailed error message it was clear where to put the executable to satisfy Hadoop. But the second error, which is fatal, doesn't contain enough information for me to solve it:


13/09/05 10:20:09 FATAL namenode.NameNode: Exception in namenode join
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:423)
    at org.apache.hadoop.fs.FileUtil.canWrite(FileUtil.java:952)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:451)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:282)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:200)
...
13/09/05 10:20:09 INFO util.ExitUtil: Exiting with status 1

Looks like something else should be compiled. I'm going to try to build Hadoop from the source with Maven but isn't there a simpler way? Isn't there some option-I-know-not-of that can disable native code and make that tarball usable on Windows?

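A quick way to catch this class of failure up front is to check that the native helpers discussed here actually exist under Hadoop's bin folder before starting any daemons. A minimal sketch (the file list below is an assumption drawn from this question, not an official manifest):

```python
import os

# Native helpers Hadoop's Windows shims are known to need here (assumed list).
EXPECTED = ("winutils.exe", "hadoop.dll")

def missing_native_binaries(hadoop_home):
    """Return the expected native binaries absent from <hadoop_home>/bin."""
    bin_dir = os.path.join(hadoop_home, "bin")
    return [name for name in EXPECTED
            if not os.path.isfile(os.path.join(bin_dir, name))]
```

Running this against %HADOOP_HOME% before `hdfs namenode` would have flagged both missing files at once instead of one error at a time.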

Thank you.


UPDATED. Yes, indeed. The "homebrew" package contained some extra files, most importantly winutils.exe and hadoop.dll. With these files the namenode and datanode started successfully. I think the question can be closed. I didn't delete it in case someone faces the same difficulty.


UPDATED 2. To build the "homebrew" package I did the following:


  1. Got the sources and unpacked them.
  2. Read BUILDING.txt carefully.
  3. Installed the dependencies:
    3a) Windows SDK 7.1
    3b) Maven (I used 3.0.5)
    3c) JDK (I used 1.7.25)
    3d) ProtocolBuffer (I used 2.5.0 - http://protobuf.googlecode.com/files/protoc-2.5.0-win32.zip). It is enough to just put the compiler (protoc.exe) into one of the PATH folders.
    3e) A set of UNIX command-line tools (I installed Cygwin)
  4. Started the Windows SDK command line: Start | All programs | Microsoft Windows SDK v7.1 | ... Command Prompt. (I modified this shortcut, adding the option /release on the command line to build release versions of the native code.) All the following steps are done from inside the SDK command line window.
  5. Set up the environment:

    set JAVA_HOME={path_to_JDK_root}


It seems that JAVA_HOME MUST NOT contain spaces!

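This restriction can be checked before kicking off the build. A hypothetical helper (the rule encoded here - non-empty and no spaces - is exactly the observation above, nothing more):

```python
def java_home_ok(java_home):
    """Return True if a JAVA_HOME value looks safe for the Hadoop build:
    non-empty and containing no spaces (spaces break the build scripts)."""
    return bool(java_home) and " " not in java_home
```

If the JDK lives under "C:\Program Files", one common workaround is to point JAVA_HOME at the 8.3 short name instead (e.g. C:\PROGRA~1\Java\...), which contains no space.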

    set PATH={path_to_maven_bin};%PATH%
    set Platform=x64
    set PATH={path_to_cygwin_bin};%PATH%
    set PATH={path_to_protoc.exe};%PATH%

  6. Changed dir to the sources root folder (BUILDING.txt warns that there are some limitations on the path length, so the sources root should have a short name - I used D:\hds)
  7. Ran the build process:

    mvn package -Pdist -DskipTests


You can try without 'skipTests', but on my machine some tests failed and the build was terminated. It may be connected to the symbolic link issues mentioned in BUILDING.txt.

  8. Picked up the result in hadoop-dist\target\hadoop-2.1.0-beta (Windows executables and DLLs are in the 'bin' folder)


Answer by Abhijit

I followed these steps to install Hadoop 2.2.0.


Steps to build Hadoop bin distribution for Windows


  1. Download and install Microsoft Windows SDK v7.1.

  2. Download and install Unix command-line tool Cygwin.

  3. Download and install Maven 3.1.1.

  4. Download Protocol Buffers 2.5.0 and extract to a folder (say c:\protobuf).

  5. Add the Environment Variables JAVA_HOME, M2_HOME and Platform if not added already. Note: the variable name Platform is case-sensitive, and its value will be either x64 or Win32 for building on a 64-bit or 32-bit system. Edit the Path Variable to add the bin directory of Cygwin (say C:\cygwin64\bin), the bin directory of Maven (say C:\maven\bin) and the installation path of Protocol Buffers (say c:\protobuf).

  6. Download hadoop-2.2.0-src.tar.gz and extract to a folder having short path (say c:\hdfs) to avoid runtime problem due to maximum path length limitation in Windows.

  7. Select Start --> All Programs --> Microsoft Windows SDK v7.1 and open Windows SDK 7.1 Command Prompt. Change directory to Hadoop source code folder (c:\hdfs). Execute mvn package with options -Pdist,native-win -DskipTests -Dtar to create Windows binary tar distribution.

  8. If everything goes well in the previous step, then native distribution hadoop-2.2.0.tar.gz will be created inside C:\hdfs\hadoop-dist\target\hadoop-2.2.0 directory.


Install Hadoop


  1. Extract hadoop-2.2.0.tar.gz to a folder (say c:\hadoop).

  2. Add Environment Variable HADOOP_HOME and edit Path Variable to add bin directory of HADOOP_HOME (say C:\hadoop\bin).


Configure Hadoop


C:\hadoop\etc\hadoop\core-site.xml


<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost:9000</value>
        </property>
</configuration>

C:\hadoop\etc\hadoop\hdfs-site.xml


<configuration>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:/hadoop/data/dfs/namenode</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:/hadoop/data/dfs/datanode</value>
        </property>
</configuration>

C:\hadoop\etc\hadoop\mapred-site.xml


<configuration>
        <property>
           <name>mapreduce.framework.name</name>
           <value>yarn</value>
        </property>
</configuration>

C:\hadoop\etc\hadoop\yarn-site.xml


<configuration>
        <property>
           <name>yarn.nodemanager.aux-services</name>
           <value>mapreduce_shuffle</value>
        </property>
        <property>
           <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
           <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
</configuration>

Format namenode


The namenode needs to be formatted only the first time.


C:\Users\abhijitg>cd c:\hadoop\bin 
c:\hadoop\bin>hdfs namenode -format

Start HDFS (Namenode and Datanode)


C:\Users\abhijitg>cd c:\hadoop\sbin
c:\hadoop\sbin>start-dfs

Start MapReduce aka YARN (Resource Manager and Node Manager)


C:\Users\abhijitg>cd c:\hadoop\sbin
c:\hadoop\sbin>start-yarn
starting yarn daemons

In total, four separate Command Prompt windows will be opened automatically to run the Namenode, Datanode, Resource Manager and Node Manager.


Reference : Build, Install, Configure and Run Apache Hadoop 2.2.0 in Microsoft Windows OS


Answer by Peter Kofler

Han has prepared the Hadoop 2.2 Windows x64 binaries (see his blog) and uploaded them to Github.


After putting the two binaries winutils.exe and hadoop.dll into the %hadoop_prefix%\bin folder, I got the same UnsatisfiedLinkError.


The problem was that some dependency of hadoop.dll was missing. I used Dependency Walker to check the dependencies of the binaries, and the Microsoft Visual C++ 2010 Redistributables were missing.


So besides building all the components yourself, the answer to the problem is:


  • make sure to use the same architecture for Java and the native code. java -version tells you whether you use 32-bit or x64.
  • then use Dependency Walker to make sure all native binaries are pure and of the same architecture. Sometimes an x64 dependency is missing and Windows falls back to x86, which does not work. See the answer to another question.
  • also check that all dependencies of the native binaries are satisfied.
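As an alternative to Dependency Walker for the architecture check, the machine field of a binary's PE header can be read directly. A minimal sketch (only three common machine codes are mapped here; full PE parsing has many more cases):

```python
import struct

# IMAGE_FILE_HEADER machine values for the common cases.
MACHINE_TYPES = {0x014C: "x86", 0x8664: "x64", 0xAA64: "ARM64"}

def pe_arch(path):
    """Return the architecture ('x86', 'x64', ...) of a PE binary (EXE/DLL)."""
    with open(path, "rb") as f:
        if f.read(2) != b"MZ":
            raise ValueError("not a PE file (missing MZ header)")
        f.seek(0x3C)                                   # e_lfanew: offset of PE header
        pe_offset = struct.unpack("<I", f.read(4))[0]
        f.seek(pe_offset)
        if f.read(4) != b"PE\x00\x00":
            raise ValueError("not a PE file (missing PE signature)")
        machine = struct.unpack("<H", f.read(2))[0]    # IMAGE_FILE_HEADER.Machine
        return MACHINE_TYPES.get(machine, hex(machine))
```

Running it over winutils.exe, hadoop.dll and the JVM's dlls quickly shows whether an x86 binary has crept into an x64 setup.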

Answer by Aleksei Egorov

I had the same problem, but with the recent Hadoop v2.2.0. Here are my steps for solving that problem:


  1. I built winutils.exe from the sources. Project directory:

    hadoop-2.2.0-src\hadoop-common-project\hadoop-common\src\main\winutils

    My OS: Windows 7. Tool for building: MS Visual Studio Express 2013 for Windows Desktop (it's free and can be downloaded from http://www.microsoft.com/visualstudio/). Open Studio, File -> Open -> winutils.sln. Right-click on the solution on the right side -> Build. There were a couple of errors in my case (you might need to fix the project properties and specify the output folder). Voila! You get winutils.exe - put it into hadoop's bin.

  2. Next we need to build hadoop.dll. Some voodoo magic goes here: open

    hadoop-2.2.0-src\hadoop-common-project\hadoop-common\src\main\native\native.sln

    in MS VS; right-click on the solution -> Build. I got a bunch of errors. I manually created several missing header files (don't ask me why they are missing from the source tarball!):

    https://github.com/jerishsd/hadoop-experiments/tree/master/sources

    (and don't ask me what this project on git is for! I don't know - Google pointed it out when I searched for the header file names.) I copied

    hadoop-2.2.0-src\hadoop-common-project\hadoop-common\target\winutils\Debug\libwinutils.lib

    (result of step # 1) into

    hadoop-2.2.0-src\hadoop-common-project\hadoop-common\target\bin

    Finally, the build operation produces hadoop.dll! Put it into hadoop's bin as well and happily run the namenode!


Hope my steps will help somebody.


Answer by Prasad D

In addition to the other solutions, here is a pre-built copy of winutils.exe. Download it and add it to $HADOOP_HOME/bin. It works for me.


(Source: Click here)


Answer by futuredaemon

Please add hadoop.dll (version-sensitive) to the System32 directory under the Windows directory.


You can get hadoop.dll from winutils.


Answer by Vikash Pareek

You might need to copy the hadoop.dll and winutils.exe files from hadoop-common-bin to %HADOOP_HOME%\bin, and add %HADOOP_HOME%\bin to your %PATH% variable.


You can download hadoop-common from https://github.com/amihalik/hadoop-common-2.6.0-bin


Answer by Marco Seravalli

Instead of using the official branch, I would suggest the Windows-optimized one:


http://svn.apache.org/repos/asf/hadoop/common/branches/branch-trunk-win/


You need to compile it: build winutils.exe under Windows and place it in the hadoop/bin directory.


Answer by leifbennett

I ran into the same problem with Hadoop 2.4.1 on Windows 8.1; there were a few differences in the resulting solution, caused mostly by the newer OS.


I first installed Hadoop 2.4.1 binary, unpacking it into %HADOOP_HOME%.


The previous answers describe how to set up Java, protobuf, Cygwin, and Maven, and the needed environment variables. I had to change my Platform environment variable from HP's odd 'BCD' value.


I downloaded the source from an Apache mirror and unpacked it into a short directory (HADOOP_SRC=C:\hsrc). Maven ran fine from a standard Windows command prompt in that directory: mvn package -DskipTests


Instead of using the Windows 7 SDK (which I could not get to load) or the Windows 8.1 SDK (which doesn't have the command-line build tools), I used the free Microsoft Visual Studio Express 2013 for Windows Desktop. Hadoop's build needed the MSBuild location (C:\Program Files (x86)\MSBuild\12.0) in the PATH, and required that the various Hadoop native source projects be upgraded to the newer (MS VS 2013) format. The Maven build failures were nice enough to point out the absolute path of each project as it failed, making it easy to load the project into Visual Studio (which converts it automatically, after asking).


Once built, I copied the native executables and libraries into the Hadoop bin directory. They were built in %HADOOP_SRC%\hadoop-common-project\hadoop-common\target\bin, and needed to be copied into %HADOOP_HOME%\bin.

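The copy step above can be scripted. A hypothetical sketch (the *.exe/*.dll patterns are an assumption about what the native build emits):

```python
import glob
import os
import shutil

def copy_native_binaries(src_bin, dest_bin, patterns=("*.exe", "*.dll")):
    """Copy built native executables/libraries from the build output
    directory into Hadoop's bin directory; return the copied file names."""
    os.makedirs(dest_bin, exist_ok=True)
    copied = []
    for pattern in patterns:
        for path in glob.glob(os.path.join(src_bin, pattern)):
            shutil.copy2(path, dest_bin)          # preserves timestamps
            copied.append(os.path.basename(path))
    return sorted(copied)
```

Called with the %HADOOP_SRC%\...\target\bin and %HADOOP_HOME%\bin paths from the answer, it performs the same manual copy in one step.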

Answer by Kunal Kanojia

Adding hadoop.dll and hdfs.dll to the %HADOOP_HOME%\bin folder did the trick for me.


Answer by Derry

I just installed Hadoop 2.2.0 in my environment, Win7 x64.


Following BUILDING.txt got me there. Note that the dirs in hdfs-site.xml and mapred-site.xml start with /, like below.


E.g.:


  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/hadoop-2.2.0_1/dfs/name</value>
    <description></description>
    <final>true</final>
  </property>

Hope it helps!
