Is there a good online tutorial for Hadoop development on a Windows 7 machine?

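Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/7278423/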

Date: 2020-09-15 17:59:01  Source: igfitidea

Tags: windows, windows-7, hadoop

Asked by Steph

I've been following the awesome Yahoo! Hadoop tutorial, which worked great for getting a virtual machine environment set up (Module 3 of the tutorial). But now I'm getting stumped by the HDFS section (Module 2) and think it might be easier if I had a Windows specific tutorial. I tried following this one, but some of the steps weren't quite right. I've been trying to find a good tutorial that will work for me on my Windows 7 machine, but am a bit stuck. Is there a good place to go for this? Hadoop seems to be very geared toward Linux users, and unfortunately I have to use my work laptop, which is Windows 7. Can I make this work or does it really only work for Linux users?

Accepted answer by Allen

The Hadoop tutorial on the Yahoo Developer Network is outdated and problematic. Half of the steps didn't work for me at all (I was running their image in VMware Player on Windows 7), and the other half were vague. The Java code examples were poorly written and wouldn't compile. At any rate, they are written for the old Hadoop API.

I gave up on that tutorial and instead used the Cloudera Demo VM image. This comes pre-configured with Hadoop, Pig, Hive, HBase, etc. I was in business at once and had no problems compiling and running Hadoop jobs and Pig scripts.

The Cloudera Demo VM downloads on their main support page (https://ccp.cloudera.com/display/SUPPORT/Cloudera's+Hadoop+Demo+VM) are all 64-bit. If you are looking for a 32-bit version like I was, you can get one here: https://downloads.cloudera.com/cloudera-demo-0.3.7.vmwarevm.tar.bz2

This one has a slightly older version of the Cloudera distro (CDH3u0) running on Ubuntu 10.10 with Gnome desktop. I installed Eclipse for compiling my Hadoop jobs, but didn't bother trying to install the Hadoop plugin, which I've heard is problematic. The first time around, I made the mistake of accidentally updating the Cloudera distro to CDH3u3 via the system's Update Manager and this messed up my Hadoop configuration. I didn't know how to reconfigure it properly, so I just started over from the original image.
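
One way to make a mistaken update like this easier to undo is to archive the Hadoop configuration directory before running the Update Manager. The shell function below is a hypothetical sketch, not something shipped with the Cloudera image, and it assumes the active configuration lives in /etc/hadoop/conf, which you should check on your own VM.

```shell
#!/bin/sh
# Hypothetical helper: archive a Hadoop configuration directory before
# running system updates, so a broken upgrade can be rolled back by hand.
# ASSUMPTION: the default path /etc/hadoop/conf; your image may differ.
backup_hadoop_conf() {
    conf_dir="${1:-/etc/hadoop/conf}"   # directory to archive
    dest_dir="${2:-$HOME}"              # where to write the tarball

    if [ ! -d "$conf_dir" ]; then
        echo "config directory not found: $conf_dir" >&2
        return 1
    fi

    # Make dest_dir absolute, because tar below runs from inside conf_dir.
    dest_dir=$(cd "$dest_dir" && pwd) || return 1

    # Timestamped name so successive backups don't overwrite each other.
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest_dir/hadoop-conf-$stamp.tar.gz"

    # cd follows a symlinked conf dir, so the real files get archived.
    (cd "$conf_dir" && tar -czf "$archive" .) || return 1

    # Print the archive path so the caller can record it.
    echo "$archive"
}
```

For example, running backup_hadoop_conf before an update prints the path of the tarball it wrote; restoring is then a matter of extracting that archive back into the configuration directory, e.g. sudo tar -xzf <archive> -C /etc/hadoop/conf.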

To get Pig running, you need to first set the JAVA_HOME variable: export JAVA_HOME=/usr/lib/jvm/java-6-sun
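
Because a plain export only lasts for the current shell session and silently accepts a wrong path, it can help to wrap it in a small check that the JDK is really there. This is a sketch: set_java_home is a hypothetical helper name, and the default path is the one given above, which may differ on other images.

```shell
#!/bin/sh
# Sketch: point JAVA_HOME at a JDK before launching Pig.
# The default path comes from the answer above; pass another one if your
# image installs Java elsewhere. set_java_home is a hypothetical name.
set_java_home() {
    jdk_dir="${1:-/usr/lib/jvm/java-6-sun}"

    # Only accept the directory if it contains an executable java launcher.
    if [ ! -x "$jdk_dir/bin/java" ]; then
        echo "no executable $jdk_dir/bin/java; JAVA_HOME left unchanged" >&2
        return 1
    fi

    JAVA_HOME="$jdk_dir"
    export JAVA_HOME
}
```

For example, set_java_home && pig -x local starts Pig in local mode once JAVA_HOME is set. To make the setting permanent, the export line can be added to ~/.bashrc.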

Unfortunately, I wasted a ton of time with that old YDN tutorial before a Java developer friend familiar with Hadoop pointed me to the Cloudera distribution.

Answer by Kevin Hom

I was completely new to Hadoop, and honestly I found the Cloudera tutorials and information completely unhelpful. Give the IBM ones a shot: they're super helpful and very friendly for beginners, with step-by-step instructions for pretty much all of the core Hadoop applications and a few specific to IBM's distro.

Here's the download link:

https://www14.software.ibm.com/webapp/iwm/web/preLogin.do?source=swg-ibmibqsevmw&S_TACT=109HF38W&S_CMP=109HF

You have to make an account but it's free and doesn't take that long.

I can't post more than one link right now, but it's pretty easy to find the tutorials online, and they also exist within the VM.

There's also a forum where I've posted my questions when I get stuck, and somebody from IBM has always helped me out within an hour to a day. I can't post the link, but if you Google "IBM InfoSphere BigInsights Forum", it's the first hit.

Good Luck!

Answer by vonmixer

I've been banging my head against the Yahoo tutorial for a long time as well. The Eclipse plugin is no longer maintained and is pretty unreliable. Hopefully the Cloudera image will do the trick.

Answer by Lostsoul

I am trying to learn Hadoop right now as well. What I did was download VirtualBox (http://www.virtualbox.org/), load some Linux images on it, and start following tutorials.

You can even get a pre-made Hadoop setup image from Cloudera. I think this approach is far better than installing and setting up Hadoop on your primary machine, because if there's a problem your main machine won't be affected: you can simply revert to an old copy of your virtual Linux image, or scrap it and start again, without any impact.
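
If you use VirtualBox, its snapshot feature is one way to keep an "old copy" to revert to. The sketch below wraps the VBoxManage snapshot take and restore subcommands; the VM name hadoop-vm, the snapshot names, and the DRY_RUN switch are hypothetical choices for illustration, not anything the answer prescribes.

```shell
#!/bin/sh
# Sketch of the "revert to an old copy" workflow with VirtualBox snapshots.
# ASSUMPTION: the VM is registered as "hadoop-vm" (override with VM_NAME).
# Set DRY_RUN=1 to print the VBoxManage commands instead of running them.
VM_NAME="${VM_NAME:-hadoop-vm}"

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Record the VM's current state under the given snapshot name.
take_snapshot() {
    run VBoxManage snapshot "$VM_NAME" take "$1"
}

# Roll back to a named snapshot; the VM must not be running.
restore_snapshot() {
    run VBoxManage snapshot "$VM_NAME" restore "$1"
}
```

For example, take_snapshot clean-install right after importing the image, and later restore_snapshot clean-install (with the VM shut down) to return to that state instead of re-downloading the image.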

Good luck!

Answer by Niels Basjes

Developing Hadoop on Windows is doable but hard to get right. It requires installing Cygwin, and getting all the environment variables right can be tricky. To get started developing on Windows, I recommend installing VMware Player and running the preconfigured virtual machine from Cloudera. This simply means you will be doing the Hadoop development in Linux, without rebooting or reinstalling your Windows system and without the installation troubles associated with Cygwin.

https://ccp.cloudera.com/display/SUPPORT/Cloudera%27s+Hadoop+Demo+VM

Answer by Yuan Zhang

I have just finished "Hadoop Fundamentals I - Version 2" at http://bigdatauniversity.com. It comes with IBM BigInsights VMware images and works very well.

The images include one for local mode and one for cluster mode. They let me simulate a multi-node cluster on my Windows 8 workstation with 8 GB of RAM.

Hope this information is helpful :-)
