Hadoop on Windows Server

Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you reuse it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/467911/

Date: 2020-09-15 11:52:29  Source: igfitidea

Tags: c#, windows, hadoop, mapreduce, cluster-computing

Question by Luca Martinetti

I'm thinking about using Hadoop to process large text files on my existing Windows 2003 servers (about 10 quad-core machines with 16 GB of RAM).

The questions are:

  1. Is there a good tutorial on configuring a Hadoop cluster on Windows?

  2. What are the requirements? Java + Cygwin + sshd? Anything else?

  3. Does HDFS play nicely on Windows?

  4. I'd like to use Hadoop in streaming mode. Any advice, tools, or tricks for developing my own mappers/reducers in C#?

  5. What do you use to submit and monitor jobs?

Thanks

Answer by bradheintz

From the Hadoop documentation:

Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.

Which I think translates to: "You're on your own."

That said, there might be hope if you're not queasy about installing Cygwin and a Java shim, according to the Getting Started page of the Hadoop wiki:

It is also possible to run the Hadoop daemons as Windows Services using the Java Service Wrapper (download this separately). This still requires Cygwin to be installed as Hadoop requires its df command.
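
Since Hadoop shells out to Cygwin's df, a quick preflight check on each Windows node can catch a missing Cygwin install or a mis-set PATH before you try to start the daemons. This is a minimal Python sketch, not part of Hadoop; only df comes from the wiki, while java and ssh in the tool list are assumptions taken from the question's "java + cygwin + sshd" list.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# df is the Cygwin command the Hadoop wiki says is required.
# java and ssh are ASSUMED requirements, taken from the question.
REQUIRED_TOOLS = ("java", "df", "ssh")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    lookup: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names of tools that the lookup cannot find on PATH."""
    return [tool for tool in tools if lookup(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```

The `lookup` parameter defaults to `shutil.which`, so the check follows the node's real PATH (on Windows, `which` also honors PATHEXT, so `df.exe` from Cygwin's `bin` directory is found as `df`).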

I guess the bottom line is that it doesn't sound impossible, but you'd be swimming upstream all the way. I've done a few Hadoop installs (on Linux for production, Mac for dev) now, and I wouldn't bother with Windows when it's so straightforward on other platforms.

Answer by Ilya Haykinson

While not the answer you may want to hear, I would highly recommend repurposing the machines as, say, Linux servers, and running Hadoop there. You will benefit from tutorials and experience and testing performed on that platform, and spend your time solving business problems rather than operational issues.

However, you can still write your jobs in C#. Hadoop Streaming runs any executable that reads input records from stdin and writes key/value pairs to stdout as the mapper or reducer, so you can write your jobs in any language. With the Mono framework, you should be able to take pretty much any .NET code written on the Windows platform and run the same binary on Linux.
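
To make the streaming contract concrete: by default a streaming mapper emits one `key<TAB>value` line per record, and the reducer receives those lines sorted by key. The sketch below is a word count in Python; a C# console app running under Mono would implement the same stdin/stdout contract. The script name `wordcount.py` and the `map`/`reduce` command-line argument are assumptions for illustration, not Hadoop conventions.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit "word<TAB>1" for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each word.

    Relies on Hadoop having sorted the mapper output by key, so all
    lines for one word arrive consecutively.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical invocation: "wordcount.py map" or "wordcount.py reduce".
    stage = {"map": mapper, "reduce": reducer}[sys.argv[1]]
    for record in stage(sys.stdin):
        print(record)
```

You can exercise the pipeline locally without a cluster: `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`. The `sort` step stands in for Hadoop's shuffle, which is why the reducer can use `groupby` without holding every key in memory.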

You can also access HDFS from Windows fairly easily -- while I don't recommend running the Hadoop services on Windows, you can certainly run the DFS client from the Windows platform to copy files in and out of the distributed file system.
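
A Windows-side script that moves data in and out of HDFS can simply drive the DFS client. The minimal Python sketch below builds `hadoop fs -put` and `hadoop fs -get` command lines. It assumes the `hadoop` launcher is on PATH and runnable from wherever the script executes (for example, inside a Cygwin shell); the function names are hypothetical helpers, not part of Hadoop.

```python
import subprocess
from typing import List


def hdfs_put(local_path: str, hdfs_path: str) -> List[str]:
    """Command that copies a local file into HDFS."""
    return ["hadoop", "fs", "-put", local_path, hdfs_path]


def hdfs_get(hdfs_path: str, local_path: str) -> List[str]:
    """Command that copies a file out of HDFS onto the local disk."""
    return ["hadoop", "fs", "-get", hdfs_path, local_path]


def run(cmd: List[str]) -> None:
    """Execute a command; raises CalledProcessError on a non-zero exit.

    ASSUMPTION: "hadoop" resolves as an executable on PATH (e.g. in Cygwin).
    """
    subprocess.run(cmd, check=True)
```

For example, `run(hdfs_put("big.txt", "/user/luca/input/big.txt"))` would upload a file; both paths are placeholders.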

For submitting and monitoring jobs, I think that you're mainly on your own... I don't think that there are any good general-purpose systems developed for Hadoop job management yet.
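
Without a dedicated tool, a thin script around the Hadoop command line covers the basics: submit a streaming job with `hadoop jar` and check on it with `hadoop job -list` and `hadoop job -status`. The minimal Python sketch below assumes Hadoop 1.x-era commands (newer releases move the job commands under `mapred job`). The streaming jar path is a hypothetical placeholder, because its location and file name vary between releases.

```python
import subprocess
from typing import List

# HYPOTHETICAL path: the streaming jar's location and name vary by Hadoop release.
STREAMING_JAR = "/opt/hadoop/contrib/streaming/hadoop-streaming.jar"


def streaming_job(input_dir: str, output_dir: str,
                  mapper: str, reducer: str) -> List[str]:
    """Command that submits a Hadoop Streaming job."""
    return [
        "hadoop", "jar", STREAMING_JAR,
        "-input", input_dir,
        "-output", output_dir,
        "-mapper", mapper,
        "-reducer", reducer,
    ]


def list_jobs() -> List[str]:
    """Command that lists currently running jobs."""
    return ["hadoop", "job", "-list"]


def job_status(job_id: str) -> List[str]:
    """Command that reports completion and counters for one job."""
    return ["hadoop", "job", "-status", job_id]


def run(cmd: List[str]) -> str:
    """Run a command and return its stdout; raises on a non-zero exit."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

The mapper and reducer strings are whatever command each node should execute (for a Mono build, something like a hypothetical `mono WordCount.exe map`), so those executables must be available on every node or shipped with the job. Polling `run(job_status(job_id))` from a scheduled task is a crude but workable monitor.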

Answer by Jonathan Lin

If you're looking for map/reduce, you could try MySpace's new map/reduce framework, Qizmt, which runs on Windows: http://qizmt.myspace.com/
