MapReduce implementation in Scala
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, link to the original, and attribute it to the original authors (not me): StackOverflow.
Original question: http://stackoverflow.com/questions/962075/
Asked by Roman Kagan
I'd like to find a good, robust MapReduce framework that can be used from Scala.
Answered by Jorge Ortiz
To add to the answer on Hadoop: there are at least two Scala wrappers that make working with Hadoop more palatable.
Scala Map Reduce (SMR): http://scala-blogs.org/2008/09/scalable-language-and-scalable.html
SHadoop: http://jonhnny-weslley.blogspot.com/2008/05/shadoop.html
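What these wrappers let you write is essentially the classic mapper/shuffle/reducer shape in plain Scala. A minimal word-count sketch of that shape, using ordinary Scala collections in place of Hadoop's distributed machinery (the names here are illustrative, not the SMR or SHadoop API):

```scala
// Word count in the MapReduce shape: a mapper emits (key, value) pairs,
// the framework groups them by key, and a reducer folds each group.
object WordCount {
  // Mapper: one input line -> a (word, 1) pair per word.
  def mapper(line: String): Seq[(String, Int)] =
    line.split("\\s+").filter(_.nonEmpty).map(w => (w.toLowerCase, 1)).toSeq

  // Reducer: fold all the counts emitted for one word.
  def reducer(word: String, counts: Seq[Int]): (String, Int) =
    (word, counts.sum)

  // The "framework": shuffle (groupBy key) between the map and reduce phases.
  def run(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(mapper)
      .groupBy { case (word, _) => word }
      .map { case (word, pairs) => reducer(word, pairs.map(_._2)) }
}
```

On a real cluster the wrapper's job is exactly to let you supply `mapper` and `reducer` like this while Hadoop handles splitting, shuffling, and fault tolerance.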
UPD 5 Oct 2011
There is also the Scoobi framework, which has awesome expressiveness.
Answered by bayer
http://hadoop.apache.org/ is language agnostic.
Answered by MattM
Personally, I've become a big fan of Spark.
You have the ability to do in-memory cluster computing, significantly reducing the overhead you would experience from disk-intensive mapreduce operations.
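The win shows up with multi-query or iterative workloads: a classic MapReduce job re-reads its input from disk every time, while Spark lets you cache a dataset in cluster memory once and run many operations against it. A rough single-machine sketch of that access pattern, using plain Scala collections (in real Spark you would build an RDD and call `.cache()` on it; the log lines and helper names here are made up for illustration):

```scala
object InMemoryQueries {
  // Stand-in for an expensive load + parse from HDFS.
  def load(): Seq[String] =
    Seq("ERROR disk full", "INFO ok", "ERROR timeout", "WARN slow")

  def run(): (Int, Int) = {
    // Loaded and filtered once, then kept in memory. The Spark analogue:
    //   val errors = sc.textFile(path).filter(_.startsWith("ERROR")).cache()
    val errors = load().filter(_.startsWith("ERROR"))

    // Subsequent queries hit the in-memory set; nothing is re-read from disk,
    // which is where disk-intensive MapReduce pipelines lose time.
    val totalErrors = errors.size
    val timeouts    = errors.count(_.contains("timeout"))
    (totalErrors, timeouts)
  }
}
```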
Answered by Xela
For a Scala API on top of Hadoop, check out Scoobi; it is still in heavy development but shows a lot of promise. There is also some effort to implement distributed collections on top of Hadoop in the Scala incubator, but that effort is not usable yet.
There is also a new Scala wrapper for Cascading from Twitter, called Scalding. After looking very briefly over the documentation for Scalding, it seems that while it makes the integration with Cascading smoother, it still does not solve what I see as the main problem with Cascading: type safety. Every operation in Cascading operates on Cascading's tuples (basically a list of field values, with or without a separate schema), which means that type errors, e.g. joining a key as a String with a key as a Long, lead to run-time failures.
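To make the type-safety point concrete: with statically typed key-value pairs, a join whose key types disagree is rejected by the compiler, whereas an untyped tuple model only fails once the job runs. A small sketch with a hypothetical `join` helper (not the Cascading or Scalding API); the invariant `Key` wrapper forces both sides to agree on exactly one key type:

```scala
// Invariant wrapper: Key[String] and Key[Long] never unify to a common type.
final case class Key[K](value: K)

object TypedJoin {
  // Inner join on a shared, statically typed key K.
  def join[K, A, B](left: Seq[(Key[K], A)],
                    right: Seq[(Key[K], B)]): Seq[(K, (A, B))] =
    for {
      (k, a)  <- left
      (k2, b) <- right
      if k == k2
    } yield (k.value, (a, b))
}

// Keys agree (both String): compiles and runs.
//   TypedJoin.join(Seq((Key("u1"), "Ann")), Seq((Key("u1"), 3L)))
//
// Keys disagree (String vs Long): an untyped tuple model surfaces this as an
// empty join or a run-time failure; here it does not compile, because K
// cannot be both String and Long:
//   TypedJoin.join(Seq((Key("u1"), "Ann")), Seq((Key(42L), 3L)))  // type error
```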
Answered by AWhitford
Answered by bsdfish
A while back, I ran into exactly this problem and ended up writing a little infrastructure to make it easy to use Hadoop from Scala. I used it on my own for a while, but I finally got around to putting it on the web. It's named (very originally) ScalaHadoop.
Answered by seanc
To further jshen's point:
Hadoop Streaming simply pipes data through Unix standard streams: your code (in any language) only has to read lines from stdin and write tab-delimited records to stdout. Implement a mapper and, if needed, a reducer (and, if relevant, configure that as the combiner).
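A streaming word-count pair can be sketched in a few lines of Scala. The tab-delimited `word<TAB>count` record format and the fact that the reducer sees its input sorted by key are Hadoop Streaming's conventions; the object and function names are illustrative. The logic is kept in pure functions, with `main` just wiring stdin to stdout:

```scala
import scala.io.Source

object StreamingWordCount {
  // Mapper: each input line becomes one "word<TAB>1" record per word.
  def mapLine(line: String): Seq[String] =
    line.split("\\s+").filter(_.nonEmpty).map(w => s"$w\t1").toSeq

  // Reducer: Hadoop sorts records by key between the phases, so equal
  // words arrive adjacent; sum each run of identical keys.
  def reduce(records: Seq[String]): Seq[String] =
    records.map(_.split("\t")).collect { case Array(w, n) => (w, n.toInt) }
      .foldLeft(Vector.empty[(String, Int)]) {
        case (acc :+ ((w, c)), (word, n)) if w == word => acc :+ (w, c + n)
        case (acc, (word, n))                          => acc :+ (word, n)
      }
      .map { case (w, c) => s"$w\t$c" }

  // Run the same binary as mapper or reducer, selected by an argument.
  def main(args: Array[String]): Unit = {
    val lines = Source.stdin.getLines()
    val out =
      if (args.headOption.contains("mapper")) lines.flatMap(mapLine)
      else reduce(lines.toSeq).iterator
    out.foreach(println)
  }
}
```

You would then point `-mapper` and `-reducer` in your `hadoop jar hadoop-streaming.jar ...` invocation at the two modes of this program.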

