Is there a .NET equivalent to Apache Hadoop?
Original URL: http://stackoverflow.com/questions/339344/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by danswain
So, I've been looking at Hadoop with keen interest, and to be honest I'm fascinated; things don't get much cooler.
My only minor issue is that I'm a C# developer and it's in Java.
It's not that I don't understand Java; rather, I'm looking for a Hadoop.net, an NHadoop, or some .NET project that embraces the Google MapReduce approach. Does anyone know of one?
Accepted answer by chews
Have you looked at using Hadoop Streaming?
I use it with Python all the time :-).
I'm starting to see that the heterogeneous approach is often the best and it looks like other folks are doing the same.
If you look at projects like Protocol Buffers or Facebook's Thrift, you see that sometimes it's just best to use an app written in another language and build the glue in the language of your preference.
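As a rough illustration of the streaming approach: Hadoop Streaming runs any executable as the mapper or reducer, feeding it input lines on stdin and reading tab-separated key/value lines from its stdout, and the framework sorts the mapper output by key before the reducer sees it. The following is a minimal word-count sketch in Python, assuming that default tab-separated text format; the function names, the "map"/"reduce" command-line argument, and the local simulate driver are hypothetical conveniences of this sketch, not part of Hadoop.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines):
    """Sum the counts for each word.

    Assumes the input is already sorted by key, which Hadoop Streaming
    guarantees between the map and reduce phases. Blank lines are skipped.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def simulate(lines):
    """Hypothetical local stand-in for the cluster: map, sort (shuffle), reduce."""
    return list(reducer(sorted(mapper(lines))))


# Only act as a streaming step when a phase argument is given,
# e.g. "python wc.py map" or "python wc.py reduce" (hypothetical file name).
if __name__ == "__main__" and len(sys.argv) > 1:
    step = mapper if sys.argv[1] == "map" else reducer
    for record in step(sys.stdin):
        print(record)
```

To try it without a cluster, simulate runs the map, sort, and reduce steps in one process. On a cluster, the same script would be handed to the streaming jar's -mapper and -reducer options; the jar's location depends on the Hadoop installation. A C# developer could write the same two steps as console applications, which is the "glue in the language of your preference" idea.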
Answer by chews
There's a pretty cute MapReduce implementation for .NET at: http://mapsharp.codeplex.com/
Answer by foxxtrot
Answer by Turbo
I would say that DryadLINQ is the closest thing we .NET folks have to Hadoop. But it depends on what you want to use Hadoop for. If you are looking for an optimized, self-maintaining distributed file system (DFS), then DryadLINQ isn't what you are looking for. It has an analog to the DFS, but you have to build the partitions and distribute each partition manually.
That being said, if it's the distributed execution aspect of Hadoop that you are looking for, then DryadLINQ is truly wonderful (and no, I'm not affiliated with MS). As long as you have a Microsoft HPC cluster set up, getting going with DryadLINQ is really easy.
The code you write is really just straight LINQ code, except that instead of executing the LINQ on IEnumerable<T>, you execute it on PartitionedTable<T> (the self-built distributed data structure).
What has really been cool about DryadLINQ is the fast turnaround time (try, test, adjust, repeat) when developing algorithms. You just write LINQ code to do your calculations, and DryadLINQ takes care of the whole distributed execution part. It's the most natural analog I've come across: writing code for distributed processing feels just like writing code for a single process.
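DryadLINQ itself is a .NET library and is not reproduced here. As a language-neutral sketch of the idea described above (one query that runs unchanged over a single in-memory sequence or over many partitions), the following Python applies the same aggregation to each partition concurrently and merges the partial results. All names are hypothetical, and threads merely stand in for cluster nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(records):
    """The 'query': the same function works on a whole dataset or one partition."""
    return Counter(word for line in records for word in line.split())


def partition(records, n):
    """Hypothetical stand-in for building partitions by hand (round-robin split)."""
    return [records[i::n] for i in range(n)]


def distributed_count(partitions):
    """Run the query on every partition concurrently, then merge partial counts."""
    total = Counter()
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(count_words, partitions):
            total.update(partial)  # Counter.update adds counts together
    return total
```

This works because word counting is associative: merging per-partition counts gives the same totals as counting everything at once, which is what lets a framework split the work without changing the query.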
Answer by Dileep stanley
It may be better to use Apache Hadoop with streaming, because Apache Hadoop is actively developed and maintained by industry giants such as Yahoo and Facebook, so it can do what you expect it to do.
If you need a solution in .NET, please check out MySpace's implementation: MySpace Qizmt, MySpace's open-source MapReduce framework.
Answer by John
Dryad/LINQ is being productized and will be released soon: http://blogs.technet.com/b/windowshpc/archive/2011/07/07/announcing-linq-to-hpc-beta-2.aspx
Use it in conjunction with Microsoft HPC for a powerful, cluster-based solution for querying unstructured data.
Answer by NicoJuicy
I answered your question in my question here
To quote it here:
Microsoft dropped its alternative (Dryad) in favor of Hadoop. Next year they will release MS SQL Server 2012 with Hadoop integration. Azure and Windows Server support is being developed even as we speak.
It will be available in the first half of 2012.
Hadoop is the #1 big data platform and is going to be supported by both open-source and proprietary software (Java, .NET, Python, ...); even Oracle is adopting it.
If you're developing something on the .NET platform, you should wait.
More information about what is possible will be available here
Answer by benjguin
Microsoft Research has Project Daytona: http://research.microsoft.com/en-us/projects/daytona/
You can download it. There's a WordCount sample in C#.
Answer by Ovais
You can look into something like RavenDB; it provides very decent MapReduce support for fairly large amounts of data. Because it is built in .NET, a proper LINQ client API is available.
To get you started, you can read my blog entry.