Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/1602177/
Python compute cluster
Asked by Joe
Would it be possible to make a Python cluster by writing a telnet server and then telnetting the commands and output back and forth? Does anyone have a better idea for a Python compute cluster? P.S. Preferably for Python 3.x, if anyone knows how.
Answered by Andrey Vlasovskikh
The Python wiki hosts a very comprehensive list of Python cluster computing libraries and tools. You might be especially interested in Parallel Python.
Edit: There is a new library that is, IMHO, especially good at clustering: execnet. It is small and simple, and it appears to have fewer bugs than, say, the standard multiprocessing module.
Answered by Alex Martelli
You can see most of the third-party packages available for Python 3 listed here; the one relevant to cluster computation is mpi4py. Most other distributed computing tools, such as pyro, are still Python 2 only, but MPI is a leading standard for cluster distributed computation and well worth looking into (I have no direct experience using mpi4py with Python 3 yet, but by hearsay I believe it's a good implementation).
The main alternative is Python's own built-in multiprocessing, which also scales up pretty well if you have no interest in interfacing with existing nodes that respect the MPI standards but may not be coded in Python.
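As a rough illustration of the multiprocessing option, here is a minimal sketch that spreads a function over a pool of worker processes. The names `square` and `parallel_squares` are made up for this example, and the sketch only uses the cores of a single machine; reaching other hosts would need something more, such as the `multiprocessing.managers` module or one of the libraries mentioned in the other answers.

```python
import multiprocessing


def square(n):
    """Stand-in for real CPU-bound work.

    It is defined at module level so worker processes can locate it by name.
    """
    return n * n


def parallel_squares(values, workers=2):
    """Apply square() to every value using a pool of worker processes."""
    with multiprocessing.Pool(processes=workers) as pool:
        # map() splits the input across the workers and returns the
        # results in the same order as the input.
        return pool.map(square, values)


if __name__ == "__main__":
    # The guard keeps child processes from re-running this block
    # when they import the main module.
    print(parallel_squares(range(10)))
```

The `if __name__ == "__main__":` guard matters on platforms where worker processes are started by re-importing the main module rather than by forking.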
There is no real added value in rolling your own (as Atwood says, don't reinvent the wheel, unless your purpose is just to better understand wheels!-) -- use one of the solid, widespread solutions, already tested, debugged and optimized on your behalf!-)
Answered by Anurag Uniyal
Look into these:
http://www.parallelpython.com/
I have used both, and both are excellent for distributed computing. For a more detailed list of options, see
http://wiki.python.org/moin/ParallelProcessing
If you want to automatically execute something on a remote machine, a better alternative to telnet is ssh, as in http://pydsh.sourceforge.net/
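To make the ssh suggestion concrete, here is a minimal sketch that runs one command on several machines by calling the standard `ssh` client through `subprocess`. It is not how pydsh works internally; the function names are invented for this example, and it assumes key-based login is already set up on every host.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_ssh_command(host, remote_command):
    """Return the argv list that runs remote_command on host via ssh."""
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # which is what an unattended script wants.
    return ["ssh", "-o", "BatchMode=yes", host, remote_command]


def run_on_host(host, remote_command, timeout=30):
    """Run the command on one host; return (host, exit_code, stdout, stderr)."""
    result = subprocess.run(
        build_ssh_command(host, remote_command),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return host, result.returncode, result.stdout, result.stderr


def run_on_hosts(hosts, remote_command):
    """Run the same command on every host concurrently, one thread per host."""
    hosts = list(hosts)
    if not hosts:
        return []
    with ThreadPoolExecutor(max_workers=len(hosts)) as executor:
        return list(executor.map(lambda h: run_on_host(h, remote_command), hosts))
```

A call such as `run_on_hosts(["node1", "node2"], "uname -a")` would return one tuple per host, where `node1` and `node2` are placeholder host names. Threads are enough here because each one only waits on an external `ssh` process.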
Answered by nstehr
What kind of work do you want to do? You might want to check out Hadoop. The backend heavy lifting is done in Java, but it has a Python interface, so you can write Python scripts to create and send the input, as well as to process the results.
Answered by Yan Hu
If you need to write administrative scripts, take a look at the ClusterShell Python library too, and/or its parallel shell, clush. It is also useful when dealing with node sets (man nodeset).
Answered by user2913120
I think IPython.parallel is the way to go. I've been using it extensively for the last year and a half. It allows you to work interactively with as many worker nodes as you want. If you are on AWS, StarCluster is a great way to get IPython.parallel up and running quickly and easily with as many EC2 nodes as you can afford. (It can also automatically install Hadoop, and a variety of other useful tools, if needed.) There are some tricks to using it. (For example, you don't want to send large amounts of data through the IPython.parallel interface itself. Better to distribute a script that will pull down chunks of data on each engine individually.) But overall, I've found it to be a remarkably easy way to do distributed processing (WAY better than Hadoop!)
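The data-handling tip above can be sketched in plain Python, without any IPython.parallel calls: instead of shipping the data, send each engine only its index and let it work out which slice to fetch. The names `chunk_bounds`, `process_chunk` and `load_item` are hypothetical, and how an engine learns its own `engine_id` depends on the framework in use.

```python
def chunk_bounds(total_items, n_engines, engine_id):
    """Return the half-open range [start, stop) that engine_id should process."""
    if not 0 <= engine_id < n_engines:
        raise ValueError("engine_id must be in range(n_engines)")
    base, extra = divmod(total_items, n_engines)
    # The first `extra` engines take one additional item each, so the
    # chunks differ in size by at most one.
    start = engine_id * base + min(engine_id, extra)
    stop = start + base + (1 if engine_id < extra else 0)
    return start, stop


def process_chunk(load_item, total_items, n_engines, engine_id):
    """Fetch and sum only this engine's items.

    load_item is a caller-supplied function that fetches one item by index,
    for example from shared storage.
    """
    start, stop = chunk_bounds(total_items, n_engines, engine_id)
    return sum(load_item(i) for i in range(start, stop))
```

Only three small integers travel to each engine; the bulk data goes straight from storage to the engine that needs it.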
Answered by Lennart Regebro
"Would it be possible to make a python cluster"
Yes.
I love yes/no questions. Anything else you want to know?
(Note that Python 3 has few third-party libraries so far, so you may want to stay with Python 2 for the moment.)