Windows: How to make R use all processors?

Notice: This page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/1395309/

Time: 2020-09-09 06:41:56  Source: igfitidea

How to make R use all processors?

Tags: linux, r, windows, parallel-processing, packages

Asked by David Smith

I have a quad-core laptop running Windows XP, but according to Task Manager, R only ever seems to use one processor at a time. How can I make R use all four processors and speed up my R programs?

Answered by hangmanwa7id

I use a basic approach that parallelizes the "for" loops in my programs. This method is simple once you understand what needs to be done. It only works for local computing, but that seems to be what you're after.

You'll need these libraries installed:

library("parallel")
library("foreach")
library("doParallel")

First you need to create your computing cluster. I usually do other stuff while running parallel programs, so I like to leave one core free. The "detectCores" function will return the number of cores in your computer.

# Start one worker per core, leaving one core free
cl <- makeCluster(detectCores() - 1)
# Register the cluster as the backend for %dopar%
registerDoParallel(cl)

Next, call your for loop with the "foreach" command, along with the %dopar% operator. I always use a "try" wrapper to make sure that any iterations where the operations fail are discarded, and don't disrupt the otherwise good data. You will need to specify the ".combine" parameter, and pass any necessary packages into the loop. Note that "i" is defined with an equals sign, not an "in" operator!

data <- foreach(i = seq_along(filenames),
                .packages = c("ncdf", "chron", "stats"),
                .combine = rbind) %dopar% {
  try({
    # your operations; line 1...
    # your operations; line 2...
    # your output
  })
}

Once you're done, clean up with:

stopCluster(cl)
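
The steps above can be put together into one small script. This is only a sketch under assumptions: the workload (squaring integers, with one deliberately failing iteration) and the names `n_workers`, `inputs` and `results` are made up for illustration. It also swaps the "try" wrapper for foreach's own `.errorhandling = "remove"` argument, because `try()` returns an error object that would still be handed to `rbind`, whereas `"remove"` drops the failed iteration before combining.

```r
library(parallel)
library(foreach)
library(doParallel)

# Leave one core free, but never request fewer than one worker
# (detectCores() can return NA on some platforms).
n_workers <- max(1L, detectCores() - 1L, na.rm = TRUE)
cl <- makeCluster(n_workers)
registerDoParallel(cl)

# Hypothetical workload: iteration 3 fails on purpose.
inputs <- 1:6
results <- foreach(i = inputs, .combine = rbind,
                   .errorhandling = "remove") %dopar% {
  if (i == 3) stop("simulated failure")
  data.frame(id = i, square = i^2)
}

# Always release the workers when done
stopCluster(cl)

print(results)
```

The failing iteration simply does not appear in `results`; the remaining rows keep their original order because foreach combines results in order by default.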

Answered by Dirk Eddelbuettel

The CRAN Task View on High-Performance Computing with R lists several options. XP is a restriction, but you can still get something like snow to work using sockets within minutes.

Answered by csgillespie

As of version 2.15, R comes with native support for multi-core computations. Just load the parallel package:

library("parallel")

and check out the associated vignette

vignette("parallel")
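
As a rough illustration of what the parallel package offers on Windows, here is a minimal socket-cluster sketch using `parLapply`. The worker count of 2, the variable `offset` and the toy function are assumptions chosen for the example, not anything prescribed by the package.

```r
library(parallel)

# Two workers is an arbitrary choice for this sketch
cl <- makeCluster(2)

# Socket workers start with empty workspaces, so any object the
# function needs has to be exported to them explicitly.
offset <- 10
clusterExport(cl, "offset", envir = environment())

# Apply the function to each element, spreading the work over the workers
squares <- parLapply(cl, 1:8, function(x) x^2 + offset)

stopCluster(cl)

print(unlist(squares))
```

The explicit `clusterExport` step is the main difference from a plain `lapply` call: each worker is a separate R process and sees only what is sent to it.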

Answered by JD Long

I hear tell that REvolution R supports better multi-threading than the typical CRAN version of R, and REvolution also supports 64-bit R on Windows. I have been considering buying a copy, but I found their pricing opaque. There's no price list on their web site. Very odd.

Answered by Peter M

I believe the multicore package works on XP. It gives some basic multi-process capability, especially through offering a drop-in replacement for lapply() and a simple way to evaluate an expression in a new thread (mcparallel()).
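
A minimal sketch of the mcparallel() pattern follows. It takes `mcparallel` and `mccollect` from the parallel package, which absorbed multicore's functions, and it assumes a Unix-like system because these calls rely on forking; the guard simply skips the demo elsewhere. The expression being evaluated is an arbitrary placeholder.

```r
library(parallel)

if (.Platform$OS.type == "unix") {
  # Start evaluating the expression in a forked child; returns immediately
  job <- mcparallel(sum(sqrt(1:1e6)))

  # ... the parent session could do other work here ...

  # Wait for the child; mccollect() returns a list with one element per job
  result <- mccollect(job)[[1]]
  print(result)
}
```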

Answered by ephpostfacto

On Windows I believe the best way to do this would probably be with foreach and snow as David Smith said.

However, Unix/Linux based systems can compute using multiple processes with the 'multicore' package. It provides a high-level function, 'mclapply', that performs a list comprehension across multiple cores. An advantage of the 'multicore' package is that each process gets a private copy of the Global Environment that it may modify. Initially, this copy is just a pointer to the Global Environment, making the sharing of variables extremely quick if the Global Environment is treated as read-only.
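
To make the read-only sharing concrete, here is a small sketch using `mclapply` from the parallel package (where multicore's functions now live). The vector `big`, the four-way split and the core count of 2 are illustrative assumptions. On Windows, where forking is unavailable, `mclapply` only accepts `mc.cores = 1`, so the sketch falls back to that.

```r
library(parallel)

# Forking is unavailable on Windows; mclapply() then requires mc.cores = 1
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# A large object living in the parent's environment
big <- rnorm(1e6)

# Each forked child reads `big` directly; there is no export step
chunk_means <- mclapply(1:4, function(k) {
  mean(big[seq(k, length(big), by = 4)])
}, mc.cores = n_cores)

print(unlist(chunk_means))
```

Compare this with a socket cluster, where `big` would have to be copied to every worker before the computation could start.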

Rmpi requires that the data be explicitly transferred between R processes instead of working with the 'multicore' closure approach.

-- Dan

Answered by Tom Wenseleers

If you do a lot of matrix operations and you are using Windows, you can install revolutionanalytics.com/revolution-r-open for free. It comes with the Intel MKL libraries, which allow multithreaded matrix operations. On Windows, if you take the libiomp5md.dll, Rblas.dll and Rlapack.dll files from that install and overwrite the ones in whatever R version you like to use, you'll have multithreaded matrix operations (typically you get a 10-20x speedup for matrix operations). Or you can use the Atlas Rblas.dll from prs.ism.ac.jp/~nakama/SurviveGotoBLAS2/binary/windows/x64, which also works on 64-bit R and is almost as fast as the MKL one. I found this the single easiest thing to do to drastically increase R's performance on Windows systems. I'm not sure why they don't come as standard on R Windows installs.
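
One way to see whether a BLAS swap made a difference is to check which library the session is linked against and time a dense matrix product before and after. This is only a sketch: the matrix size of 1000 is arbitrary, and `La_library()` and the "BLAS" entry of `extSoftVersion()` assume R 3.4.0 or later.

```r
# Which BLAS / LAPACK is this session using? (assumes R >= 3.4.0)
print(extSoftVersion()["BLAS"])
print(La_library())

# Time a dense matrix product; the size is an arbitrary choice
set.seed(1)
n <- 1000
m <- matrix(rnorm(n * n), n, n)
timing <- system.time(p <- m %*% m)
print(timing["elapsed"])
```

Running the same script under the reference BLAS and under a multithreaded one gives a direct, machine-specific comparison.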

On Windows, multithreading unfortunately is not well supported in R (unless you use OpenMP via Rcpp), and the available socket-based parallelization on Windows systems, e.g. via package parallel, is very inefficient. On POSIX systems things are better, as you can use forking there (package multicore is, I believe, the most efficient one there). You could also try package Rdsm for multithreading within a shared memory model. I've got a version on my GitHub with the Unix-only flag removed that should also work on Windows (earlier, Windows wasn't supported because the dependency bigmemory supposedly didn't work on Windows, but now it seems it does):

library(devtools)
devtools::install_github('tomwenseleers/Rdsm')
library(Rdsm)