CUDA Driver API vs. CUDA runtime
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse it, you must likewise follow CC BY-SA, cite the original address, and attribute it to the original authors (not me): StackOverflow.
Original question: http://stackoverflow.com/questions/242894/
Asked by Morten Christiansen
When writing CUDA applications, you can either work at the driver level or at the runtime level as illustrated on this image (The libraries are CUFFT and CUBLAS for advanced math):
(source: tomshw.it)
I assume the tradeoff between the two is increased performance with the low-level API, but at the cost of increased code complexity. What are the concrete differences, and are there any significant things you cannot do with the high-level API?
I am using CUDA.net for interop with C#, and it is built as a copy of the driver API. This encourages writing a lot of rather complex code in C#, while the C++ equivalent would be simpler using the runtime API. Is there anything to be gained by doing it this way? The one benefit I can see is that it is easier to integrate intelligent error handling with the rest of the C# code.
Accepted answer by mch
The CUDA runtime makes it possible to compile and link your CUDA kernels into executables. This means that you don't have to distribute cubin files with your application, or deal with loading them through the driver API. As you have noted, it is generally easier to use.
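As a rough illustration (a minimal sketch, not taken from the question itself), this is what a kernel launch looks like with the runtime API: nvcc compiles the kernel and the <<<>>> launch syntax directly into the executable, with no cubin loading or explicit initialization.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Kernel compiled and embedded into the executable by nvcc.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1024;
    size_t bytes = n * sizeof(float);
    float *dA, *dB, *dC;

    // No explicit initialization: the runtime creates a context implicitly.
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dC, bytes);

    // Execution configuration syntax; no module loading required.
    vecAdd<<<n / 256, 256>>>(dA, dB, dC, n);
    cudaThreadSynchronize();  // cudaDeviceSynchronize() in later CUDA versions

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```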
In contrast, the driver API is harder to program but provides more control over how CUDA is used. The programmer has to deal directly with initialization, module loading, etc.
Apparently more detailed device information can be queried through the driver API than through the runtime API. For instance, the free memory available on the device can be queried only through the driver API.
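For example, querying free device memory through the driver API requires explicit initialization and context creation first. A hedged sketch (exact signatures vary across CUDA versions; cuMemGetInfo took unsigned int* in early toolkits and size_t* later):

```cuda
#include <cuda.h>   // driver API header
#include <stdio.h>

int main(void) {
    CUdevice dev;
    CUcontext ctx;
    size_t freeMem, totalMem;  // unsigned int in pre-3.2 toolkits

    // Explicit initialization and context management: all on the programmer.
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    // Free/total device memory: exposed by the driver API.
    cuMemGetInfo(&freeMem, &totalMem);
    printf("free: %zu of %zu bytes\n", freeMem, totalMem);

    cuCtxDestroy(ctx);
    return 0;
}
```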
From the CUDA Programmer's Guide:
It is composed of two APIs:
- A low-level API called the CUDA driver API,
- A higher-level API called the CUDA runtime API that is implemented on top of the CUDA driver API.
These APIs are mutually exclusive: An application should use either one or the other.
The CUDA runtime eases device code management by providing implicit initialization, context management, and module management. The C host code generated by nvcc is based on the CUDA runtime (see Section 4.2.5), so applications that link to this code must use the CUDA runtime API.
In contrast, the CUDA driver API requires more code, is harder to program and debug, but offers a better level of control and is language-independent since it only deals with cubin objects (see Section 4.2.5). In particular, it is more difficult to configure and launch kernels using the CUDA driver API, since the execution configuration and kernel parameters must be specified with explicit function calls instead of the execution configuration syntax described in Section 4.2.3. Also, device emulation (see Section 4.5.2.9) does not work with the CUDA driver API.
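To make the quoted contrast concrete, here is a hedged sketch of configuring and launching a kernel through the driver API, using the CUDA 2.x-era entry points the guide refers to (kernels.cubin, vecAdd, and the device pointers are placeholder names; these calls were later deprecated in favor of cuLaunchKernel). It replaces the single <<<>>> line of the runtime API:

```cuda
#include <cuda.h>

// Assumes cuInit/cuCtxCreate have already run, and that dA, dB, dC
// are CUdeviceptr allocations of n floats each (hypothetical names).
void launchVecAdd(CUdeviceptr dA, CUdeviceptr dB, CUdeviceptr dC, int n) {
    CUmodule mod;
    CUfunction func;

    // Load the cubin produced by nvcc and look up the kernel by name.
    cuModuleLoad(&mod, "kernels.cubin");
    cuModuleGetFunction(&func, mod, "vecAdd");

    // Execution configuration and kernel parameters via explicit calls
    // (real code must also respect argument alignment; omitted here).
    int offset = 0;
    cuFuncSetBlockShape(func, 256, 1, 1);
    cuParamSetv(func, offset, &dA, sizeof(dA)); offset += sizeof(dA);
    cuParamSetv(func, offset, &dB, sizeof(dB)); offset += sizeof(dB);
    cuParamSetv(func, offset, &dC, sizeof(dC)); offset += sizeof(dC);
    cuParamSeti(func, offset, n);               offset += sizeof(n);
    cuParamSetSize(func, offset);

    cuLaunchGrid(func, (n + 255) / 256, 1);
}
```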
There is no noticeable performance difference between the APIs. How your kernels use memory and how they are laid out on the GPU (in warps and blocks) will have a much more pronounced effect.
Answered by mch
A couple of important things to note:
First, the differences between the APIs apply only to the host-side code. The kernels are exactly the same. On the host side, the complexity of the driver API is fairly trivial; the fundamental differences are:
- In the driver API you have access to functionality that is not available in the runtime API, such as contexts.
- The emulator only works with code written for the runtime API.
- Also, CUDPP, which is a very handy library, currently only works with the runtime API.
Answered by mch
There are some real issues with argument alignment and the driver API. Check out the CUDA 2.2 beta (or later) documentation for more information.
Answered by Jason Dale
I have found that for deployment of libraries in multi-threaded applications, the control over CUDA context provided by the driver API was critical. Most of my clients want to integrate GPU acceleration into existing applications, and these days, almost all applications are multi-threaded. Since I could not guarantee that all GPU code would be initialized, executed and deallocated from the same thread, I had to use the driver API.
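The pattern described here can be sketched with the driver API's context push/pop calls (a simplified, hedged example with error checking omitted and the function names my own): a context created on one thread is popped off it, leaving it "floating", and then pushed onto whichever thread needs to issue CUDA calls.

```cuda
#include <cuda.h>
#include <stddef.h>

static CUcontext g_ctx;  // shared handle; guard access with a mutex in real code

// Called once, on any thread.
void initGpu(void) {
    CUdevice dev;
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&g_ctx, 0, dev);
    cuCtxPopCurrent(NULL);  // detach: the context now floats, bound to no thread
}

// Called from any worker thread that needs to touch the GPU.
void doGpuWork(void) {
    cuCtxPushCurrent(g_ctx);   // bind the shared context to this thread
    // ... cuMemAlloc / kernel launches / cuMemcpy calls here ...
    cuCtxPopCurrent(NULL);     // detach again so another thread can use it
}
```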
My initial attempts with various work-arounds in the runtime API all led to failure, sometimes in spectacular fashion - I found I could repeatedly, instantly reboot a machine by performing just the wrong set of CUDA calls from different threads.
Since we migrated everything over to the driver API, all has been well.
J