
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not the translator). Original question: http://stackoverflow.com/questions/4008031/

Date: 2020-10-22 12:39:16 · Source: igfitidea

How to use CUDA constant memory in a programmer pleasant way?

c++ · visual-studio · header · linker · cuda

Asked by Yngve Sneen Lindal

I'm working on a number crunching app using the CUDA framework. I have some static data that should be accessible to all threads, so I've put it in constant memory like this:


__device__ __constant__ CaseParams deviceCaseParams;

I use the call cudaMemcpyToSymbol to transfer these params from the host to the device:


void copyMetaData(CaseParams* caseParams)
{
    cudaMemcpyToSymbol("deviceCaseParams", caseParams, sizeof(CaseParams));
}

which works.


Anyway, it seems (by trial and error, and also from reading posts on the net) that for some sick reason, the declaration of deviceCaseParams and the copy operation on it (the call to cudaMemcpyToSymbol) must be in the same file. At the moment I have these two in a .cu file, but I really want to have the parameter struct in a .cuh file so that any implementation can see it if it wants to. That means that I also have to have the copyMetaData function in a header file, but this messes up linking ("symbol already defined") since both .cpp and .cu files include this header (and thus both the MS C++ compiler and nvcc compile it).


Does anyone have any advice on design here?


Update: see the comments.

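One common way out of the "symbol already defined" trap is to keep only the type definition and a function declaration in the shared header, and to define both the __constant__ symbol and the copy function in exactly one .cu file. A sketch, with hypothetical file names and placeholder struct members:

```cuda
// case_params.cuh -- safe to include from both .cpp and .cu files,
// because it contains only a type definition and a function declaration.
#pragma once

struct CaseParams
{
    int   someField;    // placeholder members; use whatever the app needs
    float anotherField;
};

void copyMetaData(const CaseParams* caseParams);

// ----------------------------------------------------------------------
// params.cu -- the ONLY translation unit that defines the symbol and the
// copy function, so the linker sees each of them exactly once.
#include "case_params.cuh"

__device__ __constant__ CaseParams deviceCaseParams;

void copyMetaData(const CaseParams* caseParams)
{
    // Passing the symbol itself (rather than its name as a string) also
    // works, and is mandatory from CUDA 5.0 onwards.
    cudaMemcpyToSymbol(deviceCaseParams, caseParams, sizeof(CaseParams));
}
```

Any .cpp file can then include case_params.cuh and call copyMetaData without ever seeing CUDA-specific syntax.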

Answered by Tom

With an up-to-date CUDA (e.g. 3.2) you should be able to do the memcpy from within a different translation unit if you're looking up the symbol at runtime (i.e. by passing a string as the first argument to cudaMemcpyToSymbol, as you do in your example).

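A note for readers on newer toolkits: the string-based overload of cudaMemcpyToSymbol that this answer relies on was removed in CUDA 5.0. From CUDA 5.0 onwards, a symbol can instead be referenced from another translation unit by declaring it extern and compiling with relocatable device code. A sketch (the function name is hypothetical):

```cuda
// other_file.cu -- a different translation unit from the one that
// defines deviceCaseParams. Build with separate compilation enabled:
//   nvcc -rdc=true ...
extern __device__ __constant__ CaseParams deviceCaseParams;

void copyMetaDataElsewhere(const CaseParams* caseParams)
{
    // The symbol is passed directly; no runtime string lookup.
    cudaMemcpyToSymbol(deviceCaseParams, caseParams, sizeof(CaseParams));
}
```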

Also, with Fermi-class devices you can just allocate the memory (cudaMalloc), copy it to the device, and then pass the argument as a const pointer. The compiler will recognise if you are accessing the data uniformly across the warps and, if so, will use the constant cache. See the CUDA Programming Guide for more info. Note: you would need to compile with -arch=sm_20.

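The const-pointer alternative might look like the sketch below (CaseParams, the field someScale, the kernel body, and the launch configuration are all placeholders):

```cuda
__global__ void crunch(const CaseParams* params, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        // Every thread in a warp reads the same params fields, so on
        // Fermi-class hardware these loads can be served from the
        // constant (uniform) cache.
        out[i] = static_cast<float>(i) * params->someScale;
    }
}

void launch(const CaseParams* hostParams, float* devOut, int n)
{
    CaseParams* devParams;
    cudaMalloc(&devParams, sizeof(CaseParams));
    cudaMemcpy(devParams, hostParams, sizeof(CaseParams),
               cudaMemcpyHostToDevice);

    crunch<<<(n + 255) / 256, 256>>>(devParams, devOut, n);

    cudaFree(devParams);
}
```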

Answered by Raffles

If you're using pre-Fermi CUDA, you will have found out by now that this problem doesn't just apply to constant memory; it applies to anything you want on the CUDA side of things. The only two ways I have found around this are to either:


  1. Write everything CUDA in a single file (.cu), or
  2. If you need to break out code into separate files, restrict yourself to headers which your single .cu file then includes.

If you need to share code between CUDA and C/C++, or have some common code you share between projects, option 2 is the only choice. It seems very unnatural at first, but it solves the problem. You still get to structure your code, just not in a typical C-like way. The main overhead is that every build recompiles everything. The plus side (which I think is possibly why it works this way) is that the CUDA compiler has access to all the source code in one hit, which is good for optimisation.

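Option 2 in practice might look like this (file names are hypothetical): a single "umbrella" .cu file is the only thing handed to nvcc, and all device code lives in headers that it includes:

```cuda
// everything.cu -- the single .cu file the build actually compiles.
// The code is still organised into separate files; they are simply
// included here instead of being compiled as separate translation units.
#include "case_params.cuh"    // shared structs (also usable from .cpp files)
#include "constant_data.cuh"  // __constant__ definitions and copy helpers
#include "kernels.cuh"        // __global__ and __device__ functions
#include "launchers.cuh"      // host-side wrappers that launch the kernels
```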