Original source: http://stackoverflow.com/questions/7989039/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Use of cudaMalloc(). Why the double pointer?
Asked by smilingbuddha
I am currently going through the tutorial examples on http://code.google.com/p/stanford-cs193g-sp2010/ to learn CUDA. The code which demonstrates __global__ functions is given below. It simply creates two arrays, one on the CPU and one on the GPU, populates the GPU array with the number 7 and copies the GPU array data into the CPU array.
#include <stdlib.h>
#include <stdio.h>

__global__ void kernel(int *array)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    array[index] = 7;
}

int main(void)
{
    int num_elements = 256;
    int num_bytes = num_elements * sizeof(int);

    // pointers to host & device arrays
    int *device_array = 0;
    int *host_array = 0;

    // malloc a host array
    host_array = (int*)malloc(num_bytes);

    // cudaMalloc a device array
    cudaMalloc((void**)&device_array, num_bytes);

    int block_size = 128;
    int grid_size = num_elements / block_size;
    kernel<<<grid_size,block_size>>>(device_array);

    // download and inspect the result on the host:
    cudaMemcpy(host_array, device_array, num_bytes, cudaMemcpyDeviceToHost);

    // print out the result element by element
    for(int i=0; i < num_elements; ++i)
    {
        printf("%d ", host_array[i]);
    }

    // deallocate memory
    free(host_array);
    cudaFree(device_array);
}
My question is: why is the cudaMalloc((void**)&device_array, num_bytes); statement written with a double pointer? Even the documented definition of cudaMalloc() says the first argument is a double pointer.
Why not simply return a pointer to the beginning of the allocated memory on the GPU, just like the malloc function does on the CPU?
Accepted answer by CygnusX1
Almost all CUDA API functions return an error code (or cudaSuccess if no error occurred). Any other result therefore has to be returned through a parameter passed by reference. Plain C has no references, which is why you have to pass the address of the variable in which the result should be stored. Since the result here is a pointer, you need to pass a double pointer.
Another well-known function which operates on addresses for the same reason is scanf. How many times have you forgotten to write the & before the variable that you want to store the value in? ;)
int i;
scanf("%d",&i);
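The same pattern can be sketched in plain C, without any CUDA dependency. The function `alloc_ints` and the status values `ALLOC_OK` and `ALLOC_FAILED` are hypothetical names invented for this illustration, not part of any real API. The sketch only assumes that the return value is reserved for a status code, so the allocated pointer has to travel back through an `int **` parameter.

```c
#include <stdlib.h>

/* Hypothetical status codes, standing in for cudaSuccess and friends. */
enum { ALLOC_OK = 0, ALLOC_FAILED = 1 };

/* Hypothetical allocator: the return value carries the status, so the
   address of the new block is written through the out-parameter. */
int alloc_ints(int **out, size_t count)
{
    int *block = calloc(count, sizeof *block);  /* zero-initialised */
    if (block == NULL) {
        *out = NULL;
        return ALLOC_FAILED;
    }
    *out = block;   /* store the result in the caller's pointer variable */
    return ALLOC_OK;
}
```

A caller would declare `int *a = NULL;` and call `alloc_ints(&a, 4)`, passing `&a` for the same reason scanf needs `&i`, and later release the block with `free(a)`.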
Answer by R.. GitHub STOP HELPING ICE
This is simply a horrible, horrible API design. The problem with passing double pointers to an allocation function that obtains abstract (void *) memory is that you have to make a temporary variable of type void * to hold the result, then assign it to the real pointer of the correct type you want to use. Casting, as in (void**)&device_array, is invalid C and results in undefined behavior. You should simply write a wrapper function that behaves like normal malloc and returns a pointer, as in:
void *fixed_cudaMalloc(size_t len)
{
    void *p;
    /* cudaSuccess is the status the CUDA runtime returns when no error occurred */
    if (cudaMalloc(&p, len) == cudaSuccess) return p;
    return 0;
}
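The temporary-variable approach described in this answer can be sketched in plain C. Here `fake_device_alloc` is a hypothetical stand-in with the same shape as cudaMalloc (status return, `void **` out-parameter), implemented with host `malloc` so that it runs without a GPU, and `alloc_int_array` is likewise an invented helper.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an allocator shaped like cudaMalloc:
   returns 0 on success and writes the address through a void ** parameter. */
int fake_device_alloc(void **ptr, size_t len)
{
    *ptr = malloc(len);
    return *ptr != NULL ? 0 : 1;
}

/* Receive the result in a real void * object, then convert it to the
   typed pointer; no (void **) cast of an int ** is needed. */
int *alloc_int_array(size_t count)
{
    void *tmp = NULL;
    if (fake_device_alloc(&tmp, count * sizeof(int)) != 0)
        return NULL;
    return tmp;   /* void * converts implicitly to int * in C */
}
```

The implicit conversion on the last line is a C rule. Code compiled as C++ (which is how CUDA sources are usually built) would need an explicit cast on the return value instead.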
Answer by jwdmsd
We cast it to a double pointer because the argument is a pointer to a pointer: it has to point to a pointer that will hold a GPU memory address. What cudaMalloc() does is allocate a block of memory on the GPU and store its address in the pointer that the first argument points to.
Answer by Louis T
In C/C++, you can allocate a block of memory dynamically at runtime by calling the malloc function.
int * h_array;
h_array = malloc(sizeof(int));
The malloc function returns the address of the allocated memory block, which can be stored in a pointer variable.
Memory allocation in CUDA differs in two ways:
- cudaMalloc returns an integer error code instead of a pointer to the memory block.
- In addition to the byte size to be allocated, cudaMalloc also requires a double void pointer as its first parameter.
int *d_array;
cudaMalloc((void **) &d_array, sizeof(int));
The reason behind the first difference is that CUDA API functions follow the convention of returning an integer error code. So, to keep things consistent, the cudaMalloc API also returns an integer.
The requirement for a double pointer as the function's first argument can be understood in two steps.
Firstly, since we have already decided to make cudaMalloc return an integer value, we can no longer use the return value to hand back the address of the allocated memory. In C, the only other way for a function to communicate a result is through a pointer (an address) passed to the function. The function can change the value stored at that address, and the caller can later retrieve the changed value outside the function's scope through the same memory address.
How the double pointer works
The following sketch illustrates how it works with the double pointer.
int cudamalloc(void **d_array, int type_size) {
    /* write the new address into the caller's pointer variable */
    *d_array = malloc(type_size);
    return return_code;
}
Why do we need the double pointer? Why does this work?
I normally live in the Python world, so I also struggled to understand why the following does not work.
int cudamalloc(void *d_array, int type_size) {
    /* only the local copy of the pointer is changed */
    d_array = malloc(type_size);
    ...
    return error_status;
}
So why doesn't it work? Because in C, when cudamalloc is called, a local variable named d_array is created and assigned the value of the first function argument. There is no way to retrieve the value of that local variable outside the function's scope. That is why we need a pointer to a pointer here.
int cudamalloc(void **d_array, int type_size) {
    /* d_array holds the address of the caller's pointer, so this write is visible to the caller */
    *d_array = malloc(type_size);
    ...
    return return_code;
}
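The difference between the two versions can be checked in plain C. This is a minimal sketch with invented function names (`set_by_value`, `set_by_address`) that uses host `malloc` in place of any device allocation.

```c
#include <stdlib.h>

/* Receives a copy of the caller's pointer: the assignment changes only the
   local copy. The block is freed here so the sketch does not leak. */
void set_by_value(int *p, size_t count)
{
    p = malloc(count * sizeof *p);
    free(p);
}

/* Receives the address of the caller's pointer: writing through it
   changes the caller's variable. */
void set_by_address(int **p, size_t count)
{
    *p = malloc(count * sizeof **p);
}
```

Starting from `int *a = NULL;`, a call to `set_by_value(a, 4)` leaves `a` equal to NULL, while `set_by_address(&a, 4)` leaves `a` pointing at the new block (assuming the allocation succeeds).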
Answer by flolo
The problem: you have to return two values, the return code AND a pointer to memory (in case the return code indicates success). So one of them has to be returned through a pointer parameter. You have the choice between passing a pointer to int (for the error code) or a pointer to pointer (for the memory address). One solution is as good as the other, and one of them yields the pointer to pointer (I prefer this term to "double pointer", which sounds more like a pointer to a double floating-point number).
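The other choice mentioned above, returning the pointer and passing a pointer to int for the error code, might look like the following sketch. `alloc_with_status` is a hypothetical function invented for illustration, and the convention of 0 for success and 1 for failure is an assumption, not anything CUDA defines.

```c
#include <stdlib.h>

/* Hypothetical alternative design: return the pointer, report the status
   through an int * out-parameter (0 = success, 1 = failure). */
void *alloc_with_status(size_t len, int *status)
{
    void *block = malloc(len);
    *status = (block != NULL) ? 0 : 1;
    return block;
}
```

Either way the caller passes exactly one address; the two designs differ only in which of the two results travels through it.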
With malloc you have the nice property that a null pointer can indicate an error, so you basically need just one return value. I am not sure whether this is possible with a pointer to device memory, as there might be no null value, or a different one (remember: this is CUDA and NOT ANSI C). It could be that the null pointer on the host system is entirely different from the null used on the device, in which case returning a null pointer to indicate errors does not work and the API has to be designed this way (that would also mean there is NO common NULL on both devices).


