C++: Allocating shared memory

Question

Asked by lina

I am trying to allocate shared memory using a constant parameter, but I am getting an error. My kernel looks like this:

__global__ void Kernel(const int count)
{
    __shared__ int a[count];
}

and I am getting an error saying

error: expression must have a constant value

count is const! Why am I getting this error? And how can I get around this?

Answer 1

Accepted answer by Oliver Charlesworth

const doesn't mean "constant", it means "read-only".

A constant expression is something whose value is known to the compiler at compile-time.
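To make the distinction concrete, here is a minimal sketch of two ways to give the compiler a real constant expression for a static __shared__ array: a constexpr value and a template parameter. The names kCount, KernelFixed, KernelTemplated and RunStaticSharedDemo are made up for illustration and are not from the question.

```cuda
#include <cuda_runtime.h>

// Compile-time constant: the compiler knows the value, so it can size a
// static __shared__ array with it. (kCount is a hypothetical name.)
constexpr int kCount = 128;

__global__ void KernelFixed(int *out)
{
    __shared__ int a[kCount];              // OK: kCount is a constant expression
    a[threadIdx.x] = threadIdx.x;
    __syncthreads();
    out[threadIdx.x] = a[kCount - 1 - threadIdx.x];
}

// A template parameter is also a constant expression; each instantiation
// gets its own fixed array size.
template <int Count>
__global__ void KernelTemplated(int *out)
{
    __shared__ int a[Count];
    a[threadIdx.x] = 2 * threadIdx.x;
    __syncthreads();
    out[threadIdx.x] = a[Count - 1 - threadIdx.x];
}

// Hypothetical host helper: launches both kernels with one block of kCount
// threads and returns true if the reversed values come back as expected.
bool RunStaticSharedDemo()
{
    int *d_out = nullptr;
    int h_out[kCount];
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) return false;

    KernelFixed<<<1, kCount>>>(d_out);
    bool ok = cudaMemcpy(h_out, d_out, sizeof(h_out),
                         cudaMemcpyDeviceToHost) == cudaSuccess
              && h_out[0] == kCount - 1;

    KernelTemplated<kCount><<<1, kCount>>>(d_out);
    ok = ok
         && cudaMemcpy(h_out, d_out, sizeof(h_out),
                       cudaMemcpyDeviceToHost) == cudaSuccess
         && h_out[0] == 2 * (kCount - 1);

    cudaFree(d_out);
    return ok;
}
```

Both variants only help when the size really is known at compile time. A value that arrives as a kernel argument, as in the question, is not a constant expression even when it is declared const.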

Answer 2

Answer by talonmies

CUDA supports dynamic shared memory allocation. If you define the kernel like this:

__global__ void Kernel(const int count)
{
    extern __shared__ int a[];
}

and then pass the number of bytes required as the third argument of the kernel launch

Kernel<<< gridDim, blockDim, a_size >>>(count)

then it can be sized at run time. Be aware that the runtime only supports a single dynamically declared allocation per block. If you need more, you will need to use pointers to offsets within that single allocation. Also be aware, when using pointers, that shared memory uses 32-bit words and all allocations must be 32-bit word aligned, irrespective of the type of the shared memory allocation.
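For reference, a complete round trip might look like the sketch below. It assumes a single block of at most 1024 threads, and ReverseKernel and RunDynamicSharedDemo are hypothetical names chosen for the example. It reverses an array through the dynamically sized shared buffer.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Reverses `count` ints in place through a dynamically sized shared array.
__global__ void ReverseKernel(int *data, const int count)
{
    extern __shared__ int a[];             // size is supplied at launch time
    const int i = threadIdx.x;
    if (i < count) a[i] = data[i];
    __syncthreads();
    if (i < count) data[i] = a[count - 1 - i];
}

// Hypothetical host helper: fills 0..count-1, runs the kernel with one block
// of `count` threads, and returns true if the array comes back reversed.
bool RunDynamicSharedDemo(int count)
{
    if (count <= 0 || count > 1024) return false;   // one block only

    std::vector<int> h(count);
    for (int i = 0; i < count; ++i) h[i] = i;

    int *d = nullptr;
    const size_t bytes = count * sizeof(int);
    if (cudaMalloc(&d, bytes) != cudaSuccess) return false;
    bool ok = cudaMemcpy(d, h.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;

    // Third launch argument: bytes of dynamic shared memory per block.
    ReverseKernel<<<1, count, bytes>>>(d, count);

    ok = ok && cudaMemcpy(h.data(), d, bytes,
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d);

    for (int i = 0; ok && i < count; ++i) ok = (h[i] == count - 1 - i);
    return ok;
}
```

Note that the third launch argument is a byte count, not an element count, which is why the sketch derives it from the same count that the kernel uses for indexing.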

Answer 3

Answer by jmilloy

Option one: declare shared memory with a constant size (not the same as const)

__global__ void Kernel(int count_a, int count_b)
{
    __shared__ int a[100];
    __shared__ int b[4];
}

Option two: declare shared memory dynamically in the kernel launch configuration:

__global__ void Kernel(int count_a, int count_b)
{
    extern __shared__ int shared[];   // unsized array; its size is set at launch
    int *a = &shared[0];              // a is manually set at the beginning of shared
    int *b = &shared[count_a];        // b is manually set at the end of a
}

// Host side: total bytes needed for both arrays
size_t sharedMemory = count_a * sizeof(int) + count_b * sizeof(int);
Kernel<<<numBlocks, threadsPerBlock, sharedMemory>>>(count_a, count_b);

Note: pointers to dynamically allocated shared memory are all given the same address. I use two shared memory arrays to illustrate how to manually set up two arrays in shared memory.
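If you want to see the aliasing for yourself, a small check along these lines should do it. This is only a sketch: AliasKernel and DynamicArraysAlias are hypothetical names, and the launch sizes are arbitrary.

```cuda
#include <cuda_runtime.h>

// Declares two dynamic shared arrays and reports whether they start at the
// same address.
__global__ void AliasKernel(int *same_address)
{
    extern __shared__ int   first[];
    extern __shared__ float second[];
    // Both names refer to the start of the one dynamic allocation.
    if (threadIdx.x == 0)
        *same_address =
            (static_cast<void *>(first) == static_cast<void *>(second)) ? 1 : 0;
}

// Hypothetical host helper: returns true if the kernel saw both arrays at
// the same address.
bool DynamicArraysAlias()
{
    int *d_flag = nullptr;
    int h_flag = 0;
    if (cudaMalloc(&d_flag, sizeof(int)) != cudaSuccess) return false;

    AliasKernel<<<1, 32, 64>>>(d_flag);    // 64 bytes of dynamic shared memory

    const bool copied = cudaMemcpy(&h_flag, d_flag, sizeof(int),
                                   cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_flag);
    return copied && h_flag == 1;
}
```

This is the reason the second array has to be placed by hand at an offset: declaring a second extern array does not reserve a second region.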

Answer 4

Answer by smh

From the "CUDA C Programming Guide":

The execution configuration is specified by inserting an expression of the form:

<<<Dg, Db, Ns, S>>>

where:

Dg is of type dim3 and specifies the dimension and size of the grid ...
Db is of type dim3 and specifies the dimension and size of each block ...
Ns is of type size_t and specifies the number of bytes in shared memory that is dynamically allocated per block for this call in addition to the statically allocated memory. This dynamically allocated memory is used by any of the variables declared as an external array as mentioned in __shared__; Ns is an optional argument which defaults to 0;
S is of type cudaStream_t and specifies the associated stream ...

So by using the dynamic parameter Ns, the user can specify the total size of the dynamically allocated shared memory that each block of a kernel can use, no matter how many shared variables are declared as external arrays in that kernel.
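As a rough illustration of "in addition to the statically allocated memory", the sketch below mixes a static array with a dynamic one and asks the runtime how much static shared memory the kernel needs. MixedKernel, StaticSharedBytes and RunMixedDemo are hypothetical names. cudaFuncGetAttributes and its sharedSizeBytes field are part of the CUDA runtime API and report only the static part.

```cuda
#include <cuda_runtime.h>

// Uses 4 ints of static shared memory plus n floats of dynamic shared memory.
__global__ void MixedKernel(float *out, int n)
{
    __shared__ int counts[4];              // static: sized at compile time
    extern __shared__ float vals[];        // dynamic: sized by Ns at launch

    const int i = threadIdx.x;
    if (i < 4) counts[i] = i + 1;
    if (i < n) vals[i] = 0.5f * i;
    __syncthreads();
    if (i < n) out[i] = vals[n - 1 - i] + counts[i % 4];
}

// Hypothetical helper: static shared bytes reported for MixedKernel
// (0 on error). Ns is not included in this figure.
size_t StaticSharedBytes()
{
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, MixedKernel) != cudaSuccess) return 0;
    return attr.sharedSizeBytes;
}

// Hypothetical helper: runs one block of n threads (assumes 4 <= n <= 1024)
// and checks the value written by thread 0.
bool RunMixedDemo(int n)
{
    if (n < 4 || n > 1024) return false;
    float *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) return false;

    const size_t Ns = n * sizeof(float);   // bytes, dynamic part only
    MixedKernel<<<1, n, Ns>>>(d_out, n);

    float first = 0.0f;
    const bool copied = cudaMemcpy(&first, d_out, sizeof(float),
                                   cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_out);
    // Thread 0 wrote vals[n - 1] + counts[0] = 0.5 * (n - 1) + 1.
    return copied && first == 0.5f * (n - 1) + 1.0f;
}
```

The point of the sketch is that Ns only has to cover vals; the compiler accounts for counts on its own.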

Answer 5

Answer by peeyush

You cannot declare a shared variable like this:

__shared__ int a[count];

However, if you are sure about the maximum size of array a, you can declare it directly like this:

__shared__ int a[100];

But in this case you should think about how many blocks your program launches: reserving more shared memory per block than is actually used limits how many blocks can be resident on each multiprocessor, which leaves less work available to hide global memory latency and so hurts performance.

A nice solution to this problem is to declare

extern __shared__ int a[];

and allocate the memory when launching the kernel from the host, like

Kernel<<< gridDim, blockDim, a_size >>>(count)

But be careful here too: if a block uses more shared memory than you allocated at kernel launch, you will get unexpected results.
