Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license, cite the original address, and attribute it to the original authors (not me): StackOverflow

Original question: http://stackoverflow.com/questions/3771154/
How to declare and use huge arrays of 1 billion integers in C?
Asked by semteu
I'm implementing a sequential sorting program, like quicksort. I would like to test the performance of my program on a huge array of 1 or 10 billion integers. But the problem is that I get a segmentation fault due to the size of the array.
Sample code declaring this array:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 1000000000

int main(int argc, char **argv)
{
    int list[N], i;    /* ~4 GB of automatic storage, requested on the stack */
    srand(time(NULL));
    for(i=0; i<N; i++)
        list[i] = rand()%1000;
    return 0;
}
Someone suggested using the mmap function, but I don't know how to use it. Can anybody help me use it?
I'm working on Ubuntu 10.04 64-bit, gcc version 4.4.3.
Thanks for your replies.
Accepted answer by nmichaels
Michael is right, you can't fit that much on the stack. However, you can make it global (or static) if you don't want to malloc it.
#include <stdlib.h>
#include <time.h>

#define N 1000000000

static int list[N];    /* static storage, not the stack */

int main(int argc, char **argv)
{
    size_t i;
    srand(time(NULL));
    for(i=0; i<N; i++)
        list[i] = rand()%1000;
    return 0;
}
Answered by Michael Dorgan
You must use malloc for this sort of allocation. That much on the stack will fail nearly every time.
int *list = malloc(N * sizeof(int));
if (list == NULL)
    return 1;    /* a 4 GB request can fail, so always check */
This puts the allocation on the heap where there is a lot more memory available.
Answered by James McNellis
You probably shouldn't create so large an array, and if you do, you certainly shouldn't create it on the stack; the stack just isn't that big.
If you have a 32-bit address space and a 4-byte int, then you can't create an array with a billion ints; there just won't be enough contiguous space in memory for that large an object (there probably won't be enough contiguous space for an object a fraction of that size). If you have a 64-bit address space, you might get away with allocating that much space.
If you really want to try, you'll need either to create it statically (i.e., declare the array at file scope or with the static qualifier in the function) or dynamically (using malloc).
Answered by Jens Gustedt
On Linux systems, malloc of very large chunks just does an mmap under the hood, so it is perhaps needlessly tedious to look into that yourself.
Be careful that you have neither overflow (signed integers) nor silent wraparound (unsigned integers) in your array bounds and indices. Use size_t as the type for these; since you are on a 64-bit machine, this should work.
But as a habit you should definitely check your bounds against SIZE_MAX, with something like assert(N*sizeof(data[0]) <= SIZE_MAX), to be sure.
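Since the question asks how to use mmap directly, here is a minimal sketch of what that could look like on Linux (assuming MAP_ANONYMOUS is available; error handling is kept minimal):

#include <stdio.h>
#include <sys/mman.h>

#define N 1000000000UL

int main(void)
{
    size_t bytes = N * sizeof(int);
    int *list;

    /* Ask the kernel directly for anonymous, private, zero-filled pages. */
    list = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (list == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* list[0] .. list[N-1] can now be used like an ordinary array. */

    munmap(list, bytes);
    return 0;
}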
Answered by Captain Giraffe
The stack allocation makes it break. N = 1 billion ints => 4 GB of memory (with both a 32-bit and a 64-bit compiler). But if you want to measure the performance of quicksort, or a similar algorithm of yours, this is not the way to go about it. Instead, try using multiple quicksorts in sequence on prepared samples of a large size:
- Create a large random sample, not more than half your available memory. Make sure it doesn't fill your RAM! If it does, all measuring efforts are in vain. 500M elements is more than enough on a 4 GB system.
- Decide on a test size (e.g. N = 100 000 elements).
- Start the timer.
- Run the algorithm for (*start @ i*N, *end @ (i+1)*N); rinse and repeat for the next i until the large random sample is depleted.
- Stop the timer.
Now you have a very precise answer to how much time your algorithm has consumed. Run it a few times to get a feel for how precise it is (use a new srand(seed) seed each time), and change N for further inspection.
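A minimal sketch of this measurement scheme, using the standard qsort as a stand-in for your own sorting function (the TOTAL and chunk sizes below are illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define TOTAL 500000000UL    /* prepared random sample: 500M ints, ~2 GB */
#define N     100000UL       /* elements per timed chunk */

/* Comparison function for qsort, the stand-in algorithm under test. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    size_t i;
    clock_t begin, end;
    int *sample = malloc(TOTAL * sizeof(int));
    if (sample == NULL)
        return 1;

    srand(time(NULL));    /* use a new seed each run, as suggested above */
    for (i = 0; i < TOTAL; i++)
        sample[i] = rand() % 1000;

    begin = clock();
    for (i = 0; (i + 1) * N <= TOTAL; i++)    /* sort chunk [i*N, (i+1)*N) */
        qsort(sample + i * N, N, sizeof(int), cmp_int);
    end = clock();

    printf("%.3f seconds\n", (double)(end - begin) / CLOCKS_PER_SEC);
    free(sample);
    return 0;
}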
Answered by nmichaels
Another option is to dynamically allocate a linked list of smaller arrays. You'll have to wrap them with accessor functions, but it's far more likely that you can grab sixteen 256 MB chunks of memory than a single 4 GB chunk.
#define NUM_NODES 16    /* e.g. sixteen chunks; NUM_NODES was left undefined in the answer */

typedef struct node_s node, *node_ptr;
struct node_s
{
    int data[N/NUM_NODES];    /* one chunk of ~250 MB for N = 1 billion */
    node_ptr next;
};
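A possible sketch of such an accessor (get_elem is an illustrative name, and it assumes the nodes above have already been allocated with malloc and linked):

/* Walk past whole chunks until index i falls inside the current node. */
static int get_elem(node_ptr head, size_t i)
{
    while (i >= N/NUM_NODES) {
        head = head->next;
        i -= N/NUM_NODES;
    }
    return head->data[i];
}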

