Making a very large Java array
Warning: this page is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must likewise follow CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/674186/
Question by
I'm trying to find a counterexample to the Pólya conjecture, which will be somewhere in the 900 millions. I'm using a very efficient algorithm that doesn't even require any factorization (similar to a Sieve of Eratosthenes, but with even more information). So, a large array of ints is required.
The program is efficient and correct, but it requires an array as large as the x I want to check up to (it checks all numbers in (2, x)). So, if the counterexample is in the 900 millions, I need an array that is just as large. Java won't let me allocate anything over about 20 million entries. Is there anything I can do to get an array that large?
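The question doesn't show its algorithm, so the following is only a hedged illustration of the kind of sieve it describes. It is one common approach and an assumption, not the asker's code. The Pólya conjecture says L(n) = λ(1) + ... + λ(n) ≤ 0 for every n ≥ 2, where λ(n) = (-1)^Ω(n) is the Liouville function and Ω(n) counts prime factors with multiplicity. The sketch sieves the smallest prime factor of every number and derives λ from it without factorizing anything explicitly. The class name `LiouvilleSieve` is hypothetical. Its `int[] spf` is exactly the "large array of ints": 4 bytes per number, so roughly 3.6 GB at 900 million.

```java
/**
 * Hypothetical sketch (not the asker's code): sieve the smallest prime factor (spf)
 * of every n <= limit, then derive the Liouville function λ(n) = (-1)^Ω(n) from it.
 */
class LiouvilleSieve {
    /** lambda[k] = λ(k) for 1 <= k <= n (index 0 is unused). Uses about 5 bytes per number. */
    static byte[] liouville(int n) {
        int[] spf = new int[n + 1];      // smallest prime factor of each k; 0 = not set yet
        byte[] lambda = new byte[n + 1];
        if (n >= 1) lambda[1] = 1;       // 1 has no prime factors, so λ(1) = +1
        for (int i = 2; i <= n; i++) {
            if (spf[i] == 0) {           // no smaller prime marked i, so i is prime
                spf[i] = i;
                for (long j = (long) i * i; j <= n; j += i) {
                    if (spf[(int) j] == 0) spf[(int) j] = i;
                }
            }
            // Removing one prime factor flips the parity of Ω, so λ(i) = -λ(i / spf(i)).
            lambda[i] = (byte) -lambda[i / spf[i]];
        }
        return lambda;
    }

    /** First n in [2, limit] with L(n) = λ(1) + ... + λ(n) > 0, or -1 if there is none. */
    static int firstCounterexample(int limit) {
        byte[] lambda = liouville(limit);
        long sum = 0;                    // running value of L(k)
        for (int k = 1; k <= limit; k++) {
            sum += lambda[k];
            if (k >= 2 && sum > 0) return k;
        }
        return -1;
    }
}
```

With a limit of a few million this runs in the default heap. Near 900 million the two arrays together need about 4.5 GB, which is the memory problem the answers below address.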
Answer by Tom Hawtin - tackline
What do you mean by "won't allow"? You're probably getting an OutOfMemoryError, so add more memory with the -Xmx command-line option.
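To check what -Xmx actually gave the JVM, `Runtime.getRuntime().maxMemory()` reports the maximum heap the JVM will attempt to use (or `Long.MAX_VALUE` if there is no limit). The helper below is a hypothetical sketch. It compares an int[] payload with that limit and shows the allocation failing as an OutOfMemoryError. The estimate ignores the small array header, and other live objects and GC overhead also take heap, so a "fits" answer is only a rough guide.

```java
/** Hypothetical helper: rough comparison of an int[] payload with the JVM's maximum heap. */
class HeapCheck {
    /** Bytes of element data in an int[] of the given length (the object header is ignored). */
    static long intArrayBytes(long length) {
        return length * Integer.BYTES;
    }

    /** True if the payload alone is below the -Xmx limit; allocation can still fail in practice. */
    static boolean mightFit(long length) {
        return intArrayBytes(length) < Runtime.getRuntime().maxMemory();
    }

    /** Tries the allocation and reports OutOfMemoryError as false (acceptable in a demo only). */
    static boolean tryAllocate(int length) {
        try {
            int[] a = new int[length];
            return a.length == length;
        } catch (OutOfMemoryError e) {
            return false;
        }
    }
}
```

For example, `HeapCheck.intArrayBytes(900_000_000L)` is 3,600,000,000 bytes. The program would then be started with something like `java -Xmx4g ...` on a 64-bit JVM.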
Answer by jjnguy
You may want to extend the max size of the JVM Heap. You can do that with a command line option.
I believe it is -Xmx3600m (3600 megabytes).
Answer by Aaron Digulla
Answer by sfossen
If you don't need it all loaded in memory at once, you could split it into segments and store them as files on disk.
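A minimal sketch of the segment-to-file idea follows. It assumes each segment fits comfortably in memory, and it uses a file naming scheme and class name (`SegmentFiles`) that are purely hypothetical. Each segment is written as raw big-endian ints with `java.nio` and read back into an `int[]`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: stores each segment of a big int array in its own file. */
class SegmentFiles {
    /** Naming scheme is an assumption: dir/segment-<n>.bin */
    static Path segmentPath(Path dir, int segment) {
        return dir.resolve("segment-" + segment + ".bin");
    }

    /** Writes one segment as raw big-endian ints (a segment must hold fewer than ~536 million ints). */
    static void writeSegment(Path dir, int segment, int[] data) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(data.length * Integer.BYTES);
        buf.asIntBuffer().put(data);             // fills buf's backing array through an int view
        Files.write(segmentPath(dir, segment), buf.array());
    }

    /** Reads a segment written by writeSegment back into an int[]. */
    static int[] readSegment(Path dir, int segment) throws IOException {
        byte[] bytes = Files.readAllBytes(segmentPath(dir, segment));
        IntBuffer ints = ByteBuffer.wrap(bytes).asIntBuffer();
        int[] data = new int[ints.remaining()];
        ints.get(data);
        return data;
    }
}
```

A caller would keep only one segment (say, a few million ints) in memory at a time, process it, write it back if it changed, and move on to the next.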
Answer by Bombe
Java allows up to about 2 billion array entries (an array length is an int, so it can be at most Integer.MAX_VALUE). It's your machine (and its limited memory) that cannot handle such a large amount.
Answer by Phil H
You could define your own class that stores the data in a 2D array of roughly sqrt(n) by sqrt(n). Then use an index function to map each index to the two indices of the array. This can be extended to more dimensions, as needed.
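A sketch of that idea, under the assumption of a hypothetical class `BigIntArray`: a long index is mapped to (row, column) of about sqrt(n) rows of about sqrt(n) ints, so every individual Java array stays short. The total still needs 4 bytes per element of heap, so the -Xmx and RAM limits discussed above still apply.

```java
/**
 * Hypothetical sketch: an int "array" indexed by a long, stored as roughly
 * sqrt(n) rows of roughly sqrt(n) ints, so no single Java array gets very long.
 */
class BigIntArray {
    private final int[][] rows;
    private final int rowSize;
    private final long length;

    BigIntArray(long length) {
        if (length < 0 || length > (1L << 60)) throw new IllegalArgumentException("bad length: " + length);
        this.length = length;
        long side = (long) Math.sqrt((double) length);
        while (side * side < length) side++;          // now side * side >= length
        this.rowSize = (int) Math.max(1, side);
        int nRows = (int) ((length + rowSize - 1) / rowSize);
        rows = new int[nRows][];
        for (int r = 0; r < nRows; r++) {
            long remaining = length - (long) r * rowSize;
            rows[r] = new int[(int) Math.min(rowSize, remaining)];  // last row may be shorter
        }
    }

    /** The index function: a long index becomes (index / rowSize, index % rowSize). */
    int get(long index) {
        checkIndex(index);
        return rows[(int) (index / rowSize)][(int) (index % rowSize)];
    }

    void set(long index, int value) {
        checkIndex(index);
        rows[(int) (index / rowSize)][(int) (index % rowSize)] = value;
    }

    long length() {
        return length;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException("index " + index);
    }
}
```

For n = 900 million this gives 30,000 rows of 30,000 ints. The division and modulo in the index function are the price for never allocating one huge array.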
The main problem you will run into is running out of RAM. If you approach this limit, you'll need to rethink your algorithm or consider external storage (i.e. a file or database).
Answer by Kris
900 million 32-bit ints with no further overhead (and there will always be more overhead) would require a little over 3.35 GiB. The only way to get that much memory is to use a 64-bit JVM (on a machine with at least 8 GB of RAM) or some disk-backed cache.
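The 3.35 GiB figure follows from 4 bytes per int and 2^30 bytes per GiB. The second line shows the same count stored as one byte per entry (as starblue suggests) for comparison. Both count the payload only and ignore the array header and any other heap use.

```latex
\begin{aligned}
900\,000\,000 \times 4\ \text{B} &= 3.6 \times 10^{9}\ \text{B}
  = \frac{3.6 \times 10^{9}}{2^{30}}\ \text{GiB} \approx 3.35\ \text{GiB} \\
900\,000\,000 \times 1\ \text{B} &= 9 \times 10^{8}\ \text{B}
  = \frac{9 \times 10^{8}}{2^{30}}\ \text{GiB} \approx 0.84\ \text{GiB}
\end{aligned}
```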
Answer by starblue
If your algorithm allows it:
Compute it in slices that fit into memory. You will have to redo the computation for each slice, but it will often be fast enough.
Use an array of a smaller numeric type, such as byte.
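For this particular problem, starblue's two suggestions could be combined roughly as follows. This is a sketch under assumptions, with hypothetical names. λ is computed for one slice at a time by dividing each number in the slice by the primes up to sqrt(limit). The values are kept in a byte[] rather than an int[], and only the running total L(n) is carried from one slice to the next. Memory is then about 10 bytes per slot of a single slice plus the small table of primes below sqrt(limit) (about 31,623 for a limit near 10^9), instead of an array covering the whole range.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: Pólya search over 1..limit, one slice in memory at a time. */
class SlicedLiouville {
    /** Plain sieve of Eratosthenes for the primes up to n; only n ≈ sqrt(limit) is needed. */
    static int[] primesUpTo(int n) {
        boolean[] composite = new boolean[n + 1];
        List<Integer> primes = new ArrayList<>();
        for (int i = 2; i <= n; i++) {
            if (composite[i]) continue;
            primes.add(i);
            for (long j = (long) i * i; j <= n; j += i) composite[(int) j] = true;
        }
        return primes.stream().mapToInt(Integer::intValue).toArray();
    }

    /** λ(k) for k = lo .. lo+len-1, given every prime up to sqrt(lo+len-1). */
    static byte[] lambdaSlice(long lo, int len, int[] primes) {
        long[] rest = new long[len];     // part of k not yet divided out
        byte[] parity = new byte[len];   // parity of Ω(k), prime factors counted with multiplicity
        for (int i = 0; i < len; i++) rest[i] = lo + i;
        for (int p : primes) {
            // visit the multiples of p that fall inside this slice
            for (long m = ((lo + p - 1) / p) * p; m < lo + len; m += p) {
                int i = (int) (m - lo);
                while (rest[i] % p == 0) { rest[i] /= p; parity[i] ^= 1; }
            }
        }
        byte[] lambda = new byte[len];
        for (int i = 0; i < len; i++) {
            if (rest[i] > 1) parity[i] ^= 1;  // one leftover prime, larger than every sieving prime
            lambda[i] = (byte) (parity[i] == 0 ? 1 : -1);
        }
        return lambda;
    }

    /** First n in [2, limit] with L(n) > 0, or -1; works through 1..limit in slices of sliceSize. */
    static long firstCounterexample(long limit, int sliceSize) {
        int[] primes = primesUpTo((int) Math.sqrt((double) limit) + 1);
        long sum = 0;                     // running L(k) = λ(1) + ... + λ(k)
        for (long lo = 1; lo <= limit; lo += sliceSize) {
            int len = (int) Math.min(sliceSize, limit - lo + 1);
            byte[] lambda = lambdaSlice(lo, len, primes);
            for (int i = 0; i < len; i++) {
                sum += lambda[i];
                if (lo + i >= 2 && sum > 0) return lo + i;
            }
        }
        return -1;
    }
}
```

The slice size is the only tuning knob here. Larger slices amortize the per-prime loop overhead but use proportionally more heap.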
Answer by Mike Houston
I wrote a version of the Sieve of Eratosthenes for Project Euler that worked on one chunk of the search space at a time. It processes the first 1M integers (for example), but keeps each prime number it finds in a table. After a chunk has been fully scanned, the array is re-initialised, and the primes found so far are used to mark it before looking for the next prime.
The table maps a prime to its 'offset' from the start of the array for the next processing iteration.
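The chunked sieve just described might look roughly like this minimal sketch, which uses a hypothetical class `ChunkedSieve`. One `boolean[]` is reused for every chunk. A map from each prime to the next multiple still to be crossed off plays the role of the offset table. It stores absolute next multiples rather than offsets from the array start, which carries the same information.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: incremental sieve of Eratosthenes that reuses one fixed-size array. */
class ChunkedSieve {
    private final int chunkSize;
    private final boolean[] composite;                          // reused for every chunk
    private final Map<Long, Long> nextMultiple = new LinkedHashMap<>(); // prime -> next multiple to mark
    private long chunkStart = 2;                                // first number of the current chunk

    ChunkedSieve(int chunkSize) {
        this.chunkSize = chunkSize;
        this.composite = new boolean[chunkSize];
    }

    /** Sieves the next chunk and returns the primes found in it. */
    List<Long> nextChunk() {
        Arrays.fill(composite, false);                          // re-initialise the array
        long end = chunkStart + chunkSize;                      // exclusive
        // 1. Mark multiples of the primes found so far, resuming where each one stopped.
        for (Map.Entry<Long, Long> e : nextMultiple.entrySet()) {
            long p = e.getKey(), m = e.getValue();
            for (; m < end; m += p) composite[(int) (m - chunkStart)] = true;
            e.setValue(m);                                      // resume point for the next chunk
        }
        // 2. Every unmarked number is a new prime; mark its own multiples inside this chunk.
        List<Long> found = new ArrayList<>();
        for (long n = chunkStart; n < end; n++) {
            if (composite[(int) (n - chunkStart)]) continue;
            found.add(n);
            long m = n * n;                                     // smaller multiples have a smaller factor
            for (; m < end; m += n) composite[(int) (m - chunkStart)] = true;
            nextMultiple.put(n, m);
        }
        chunkStart = end;
        return found;
    }
}
```

With a chunk size of 10, the first call returns [2, 3, 5, 7, 11] and the second returns [13, 17, 19]. A boxed map entry per prime is memory-hungry. When the final limit is known in advance, only primes up to its square root need entries.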
This is similar in concept (if not in implementation) to the way functional programming languages perform lazy evaluation of lists (although in larger steps). Allocating all the memory up-front isn't necessary, since you're only interested in the parts of the array that pass your test for primeness. Keeping the non-primes hanging around isn't useful to you.
This method also provides memoisation for later iterations over the prime numbers. It's faster than scanning your sparse sieve data structure for the primes every time.
Answer by Jason S
I second @sfossen's and @Aaron Digulla's idea: I'd go for disk access. If your algorithm can take a List interface rather than a plain array, you could write an adapter from the List to a memory-mapped file.
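A sketch of such an adapter, assuming a hypothetical class `MappedIntList` that extends `AbstractList<Integer>`. It uses `FileChannel.map`, which extends the file when the mapped region lies past its end. A single `MappedByteBuffer` is limited to `Integer.MAX_VALUE` bytes, so 900 million ints (3.6 GB) cannot sit in one mapping. The sketch therefore maps the file in fixed-size chunks; the chunk size is an arbitrary choice.

```java
import java.io.IOException;
import java.nio.IntBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.AbstractList;
import java.util.Objects;
import java.util.RandomAccess;

/** Hypothetical List<Integer> view of a file of ints, mapped into memory chunk by chunk. */
class MappedIntList extends AbstractList<Integer> implements RandomAccess, AutoCloseable {
    private static final int INTS_PER_CHUNK = 1 << 26;  // 64 Mi ints = 256 MiB per mapping (arbitrary)
    private final FileChannel channel;
    private final IntBuffer[] chunks;
    private final int size;

    MappedIntList(Path file, int size) throws IOException {
        this.size = size;
        this.channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        int nChunks = (int) (((long) size + INTS_PER_CHUNK - 1) / INTS_PER_CHUNK);
        chunks = new IntBuffer[nChunks];
        for (int c = 0; c < nChunks; c++) {
            long first = (long) c * INTS_PER_CHUNK;
            int count = (int) Math.min(INTS_PER_CHUNK, size - first);
            // Mapping past the end of the file grows the file to cover the region.
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_WRITE,
                    first * Integer.BYTES, (long) count * Integer.BYTES);
            chunks[c] = mapped.asIntBuffer();
        }
    }

    @Override
    public Integer get(int index) {
        Objects.checkIndex(index, size);
        return chunks[index / INTS_PER_CHUNK].get(index % INTS_PER_CHUNK);
    }

    @Override
    public Integer set(int index, Integer value) {
        Objects.checkIndex(index, size);
        IntBuffer chunk = chunks[index / INTS_PER_CHUNK];
        int old = chunk.get(index % INTS_PER_CHUNK);
        chunk.put(index % INTS_PER_CHUNK, value);
        return old;
    }

    @Override
    public int size() {
        return size;
    }

    /** Closes the channel; existing mappings stay valid until they are garbage-collected. */
    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

The `List<Integer>` interface boxes every element, so a hot loop may be faster calling a primitive `int get(int)` method on the same chunks. The operating system's page cache decides which parts of the file are actually in RAM at any moment.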