Making a very large Java array

Question

I'm trying to find a counterexample to the Pólya conjecture, which will be somewhere in the 900 millions. I'm using a very efficient algorithm that doesn't even require any factorization (similar to a Sieve of Eratosthenes, but with even more information), so a large array of ints is required.

The program is efficient and correct, but it requires an array as large as the x I want to check up to (it checks all numbers from 2 to x). So, if the counterexample is in the 900 millions, I need an array that will be just as large. Java won't allow me anything over about 20 million. Is there anything I can do to get an array that large?

Answer 1

Answered by Tom Hawtin - tackline

What do you mean by "won't allow"? You are probably getting an OutOfMemoryError, so add more memory with the -Xmx command line option.

Answer 2

Answered by jjnguy

You may want to extend the max size of the JVM Heap. You can do that with a command line option.

I believe it is -Xmx3600m (3600 megabytes)

Answer 3

Answered by Aaron Digulla

Use a memory-mapped file (the Java NIO package, java.nio) instead. Or move the sieve into a small C library and call it through JNI.
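A minimal sketch of the memory-mapped approach, assuming the sieve only needs `get` and `set` on int cells. The class name `MappedIntArray`, its methods, and the region size are illustrative and not from the answer. A single `FileChannel.map` call can cover at most `Integer.MAX_VALUE` bytes, so an array of 900 million ints (3.6 billion bytes) has to be split across several mapped regions:

```java
import java.io.IOException;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical file-backed int array addressed with a long index. */
final class MappedIntArray {
    private final IntBuffer[] regions;
    private final int intsPerRegion;
    private final long length;

    MappedIntArray(Path path, long length, int intsPerRegion) throws IOException {
        // Each mapping must stay below Integer.MAX_VALUE bytes.
        if (intsPerRegion <= 0 || intsPerRegion > Integer.MAX_VALUE / Integer.BYTES) {
            throw new IllegalArgumentException("bad region size: " + intsPerRegion);
        }
        this.length = length;
        this.intsPerRegion = intsPerRegion;
        int count = (int) ((length + intsPerRegion - 1) / intsPerRegion);
        regions = new IntBuffer[count];
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            for (int r = 0; r < count; r++) {
                long firstInt = (long) r * intsPerRegion;
                long ints = Math.min(intsPerRegion, length - firstInt);
                // A READ_WRITE mapping past the end of the file grows the file.
                regions[r] = ch.map(FileChannel.MapMode.READ_WRITE,
                        firstInt * Integer.BYTES, ints * Integer.BYTES).asIntBuffer();
            }
        } // The mappings remain valid after the channel is closed.
    }

    int get(long index) {
        return regions[(int) (index / intsPerRegion)].get((int) (index % intsPerRegion));
    }

    void set(long index, int value) {
        regions[(int) (index / intsPerRegion)].put((int) (index % intsPerRegion), value);
    }

    long length() {
        return length;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sieve", ".bin");
        file.toFile().deleteOnExit();
        // Small sizes for the demo; a real run might use 1 << 28 ints (1 GiB) per region.
        MappedIntArray cells = new MappedIntArray(file, 2_500, 1_000);
        cells.set(2_499, 42);
        System.out.println(cells.get(2_499));
    }
}
```

The operating system pages the file in and out on demand, so access patterns with good locality (such as a sieve walking forward in fixed strides within a region) tend to behave far better than random access.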

Answer 4

Answered by sfossen

If you don't need it all loaded in memory at once, you could segment it into files and store them on disk.

Answer 5

Answered by Bombe

Java allows up to about 2 billion array entries (array indices are ints). It's your machine (and its limited memory) that cannot handle such a large amount.

Answer 6

Answered by Phil H

You could define your own class which stores the data in a 2D array with dimensions close to sqrt(n) by sqrt(n). Then use an index function to map each logical index to the two indices of the array. This can be extended to more dimensions as needed.

The main problem you will run into is running out of RAM. If you approach this limit, you'll need to rethink your algorithm or consider external storage (i.e., a file or database).
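A rough sketch of such a wrapper class. Note that it swaps the sqrt(n) by sqrt(n) layout for fixed power-of-two rows, so the index function becomes a shift and a mask; the name `ChunkedIntArray` and the row size are assumptions made for illustration:

```java
/** Hypothetical long-indexed int array stored as rows of a 2D array. */
final class ChunkedIntArray {
    private static final int CHUNK_BITS = 20;              // 2^20 ints (4 MiB) per row, an arbitrary choice
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;

    private final int[][] chunks;
    private final long length;

    ChunkedIntArray(long length) {
        if (length < 0) {
            throw new IllegalArgumentException("negative length: " + length);
        }
        this.length = length;
        int rows = (int) ((length + CHUNK_SIZE - 1) >>> CHUNK_BITS);
        chunks = new int[rows][];
        long remaining = length;
        for (int r = 0; r < rows; r++) {
            // The last row is shorter when length is not a multiple of CHUNK_SIZE.
            chunks[r] = new int[(int) Math.min(CHUNK_SIZE, remaining)];
            remaining -= chunks[r].length;
        }
    }

    private void check(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }

    int get(long index) {
        check(index);
        return chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)];
    }

    void set(long index, int value) {
        check(index);
        chunks[(int) (index >>> CHUNK_BITS)][(int) (index & CHUNK_MASK)] = value;
    }

    long length() {
        return length;
    }

    public static void main(String[] args) {
        ChunkedIntArray a = new ChunkedIntArray(2_500_000L);
        a.set(2_499_999L, 7);
        System.out.println(a.get(2_499_999L));
    }
}
```

This layout lifts the single-array index limit and avoids needing one huge contiguous block, but the total heap requirement is unchanged, which is the RAM problem described above.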

Answer 7

Answered by Kris

900 million 32-bit ints with no further overhead - and there will always be more overhead - would require a little over 3.35 GiB. The only way to get that much memory is to use a 64-bit JVM (on a machine with at least 8 GB of RAM) or some disk-backed cache.
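The arithmetic behind that figure is 900,000,000 × 4 bytes = 3,600,000,000 bytes, and 3,600,000,000 / 2^30 ≈ 3.35 GiB. As a small sketch, the same calculation can be compared with the heap limit the running JVM actually has; the class and method names here are illustrative only:

```java
/** Hypothetical helper that compares an array's raw size with the JVM heap limit. */
final class HeapEstimate {
    /** Raw payload size of an array, ignoring the object header and GC headroom. */
    static long bytesNeeded(long elements, int bytesPerElement) {
        return Math.multiplyExact(elements, (long) bytesPerElement);
    }

    static double toGiB(long bytes) {
        return bytes / (double) (1L << 30);
    }

    public static void main(String[] args) {
        long needed = bytesNeeded(900_000_000L, Integer.BYTES);
        // maxMemory() reports the most memory the JVM will try to use (governed by -Xmx).
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("int[900000000] needs about %.2f GiB%n", toGiB(needed));
        System.out.printf("this JVM's maximum heap is about %.2f GiB%n", toGiB(maxHeap));
        System.out.println(needed < maxHeap
                ? "the array might fit"
                : "the array will not fit; raise -Xmx or shrink the array");
    }
}
```

Even when the raw size is below the limit, the allocation can still fail, because the heap also holds everything else the program allocates.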

Answer 8

Answered by starblue

If your algorithm allows it:

如果您的算法允许：

Compute it in slices which fit into memory. You will have to redo the computation for each slice, but it will often be fast enough.
Use an array of a smaller numeric type such as byte.
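To illustrate the smaller-type suggestion, here is a sketch under a stated assumption: the asker's algorithm is not shown, so this assumes the per-number information needed is Ω(n), the count of prime factors with multiplicity, from which the Liouville function λ(n) = (-1)^Ω(n) follows. For any n below 2^31, Ω(n) is at most 30, so a `byte` per number is enough and the array is a quarter the size of an `int[]`. The class and method names are hypothetical:

```java
/** Hypothetical byte-based sieve for the summatory Liouville function. */
final class LiouvilleBytes {
    /** Returns omega where omega[n] is the number of prime factors of n, with multiplicity. */
    static byte[] primeFactorCounts(int limit) {
        byte[] omega = new byte[limit + 1];
        for (int p = 2; p <= limit; p++) {
            if (omega[p] != 0) {
                continue; // p already has a smaller prime factor, so it is composite
            }
            // p is prime: add one for every power p, p^2, p^3, ... dividing each multiple.
            for (long pk = p; pk <= limit; pk *= p) {
                for (long m = pk; m <= limit; m += pk) {
                    omega[(int) m]++;
                }
            }
        }
        return omega;
    }

    /**
     * Returns the smallest n in [2, limit] whose summatory Liouville value
     * L(n) = lambda(1) + ... + lambda(n) is positive, or -1 if there is none.
     */
    static int firstPositiveSum(int limit) {
        byte[] omega = primeFactorCounts(limit);
        long sum = 1; // lambda(1) = +1
        for (int n = 2; n <= limit; n++) {
            sum += (omega[n] & 1) == 0 ? 1 : -1;
            if (sum > 0) {
                return n;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstPositiveSum(1_000_000));
    }
}
```

With bytes, 900 million entries take roughly 0.84 GiB instead of about 3.35 GiB. Since only the parity of Ω(n) matters for λ(n), a bit set would shrink this further, at the cost of a different update rule.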

Answer 9

Answered by Mike Houston

I wrote a version of the Sieve of Eratosthenes for Project Euler which worked on one chunk of the search space at a time. It processes the first 1M integers (for example), and keeps each prime number it finds in a table. Once a chunk has been processed, the array is re-initialised and the primes found so far are used to mark the array before looking for the next primes.

The table maps a prime to its 'offset' from the start of the array for the next processing iteration.

This is similar in concept (if not in implementation) to the way functional programming languages perform lazy evaluation of lists (although in larger steps). Allocating all the memory up-front isn't necessary, since you're only interested in the parts of the array that pass your test for primality. Keeping the non-primes hanging around isn't useful to you.

This method also provides memoisation for later iterations over prime numbers. It's faster than scanning your sparse sieve data structure looking for the ones every time.
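The chunked sieve described in this answer might look roughly like the sketch below. This is a reconstruction from the description, not the author's code; the names `SegmentedSieve`, `primesUpTo`, and `nextMultiple` are assumptions. Each prime is stored together with the next multiple that still has to be crossed off, which plays the role of the 'offset' table:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical chunked Sieve of Eratosthenes that reuses one small array. */
final class SegmentedSieve {
    /** Returns all primes up to and including limit, sieving segmentSize numbers at a time. */
    static List<Integer> primesUpTo(int limit, int segmentSize) {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be positive");
        }
        List<Integer> primes = new ArrayList<>();
        // nextMultiple.get(i) is the next multiple of primes.get(i) still to be crossed off.
        List<Long> nextMultiple = new ArrayList<>();
        boolean[] composite = new boolean[segmentSize];

        for (long low = 2; low <= limit; low += segmentSize) {
            long high = Math.min(low + segmentSize - 1, limit);
            Arrays.fill(composite, false); // re-initialise the array for this chunk

            // Mark this chunk using the primes found in earlier chunks.
            for (int i = 0; i < primes.size(); i++) {
                int p = primes.get(i);
                long m = nextMultiple.get(i);
                for (; m <= high; m += p) {
                    composite[(int) (m - low)] = true;
                }
                nextMultiple.set(i, m); // remember where to resume in the next chunk
            }

            // Scan the chunk; each new prime crosses off its own multiples from p * p.
            for (long n = low; n <= high; n++) {
                if (composite[(int) (n - low)]) {
                    continue;
                }
                int p = (int) n;
                long m = n * n;
                for (; m <= high; m += p) {
                    composite[(int) (m - low)] = true;
                }
                primes.add(p);
                nextMultiple.add(m);
            }
        }
        return primes;
    }

    public static void main(String[] args) {
        System.out.println(primesUpTo(100, 10));
    }
}
```

Boxed lists keep the sketch short. For a limit near 900 million, primitive arrays would be more practical, and only primes up to sqrt(limit) need an entry in the offset table, since larger primes never cross anything off within the range.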

Answer 10

Answered by Jason S

I second the ideas from @sfossen and @Aaron Digulla. I'd go for disk access. If your algorithm can take a List interface rather than a plain array, you could write an adapter from the List interface to the memory-mapped file.
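One way such an adapter could look, as a sketch: a fixed-size `List<Integer>` view over an `IntBuffer`. The class name `IntBufferList` is hypothetical. In `main`, the buffer comes from mapping a temporary file, but any `IntBuffer` works, which makes the adapter easy to test without touching the disk:

```java
import java.io.IOException;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.AbstractList;
import java.util.List;

/** Hypothetical fixed-size List view over an IntBuffer, for example a mapped file. */
final class IntBufferList extends AbstractList<Integer> {
    private final IntBuffer cells;

    IntBufferList(IntBuffer cells) {
        this.cells = cells;
    }

    @Override
    public Integer get(int index) {
        return cells.get(index); // absolute get, does not move the buffer position
    }

    @Override
    public Integer set(int index, Integer value) {
        int old = cells.get(index);
        cells.put(index, value);
        return old; // List.set returns the previous element
    }

    @Override
    public int size() {
        return cells.capacity();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sieve", ".bin");
        file.toFile().deleteOnExit();
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            IntBuffer mapped = ch.map(FileChannel.MapMode.READ_WRITE, 0,
                    1_000L * Integer.BYTES).asIntBuffer();
            List<Integer> list = new IntBufferList(mapped);
            list.set(999, 5);
            System.out.println(list.get(999));
        }
    }
}
```

Two caveats apply. A `List` is indexed by `int` and every access boxes to `Integer`, and a single mapping is capped at `Integer.MAX_VALUE` bytes, so this view tops out at about 536 million ints; covering 900 million would need several such lists or a custom long-indexed interface.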
