Java array with more than 4GB elements
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license, include the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/878309/
Asked by Omry Yadan
I have a big file; it's expected to be around 12 GB. I want to load it all into memory on a beefy 64-bit machine with 16 GB RAM, but I think Java does not support byte arrays that big:
File f = new File(file);
long size = f.length();
byte data[] = new byte[size]; // <- does not compile, not even on 64bit JVM
Is it possible with Java?
The compile error from the Eclipse compiler is:
Type mismatch: cannot convert from long to int
javac gives:
possible loss of precision
found : long
required: int
byte data[] = new byte[size];
Accepted answer by Bill the Lizard
Java array indices are of type int (4 bytes or 32 bits), so I'm afraid you're limited to 2^31 − 1, or 2,147,483,647, slots in your array. I'd read the data into another data structure, like a 2D array.
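A minimal sketch of that 2D-array approach (the class and method names here are mine, not from the answer): read the file into a byte[][] whose rows are fixed-size chunks. In practice you'd use a chunk size like 1 GiB; the demo below uses a tiny chunk and a temp file so it runs anywhere.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChunkedRead {
    // Read a file of arbitrary length into a byte[][] of fixed-size chunks.
    static byte[][] readChunks(Path file, int chunkSize) throws IOException {
        long size = Files.size(file);
        int chunks = (int) ((size + chunkSize - 1) / chunkSize); // ceiling division
        byte[][] data = new byte[chunks][];
        try (DataInputStream in = new DataInputStream(new FileInputStream(file.toFile()))) {
            for (int i = 0; i < chunks; i++) {
                // Last chunk may be shorter than chunkSize.
                int len = (int) Math.min(chunkSize, size - (long) i * chunkSize);
                data[i] = new byte[len];
                in.readFully(data[i]); // loops internally until the chunk is full
            }
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".bin");
        Files.write(tmp, new byte[]{1, 2, 3, 4, 5});
        byte[][] data = readChunks(tmp, 2); // tiny chunk size for the demo
        System.out.println(data.length); // 3 chunks: {1,2},{3,4},{5}
        System.out.println(data[2][0]);  // 5
        Files.delete(tmp);
    }
}
```

An element at logical position i then lives at data[(int) (i / chunkSize)][(int) (i % chunkSize)].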
Answer by William Deans
package com.deans.rtl.util;

import java.io.FileInputStream;
import java.io.IOException;

/**
 * @author [email protected]
 *
 * Written to work with byte arrays requiring address space larger than 32 bits.
 */
public class ByteArray64 {
    private final long CHUNK_SIZE = 1024 * 1024 * 1024; // 1 GiB

    long size;
    byte[][] data;

    public ByteArray64(long size) {
        this.size = size;
        if (size == 0) {
            data = null;
        } else {
            int chunks = (int) (size / CHUNK_SIZE);
            int remainder = (int) (size - ((long) chunks) * CHUNK_SIZE);
            data = new byte[chunks + (remainder == 0 ? 0 : 1)][];
            for (int idx = chunks; --idx >= 0; ) {
                data[idx] = new byte[(int) CHUNK_SIZE];
            }
            if (remainder != 0) {
                data[chunks] = new byte[remainder];
            }
        }
    }

    public byte get(long index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Error attempting to access data element " + index + ". Array is " + size + " elements long.");
        }
        int chunk = (int) (index / CHUNK_SIZE);
        int offset = (int) (index - (((long) chunk) * CHUNK_SIZE));
        return data[chunk][offset];
    }

    public void set(long index, byte b) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Error attempting to access data element " + index + ". Array is " + size + " elements long.");
        }
        int chunk = (int) (index / CHUNK_SIZE);
        int offset = (int) (index - (((long) chunk) * CHUNK_SIZE));
        data[chunk][offset] = b;
    }

    /**
     * Simulates a single read which fills the entire array via several smaller reads.
     *
     * @param fileInputStream
     * @throws IOException
     */
    public void read(FileInputStream fileInputStream) throws IOException {
        if (size == 0) {
            return;
        }
        for (int idx = 0; idx < data.length; idx++) {
            if (fileInputStream.read(data[idx]) != data[idx].length) {
                throw new IOException("short read");
            }
        }
    }

    public long size() {
        return size;
    }
}
Answer by Yes - that Jake.
If necessary, you can load the data into an array of arrays, which will give you a maximum of Integer.MAX_VALUE squared bytes, more than even the beefiest machine could hold in memory.
Answer by pjc50
I suggest you define some "block" objects, each of which holds (say) 1 GB in an array, then make an array of those.
Answer by Tom Hawtin - tackline
No, arrays are indexed by ints (except some versions of JavaCard that use shorts). You will need to slice it up into smaller arrays, probably wrapping them in a type that gives you get(long), set(long, byte), etc. With sections of data that large, you might want to memory-map the file using java.nio.
Answer by Jeff Mc
You might consider using FileChannel and MappedByteBuffer to memory-map the file:
FileChannel fCh = new RandomAccessFile(file, "rw").getChannel();
long size = fCh.size();
ByteBuffer map = fCh.map(FileChannel.MapMode.READ_WRITE, 0, size);
Edit:
OK, I'm an idiot; it looks like ByteBuffer only takes a 32-bit index as well, which is odd since the size parameter to FileChannel.map is a long... But if you decide to break the file up into multiple 2 GB chunks for loading, I'd still recommend memory-mapped IO, as there can be pretty large performance benefits. You're basically moving all IO responsibility to the OS kernel.
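A sketch of that chunked-mapping idea (names are mine): map the file as a series of windows, each at most Integer.MAX_VALUE bytes (the per-mapping limit), and route a 64-bit index to the right window. The demo uses a tiny window and temp file so it runs anywhere.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChunkedMap {
    // Map the whole file as a series of read-only windows of at most `window` bytes.
    static MappedByteBuffer[] mapWindows(Path file, long window) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            int n = (int) ((size + window - 1) / window); // ceiling division
            MappedByteBuffer[] maps = new MappedByteBuffer[n];
            for (int i = 0; i < n; i++) {
                long pos = i * window;
                long len = Math.min(window, size - pos); // last window may be shorter
                maps[i] = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
            }
            return maps; // mappings stay valid after the channel is closed
        }
    }

    // Read one byte at a 64-bit offset by picking the right window.
    static byte get(MappedByteBuffer[] maps, long window, long index) {
        return maps[(int) (index / window)].get((int) (index % window));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".bin");
        Files.write(tmp, new byte[]{10, 20, 30, 40, 50});
        long window = 2; // tiny window for the demo; use Integer.MAX_VALUE in practice
        MappedByteBuffer[] maps = mapWindows(tmp, window);
        System.out.println(get(maps, window, 4)); // 50
        Files.delete(tmp);
    }
}
```

In real use you would set `window` to Integer.MAX_VALUE, so a 12 GB file needs only six mappings.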
Answer by Anas
Don't limit yourself to Integer.MAX_VALUE.
Although this question was asked many years ago, I wanted to contribute a simple example using only Java SE, without any external libraries.
First, let's say it's theoretically impossible, but practically possible.
A new way of looking at it: if an array is an object holding elements, what about an object that is an array of arrays?
Here's the example:
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.List;

/**
 * @author Anosa
 */
public class BigArray<T> {

    private static final int ARRAY_LENGTH = 1000000;

    public final long length;
    private final List<T[]> arrays;

    public BigArray(long length, Class<T> clazz)
    {
        this.length = length;
        arrays = new ArrayList<>();
        setupInnerArrays(clazz);
    }

    @SuppressWarnings("unchecked")
    private void setupInnerArrays(Class<T> clazz)
    {
        long numberOfArrays = length / ARRAY_LENGTH;
        long remainder = length % ARRAY_LENGTH;
        /*
        we could also use Java 8 lambdas and streams:
        LongStream.range(0, numberOfArrays)
                  .forEach(i -> arrays.add((T[]) Array.newInstance(clazz, ARRAY_LENGTH)));
        */
        for (int i = 0; i < numberOfArrays; i++)
        {
            arrays.add((T[]) Array.newInstance(clazz, ARRAY_LENGTH));
        }
        if (remainder > 0)
        {
            // the remainder is guaranteed to be less than ARRAY_LENGTH (an int),
            // so no worries about the cast (:
            arrays.add((T[]) Array.newInstance(clazz, (int) remainder));
        }
    }

    public void put(T value, long index)
    {
        if (index >= length || index < 0)
        {
            throw new IndexOutOfBoundsException("out of the range of the array; your index must be in the range [0, " + length + ")");
        }
        int indexOfArray = (int) (index / ARRAY_LENGTH);
        int indexInArray = (int) (index - (long) indexOfArray * ARRAY_LENGTH);
        arrays.get(indexOfArray)[indexInArray] = value;
    }

    public T get(long index)
    {
        if (index >= length || index < 0)
        {
            throw new IndexOutOfBoundsException("out of the range of the array; your index must be in the range [0, " + length + ")");
        }
        int indexOfArray = (int) (index / ARRAY_LENGTH);
        int indexInArray = (int) (index - (long) indexOfArray * ARRAY_LENGTH);
        return arrays.get(indexOfArray)[indexInArray];
    }
}
And here's the test:
public static void main(String[] args)
{
    long length = 60085147514L;
    BigArray<String> array = new BigArray<>(length, String.class);
    array.put("peace be upon you", 1);
    array.put("yes it works", 1755);
    String text = array.get(1755);
    System.out.println(text + " - I am a string coming from an array");
}
This code is limited only by Long.MAX_VALUE and the Java heap, but you can grow the heap as needed (I made it 3800 MB).
I hope this is useful and provides a simple answer.
Answer by Nick Fortescue
As others have said, all Java arrays of all types are indexed by int, and so can have at most 2^31 − 1, or 2,147,483,647, elements (~2 billion). This is specified by the Java Language Specification, so switching to another operating system or Java Virtual Machine won't help.
If you wanted to write a class to overcome this as suggested above, you could. It might use an array of arrays (for a lot of flexibility) or change element types (a long is 8 bytes, so a long[] can hold 8 times as many bytes as a byte[] of the same length).
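A sketch of that type-switch idea (the class and names are mine, not from the answer): back the storage with a long[] and pack 8 bytes into each long, extracting and replacing individual bytes with shifts and masks. A single long[] can then address up to 8 × (2^31 − 1) bytes, roughly 17 GB.

```java
public class LongBackedBytes {
    private final long[] words; // each long stores 8 bytes, little-endian within the word
    private final long length;

    public LongBackedBytes(long length) {
        this.length = length;
        this.words = new long[(int) ((length + 7) / 8)];
    }

    public byte get(long index) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException(Long.toString(index));
        int shift = (int) (index % 8) * 8;
        return (byte) (words[(int) (index / 8)] >>> shift);
    }

    public void set(long index, byte b) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException(Long.toString(index));
        int word = (int) (index / 8);
        int shift = (int) (index % 8) * 8;
        // Clear the target byte, then OR in the new value.
        words[word] = (words[word] & ~(0xFFL << shift)) | ((b & 0xFFL) << shift);
    }

    public static void main(String[] args) {
        LongBackedBytes a = new LongBackedBytes(20);
        a.set(9, (byte) 42);
        a.set(10, (byte) 7);
        System.out.println(a.get(9));  // 42
        System.out.println(a.get(10)); // 7
    }
}
```

Note this only extends the byte capacity, not the element count: the long[] itself is still capped at 2^31 − 1 elements.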
Answer by nicola
Java doesn't currently support direct arrays with more than 2^31 − 1 elements.
Hope to see this feature of Java in the future.
Answer by Tim Cooper
I think the idea of memory-mapping the file (using the CPU's virtual memory hardware) is the right approach, except that MappedByteBuffer has the same 2 GB limitation as native arrays. This guy claims to have solved the problem with a pretty simple alternative to MappedByteBuffer:
http://nyeggen.com/post/2014-05-18-memory-mapping-%3E2gb-of-data-in-java/
https://gist.github.com/bnyeggen/c679a5ea6a68503ed19f#file-mmapper-java
Unfortunately, the JVM crashes when you read beyond 500 MB.

