Java Performance - ArrayLists versus Arrays for lots of fast reads
Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors on StackOverflow (not me).
Original question: http://stackoverflow.com/questions/1182892/
Asked by Bryan Head
I have a program where I need to make 100,000 to 1,000,000 random-access reads to a List-like object in as little time as possible (as in milliseconds) for a cellular automata-like program. I think the update algorithm I'm using is already optimized (keeps track of active cells efficiently, etc). The Lists do need to change size, but that performance is not as important. So I am wondering if the performance from using Arrays instead of ArrayLists is enough to make a difference when dealing with that many reads in such short spans of time. Currently, I'm using ArrayLists.
Edit: I forgot to mention: I'm just storing integers, so another factor is using the Integer wrapper class (in the case of ArrayLists) versus ints (in the case of arrays). Does anyone know if using ArrayList will actually require 3 pointer lookups (one for the ArrayList, one for the underlying array, and one for the Integer->int), whereas the array would only require 1 (array address + offset to the specific int)? Would HotSpot optimize the extra lookups away? How significant are those extra lookups?
Edit2: Also, I forgot to mention I need to do random access writes as well (writes, not insertions).
Accepted answer by Stephen C
Now that you've mentioned that your arrays are actually arrays of primitive types, consider using the collection-of-primitive-type classes in the Trove library.
@viking reports significant (ten-fold!) speedup using Trove in his application - see comments. The flip-side is that Trove collection types are not type compatible with Java's standard collection APIs. So Trove (or similar libraries) won't be the answer in all cases.
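To illustrate what a collection-of-primitives class buys you, here is a minimal sketch of a growable list backed directly by an int[]. The class name IntList and its methods are hypothetical and are not Trove's actual API; the point is only that reads and writes touch the primitive array directly, with no Integer boxing.

```java
import java.util.Arrays;

// Hypothetical minimal growable list of primitive ints (an illustration, not Trove's API).
final class IntList {
    private int[] data = new int[16];
    private int size;

    // Appends a value, doubling the backing array when it is full.
    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);
        }
        data[size++] = value;
    }

    // Random-access read: one bounds check, no unboxing.
    int get(int index) {
        checkIndex(index);
        return data[index];
    }

    // Random-access write in place: no Integer allocation.
    void set(int index, int value) {
        checkIndex(index);
        data[index] = value;
    }

    int size() {
        return size;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
    }

    public static void main(String[] args) {
        IntList cells = new IntList();
        for (int i = 0; i < 100; i++) {
            cells.add(i);
        }
        cells.set(42, cells.get(42) + 1);
        System.out.println(cells.get(42) + " of " + cells.size()); // prints "43 of 100"
    }
}
```

As the answer notes, a class like this does not implement java.util.List, so it cannot be passed to code that expects the standard collection interfaces.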
Answer by Will Hartung
An Array will be faster simply because at a minimum it skips a function call (i.e. get(i)).
If you have a static size, then Arrays are your friend.
Answer by James Skidmore
ArrayLists are slower than Arrays, but most people consider the difference to be minor. In your case it could matter, though, since you're dealing with hundreds of thousands of reads.
By the way, duplicate: Array or List in Java. Which is faster?
Answer by Sev
If you're not going to be doing much more than reading from this structure, then go ahead and use an array, as that will be faster when reading by index.
However, consider how you're going to get the data in there, and if sorting, inserting, deleting, etc, are a concern at all. If so, you may want to consider other collection based structures.
Answer by Kevin Peterson
Try both, but measure.
Most likely you could hack something together to make the inner loop use arrays without changing all that much code. My suspicion is that HotSpot will already inline the method calls and you will see no performance gain.
Also, try Java 6 update 14 and use the -XX:+DoEscapeAnalysis flag.
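A rough way to "try both, but measure" is a small timing harness like the one below. It is only a sketch: the class name ReadBenchmark, the sizes, and the number of rounds are assumptions chosen to match the question's scale. Hand-rolled timings on the JVM are noisy, so treat the early rounds as warm-up and prefer a dedicated harness such as JMH for numbers you intend to rely on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical micro-benchmark sketch: random-access reads from a List<Integer> vs an int[].
final class ReadBenchmark {
    // Sums the values found at the given indices in a boxed list.
    static long sumList(List<Integer> list, int[] indices) {
        long sum = 0;
        for (int idx : indices) {
            sum += list.get(idx); // method call plus unboxing
        }
        return sum;
    }

    // Sums the values found at the given indices in a primitive array.
    static long sumArray(int[] array, int[] indices) {
        long sum = 0;
        for (int idx : indices) {
            sum += array[idx];
        }
        return sum;
    }

    public static void main(String[] args) {
        int size = 100_000;    // number of cells (assumed)
        int reads = 1_000_000; // random reads per round (upper end of the question's range)
        Random rnd = new Random(42);

        int[] array = new int[size];
        List<Integer> list = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            array[i] = rnd.nextInt();
            list.add(array[i]);
        }
        int[] indices = new int[reads];
        for (int i = 0; i < reads; i++) {
            indices[i] = rnd.nextInt(size);
        }

        // Repeat so the JIT has a chance to compile both loops; ignore the first rounds.
        for (int round = 0; round < 20; round++) {
            long t0 = System.nanoTime();
            long fromList = sumList(list, indices);
            long t1 = System.nanoTime();
            long fromArray = sumArray(array, indices);
            long t2 = System.nanoTime();
            if (fromList != fromArray) {
                throw new AssertionError("sums differ");
            }
            System.out.printf("round %d: list %d us, array %d us%n",
                    round, (t1 - t0) / 1000, (t2 - t1) / 1000);
        }
    }
}
```

Returning and comparing the sums keeps the JIT from discarding the loops as dead code.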
Answer by Janusz
I would go with Kevin's advice.
Stay with the lists first and measure your performance. If your program is too slow, compare it to a version that uses an array. If that gives you a measurable performance boost, go with the arrays; if not, stay with the lists, because they will make your life much, much easier.
Answer by Kevin Day
One possibility would be to re-implement ArrayList (it's not that hard), but expose the backing array via a lock/release call cycle. This gets you convenience for your writes, but exposes the array for a large series of read/write operations that you know in advance won't impact the array size. If the list is locked, add/delete is not allowed - just get/set.
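For example:
    // myArrayList is the re-implemented list described above; lockArray() hands out its backing array.
    // int[] is used here because the question stores ints and the loop increments each element.
    int[] directArray = myArrayList.lockArray();
    try {
        // While locked, myArrayList.add() and delete() would throw an illegal state exception.
        for (int i = 0; i < 50000; i++) {
            directArray[i] += 1;
        }
    } finally {
        myArrayList.unlockArray();
    }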
This approach continues to encapsulate the array growth/etc... behaviors of ArrayList.
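The lock/release idea could be fleshed out roughly as follows. This is a sketch under assumptions: the class name LockableIntList is hypothetical, it stores ints because that is what the question uses, and it only implements add(), size(), and the lock/unlock pair.

```java
import java.util.Arrays;

// Hypothetical sketch of a growable int list that can temporarily expose its backing array.
final class LockableIntList {
    private int[] data = new int[16];
    private int size;
    private boolean locked;

    // Structural changes are rejected while the backing array is handed out.
    void add(int value) {
        if (locked) {
            throw new IllegalStateException("list is locked");
        }
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);
        }
        data[size++] = value;
    }

    int size() {
        return size;
    }

    // Returns the backing array itself; only indices [0, size()) are meaningful.
    int[] lockArray() {
        if (locked) {
            throw new IllegalStateException("already locked");
        }
        locked = true;
        return data;
    }

    void unlockArray() {
        locked = false;
    }

    public static void main(String[] args) {
        LockableIntList list = new LockableIntList();
        for (int i = 0; i < 50000; i++) {
            list.add(i);
        }
        int[] direct = list.lockArray();
        try {
            for (int i = 0; i < list.size(); i++) {
                direct[i] += 1; // plain array access in the hot loop
            }
        } finally {
            list.unlockArray();
        }
        list.add(7); // growing is allowed again after unlocking
        System.out.println(list.size());
    }
}
```

One caveat with this design: once the list is unlocked and grows, a previously obtained array reference may point at a stale copy, so callers should not hold on to it after unlockArray().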
Answer by Peter Lawrey
Java uses double indirection for its objects so that they can be moved about in memory while references to them remain valid. This means every reference lookup is equivalent to two pointer lookups. These extra lookups cannot be optimised away completely.
Perhaps even worse, your cache performance will be terrible. Accessing values in cache is going to be many times faster (perhaps 10x) than accessing values in main memory. If you have an int[], you know the values will be consecutive in memory and thus load into cache readily. However, for an Integer[], the individual Integer objects can be scattered randomly across memory and are much more likely to cause cache misses. Also, an Integer uses 24 bytes, which means Integers are much less likely to fit into your caches than 4-byte values.
If you update an Integer, this often results in a new object being created, which is many orders of magnitude more expensive than updating an int value.
Answer by Tom Hawtin - tackline
There will be an overhead from using an ArrayList instead of an array, but it is very likely to be small. In fact, the useful bit of data in the ArrayList can be stored in registers, although you will probably use more (List size for instance).
You mention in your edit that you are using wrapper objects. These do make a huge difference. If you are typically using the same values repeatedly, then a sensible cache policy may be useful (Integer.valueOf returns the same cached objects for -128 to 127). For primitives, primitive arrays usually win comfortably.
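The Integer.valueOf caching mentioned above can be observed with a reference comparison. In this small sketch the class and method names are hypothetical. The language specification guarantees shared objects only for values from -128 to 127; outside that range the result depends on the JVM and its settings, so no output is claimed for 128.

```java
// Hypothetical demo of the Integer.valueOf cache using reference equality.
final class BoxingDemo {
    // Returns true if two valueOf calls for the same value yield the same object.
    static boolean sameObject(int value) {
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a == b; // deliberate reference comparison, not equals()
    }

    public static void main(String[] args) {
        System.out.println("127 shared: " + sameObject(127)); // guaranteed to be true
        System.out.println("128 shared: " + sameObject(128)); // JVM-dependent
    }
}
```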
As a refinement, you might want to make sure that adjacent cells tend to be adjacent in the array (with a space-filling curve you can do better than storing rows of columns).
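One common space-filling curve is the Z-order (Morton) curve, which interleaves the bits of the x and y coordinates so that cells close together in 2D tend to be close together in the 1D array. The answer does not name a specific curve, so the choice of Z-order here is an assumption, as are the class and method names. The sketch assumes a square grid whose side is a power of two and coordinates below 32768, so the index stays non-negative.

```java
// Hypothetical sketch: map 2D cell coordinates to a Z-order (Morton) array index.
final class MortonIndex {
    // Spreads the low 16 bits of v so that a zero bit sits between each original bit.
    static int spreadBits(int v) {
        v &= 0xFFFF;
        v = (v | (v << 8)) & 0x00FF00FF;
        v = (v | (v << 4)) & 0x0F0F0F0F;
        v = (v | (v << 2)) & 0x33333333;
        v = (v | (v << 1)) & 0x55555555;
        return v;
    }

    // Interleaves the bits: x occupies the even bit positions, y the odd ones.
    static int index(int x, int y) {
        return spreadBits(x) | (spreadBits(y) << 1);
    }

    public static void main(String[] args) {
        int side = 4; // must be a power of two for a dense array
        int[] cells = new int[side * side];
        cells[index(3, 3)] = 1; // the last cell of a 4x4 grid lands at index 15
        for (int y = 0; y < side; y++) {
            for (int x = 0; x < side; x++) {
                System.out.print(index(x, y) + " ");
            }
            System.out.println();
        }
    }
}
```

Whether this beats a plain row-major layout depends on the neighbourhood pattern and the grid size, so it is worth measuring before adopting it.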
Answer by rtperson
If you're creating the list once, and doing thousands of reads from it, the overhead from ArrayList may well be slight enough to ignore. If you're creating thousands of lists, go with the standard array. Object creation in a loop quickly becomes expensive, simply because of all the overhead of instantiating the member variables, calling the constructors up the inheritance chain, etc.
Because of this -- and to answer your second question -- stick with standard ints rather than the Integer class. Profile both and you'll quickly (or, rather, slowly) see why.