java java中大型列表的最佳列表实现是什么
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/1756333/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
What is the best List implementation for Large lists in java
提问by rabbit
I have to create a large list of n elements (could be up to 100,000). each element in the list is an integer equivalent to the index of the list. After this I have to call Collections.shuffle on this list. My question is, which list implementation (either java collections or apache collections) should be used. My gut feeling is ArrayList can well be used here. All thoughts are appreciated. Thanks!
我必须创建一个包含 n 个元素的大列表(最多可达 100,000 个)。列表中的每个元素都是一个等效于列表索引的整数。在此之后,我必须在此列表中调用 Collections.shuffle。我的问题是,应该使用哪个列表实现(java 集合或 apache 集合)。我的直觉是 ArrayList 可以很好地用于这里。所有的想法都值得赞赏。谢谢!
Thanks for the inputs. I think I am sticking to the ArrayList. I am currently using the ArrayList constructor with the initialCapacity param and I pass the size of the list. So if the original list is 100000, I create this new list with new ArrayList(100000); Hence I think I don't have the create an array and do an asList since there won't be any resizing. Also, most of the apache collections Lists like GrowthList & LazyList do not implement RandomAccess. This for sure would slow down the shuffle (as per javadocs). FastArrayList does implement RandomAccess but apache has a note for this class saying "This class is not cross-platform. Using it may cause unexpected failures on some architectures".
感谢您的投入。我想我坚持使用 ArrayList。我目前正在使用带有 initialCapacity 参数的 ArrayList 构造函数,并传递列表的大小。所以如果原始列表是 100000,我用 new ArrayList(100000) 创建这个新列表;因此我认为我没有创建数组并执行 asList 因为不会有任何调整大小。此外,大多数 apache 集合列表(如 GrowthList 和 LazyList)都没有实现 RandomAccess。这肯定会减慢洗牌速度(根据 javadocs)。FastArrayList 确实实现了 RandomAccess,但 apache 对这个类有一个注释说“这个类不是跨平台的。在某些架构上使用它可能会导致意外失败”。
回答by unwind
ArrayList most probably has the least overhead per list element, so should be the best choice. It might be a worse choice if you frequently need to delete items in the middle of the list.
ArrayList 很可能每个列表元素的开销最少,所以应该是最好的选择。如果您经常需要删除列表中间的项目,这可能是一个更糟糕的选择。
回答by pgras
Quoted from the Collections.shuffle javadoc:
引用自 Collections.shuffle javadoc:
This method runs in linear time. If the specified list does not implement the RandomAccess interface and is large, this implementation dumps the specified list into an array before shuffling it, and dumps the shuffled array back into the list. This avoids the quadratic behavior that would result from shuffling a "sequential access" list in place.
该方法在线性时间内运行。如果指定的列表没有实现 RandomAccess 接口并且很大,则此实现将指定的列表转储到一个数组中,然后再对其进行混洗,并将混洗后的数组转储回列表中。这避免了由于将“顺序访问”列表改组到位而导致的二次行为。
So if you have no other needs i would go with ArrayList which implements RandomAccess.
因此,如果您没有其他需求,我将使用实现 RandomAccess 的 ArrayList。
回答by gustafc
Making an Integerarray and then wrapping it with Arrays.asListgives you even less overhead than a regular ArrayList.
制作一个Integer数组然后用它包装它Arrays.asList比普通的ArrayList.
List<Integer> makeList(int size){
if (size < 0) throw new IllegalArgumentException();
Integer[] arr = new Integer[size];
for (int i = 0; i < arr.length; ++i) arr[i] = i;
List<Integer> list = Arrays.asList(arr);
Collection.shuffle(list);
return list;
}
You save one entire intworth of space (... which admittedly is absolutely nothing in this context), but it does perform fewer range checks than the "real" ArrayList, so accessing will be slightly faster. Probably nothing you'll notice, though :)
您可以节省一整块int空间(……在这种情况下,这无疑是绝对没有的),但它执行的范围检查确实比 "real" 少ArrayList,因此访问速度会稍快一些。不过,您可能不会注意到任何事情:)
回答by Denis Tulskiy
Javolutionclaims to have the fastest List implementation in java. But I couldn't find any shuffle implementation in this library so you will have to do it by hand.
Javolution声称拥有 Java 中最快的 List 实现。但是我在这个库中找不到任何 shuffle 实现,所以你必须手工完成。
回答by Jon Skeet
ArrayList<T>would probably be fine, yes - but what criteria are you using for "best" anyway? And just how good does it have to be anyway? What are your trade-offs between complexity and "goodness" in whatever those criteria are?
ArrayList<T>可能没问题,是的 - 但无论如何你使用什么标准来衡量“最佳”?无论如何,它必须有多好?无论这些标准是什么,您在复杂性和“善良”之间的权衡是什么?
回答by Jared Levy
Google's Guavalibrary has some really nice primitive handling, including an Ints.asList()method returning a list that may be shuffled.
Google 的Guava库有一些非常好的原始处理,包括一个Ints.asList()方法,它返回一个可能被洗牌的列表。
The Guava project is still at a preliminary stage of deployment, though the code has been carefully reviewed and heavily used at Google. You'll need to retrieve the code from SVNand build the com.google.common.primitive classes.
Guava 项目仍处于部署的初步阶段,尽管代码已经过仔细并在谷歌大量使用。您需要从SVN检索代码并构建 com.google.common.primitive 类。
回答by Ertu?rul ?etin
回答by Stephen C
This is about your update to your question concerning FastArrayList.
这是关于您对有关FastArrayList.
FastArrayListdoes implementRandomAccessbut apache has a note for this class saying "This class is not cross-platform. Using it may cause unexpected failures on some architectures".
FastArrayList确实实现了,RandomAccess但 apache 有一个关于这个类的注释,上面写着“这个类不是跨平台的。在某些架构上使用它可能会导致意外失败”。
The FastArrayListclass (javadoc) is a concurrent list class. This is what the javadoc says:
在FastArrayList类(Javadoc中)是一个并行列表类。这是 javadoc 所说的:
A customized implementation of java.util.ArrayList designed to operate in a multithreaded environment where the large majority of method calls are read-only, instead of structural changes. When operating in "fast" mode, read calls are non-synchronized and write calls perform the following steps:
- Clone the existing collection
- Perform the modification on the clone
- Replace the existing collection with the (modified) clone
[...]
NOTE: If you are creating and accessing an ArrayList only within a single thread, you should use java.util.ArrayList directly (with no synchronization), for maximum performance.
NOTE: This class is not cross-platform [due to issues with fast mode and multiple threads]
java.util.ArrayList 的自定义实现,旨在在多线程环境中运行,其中大部分方法调用是只读的,而不是结构更改。在“快速”模式下运行时,读调用是非同步的,写调用执行以下步骤:
- 克隆现有集合
- 对克隆执行修改
- 用(修改后的)克隆替换现有集合
[...]
注意:如果您仅在单个线程中创建和访问 ArrayList,则应直接使用 java.util.ArrayList(无同步),以获得最佳性能。
注意:这个类不是跨平台的[由于快速模式和多线程的问题]
Now your use-case (as described) is single-threaded. So:
现在您的用例(如所述)是单线程的。所以:
- The "cross-platform" issue is not relevant because it only affects multi-threaded use-cases.
- The first "NOTE" says (clearly) that for a single-threaded application you are better off using an
ArrayList.
- “跨平台”问题无关紧要,因为它只影响多线程用例。
- 第一个“注意”说(清楚地)对于单线程应用程序,最好使用
ArrayList.
In short, the "fast" in FastArrayListis relative to (say) doing this:
简而言之,“快速”FastArrayList是相对于(比如说)这样做的:
List<String> myConcurrentlList = Collections.synchronizedList(new ArrayList<>());
Back to your original question. ArrayListis the simplest of the fast ways, and I doubt that any other Listclass will beat it. However, the following approachmaybe faster.
回到你原来的问题。 ArrayList是最简单的快速方法,我怀疑任何其他List类都可以击败它。但是,以下方法可能会更快。
String[] array = new String[...];
// populate array
// shuffle array ... using same algorithm as Collections.shuffle
for (int i = array.length; i > 1; i--)
swap(array, i - 1, rnd.nextInt(i));
}
List<String> list = Arrays.asList(array);
Why might that be faster? Because swap operations on an array will be faster than on an ArrayList.
为什么会更快?因为数组上的交换操作将比ArrayList.
Will it be faster overall? Hard to say. It depends on:
整体会更快吗?很难说。这取决于:
- whether you creating / populating the array like that is / isn't extra work
- whether the performance of list operations on an
asListwrapper compared to anArrayList... and what operations you perform, etc.
- 您是否创建/填充这样的数组是/不是额外的工作
asList包装器上的列表操作的性能与ArrayList...相比,以及您执行的操作等。
My advice would be to beware of "premature optimization".
我的建议是提防“过早优化”。
回答by bluebird
You can also use memory mapped file based list implementation . In such implementation the list is not fully present in memory but only a section of the huge list will be active in memory. If you are reaching heap space limitation (mostly in 32 bit jvm) you may be needing to make the list push off data seamlessly using a memory mapped file which will be faster than normal file I/O. One such implementation is described in this google codeand explained in this link.
您还可以使用基于内存映射文件的列表实现。在这样的实现中,列表并不完全存在于内存中,但只有一部分巨大列表在内存中处于活动状态。如果您达到堆空间限制(主要在 32 位 jvm 中),您可能需要使用比普通文件 I/O 更快的内存映射文件使列表无缝推送数据。这个谷歌代码中描述了一个这样的实现,并在这个链接中进行了解释。
回答by David Waters
ArrayList will be the best List for this. As the array backing will be very efficent for swapping elements used in shuffle.
ArrayList 将是最好的 List。由于数组支持对于交换 shuffle 中使用的元素非常有效。
But if you are really chacing performance you may wish to consider using an int[] or a custom list based in int[] as with all the List and List standard implementations you will be boxing and unboxing ints to Integers.
但是,如果您真的在追求性能,您可能希望考虑使用 int[] 或基于 int[] 的自定义列表,就像所有 List 和 List 标准实现一样,您将把 int 装箱和拆箱为整数。
This will not be an issue on the suffle as this will just be reordering pointers but you will be creating 100,000 objects when you may not need to. Assuming you know the size of your list before creation you can quite easily create a new List class that wraps a primitive array. If used as a java.util.List you will still need to box the return from any get method.
这不会是 suffle 的问题,因为这只是重新排序指针,但您可能不需要创建 100,000 个对象。假设您在创建之前知道列表的大小,您可以很容易地创建一个包装原始数组的新 List 类。如果用作 java.util.List,您仍然需要将任何 get 方法的返回值装箱。

