我什么时候应该在 Scala 中选择 Vector?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/6928327/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-10-22 03:18:20  来源:igfitidea点击:

When should I choose Vector in Scala?

scalavectorscala-collections

提问by Duncan McGregor

It seems that Vectorwas late to the Scala collections party, and all the influential blog posts had already left.

VectorScala 收藏派对似乎已经晚了,所有有影响力的博文都已经离开了。

In Java ArrayListis the default collection - I might use LinkedListbut only when I've thought through an algorithm and care enough to optimise. In Scala should I be using Vectoras my default Seq, or trying to work out when Listis actually more appropriate?

在 Java 中ArrayList是默认集合 - 我可能会使用,LinkedList但只有在我考虑过算法并足够关心优化时才会使用。在 Scala 中,我应该将其Vector用作我的默认值Seq,还是尝试找出List实际上更合适的时间?

回答by Daniel Spiewak

As a general rule, default to using Vector. It's faster than Listfor almosteverything and more memory-efficient for larger-than-trivial sized sequences. See this documentationof the relative performance of Vector compared to the other collections. There are some downsides to going with Vector. Specifically:

作为一般规则,默认使用Vector. 这是比快List几乎一切,更多的内存效率比平凡的较大尺寸的序列。请参阅此文档,了解 Vector 与其他集合相比的相对性能。有一些缺点Vector。具体来说:

  • Updates at the head are slower than List(though not by as much as you might think)
  • 头部的更新比List(虽然没有你想象的那么慢)

Another downside before Scala 2.10 was that pattern matching support was better for List, but this was rectified in 2.10 with generalized +:and :+extractors.

Scala 2.10 之前的另一个缺点是模式匹配支持对 更好List,但这在 2.10 中通过通用+::+提取器得到了纠正。

There is also a more abstract, algebraic way of approaching this question: what sort of sequence do you conceptuallyhave? Also, what are you conceptuallydoing with it? If I see a function that returns an Option[A], I know that function has some holes in its domain (and is thus partial). We can apply this same logic to collections.

还有一种更抽象的代数方式来解决这个问题:你在概念上有什么样的序列?另外,你在概念上用它做什么?如果我看到一个返回 的函数Option[A],我知道该函数在其域中有一些漏洞(因此是部分的)。我们可以将同样的逻辑应用于集合。

If I have a sequence of type List[A], I am effectively asserting two things. First, my algorithm (and data) is entirely stack-structured. Second, I am asserting that the only things I'm going to do with this collection are full, O(n) traversals. These two really go hand-in-hand. Conversely, if I have something of type Vector[A], the onlything I am asserting is that my data has a well defined order and a finite length. Thus, the assertions are weaker with Vector, and this leads to its greater flexibility.

如果我有一个 type 序列List[A],我实际上是在断言两件事。首先,我的算法(和数据)完全是堆栈结构的。其次,我断言我要对这个集合做的唯一事情是满的,O(n) 遍历。这两个真的是相辅相成的。相反,如果我有类型Vector[A]的东西,我唯一要断言的是我的数据有一个明确定义的顺序和有限的长度。因此,断言较弱Vector,这导致其更大的灵活性。

回答by Daniel C. Sobral

Well, a Listcan be incredibly fast if the algorithm can be implemented solely with ::, headand tail. I had an object lesson of that very recently, when I beat Java's splitby generating a Listinstead of an Array, and couldn't beat that with anything else.

好吧,List如果算法可以单独使用::,head和来实现,那么 a可能会非常快tail。我最近有一个对象课程,当时我split通过生成 aList而不是 an击败了Java Array,并且无法用其他任何东西击败它。

However, Listhas a fundamental problem: it doesn't work with parallel algorithms. I cannot split a Listinto multiple segments, or concatenate it back, in an efficient manner.

但是,List有一个根本问题:它不适用于并行算法。我无法List以有效的方式将 a 拆分为多个段,或将其连接回去。

There are other kinds of collections that can handle parallelism much better -- and Vectoris one of them. Vectoralso has great locality -- which Listdoesn't -- which can be a real plus for some algorithms.

还有其他类型的集合可以更好地处理并行性——这Vector就是其中之一。Vector也有很大的局部性——List但没有——这对某些算法来说是一个真正的优势。

So, all things considered, Vectoris the best choice unlessyou have specific considerations that make one of the other collections preferable -- for example, you might choose Streamif you want lazy evaluation and caching (Iteratoris faster but doesn't cache), or Listif the algorithm is naturally implemented with the operations I mentioned.

因此,考虑到所有因素,这Vector是最佳选择,除非您有特定的考虑使其他集合之一更可取——例如,您可能会选择Stream是否需要延迟评估和缓存(Iterator更快但不缓存),或者List如果算法自然是用我提到的操作来实现的。

By the way, it is preferable to use Seqor IndexedSequnless you want a specific piece of API (such as List's ::), or even GenSeqor GenIndexedSeqif your algorithm can be run in parallel.

顺便说一句,最好使用SeqorIndexedSeq除非您想要特定的 API(例如List's ::),或者甚至GenSeq或者GenIndexedSeq如果您的算法可以并行运行。

回答by dth

Some of the statements here are confusing or even wrong, especially the idea that immutable.Vector in Scala is anything like an ArrayList. List and Vector are both immutable, persistent (i.e. "cheap to get a modified copy") data structures. There is no reasonable default choice as their might be for mutable data structures, but it rather depends on what your algorithm is doing. List is a singly linked list, while Vector is a base-32 integer trie, i.e. it is a kind of search tree with nodes of degree 32. Using this structure, Vector can provide most common operations reasonably fast, i.e. in O(log_32(n)). That works for prepend, append, update, random access, decomposition in head/tail. Iteration in sequential order is linear. List on the other hand just provides linear iteration and constant time prepend, decomposition in head/tail. Everything else takes in general linear time.

这里的一些陈述令人困惑甚至错误,尤其是 Scala 中的 immutable.Vector 与 ArrayList 类似的想法。List 和 Vector 都是不可变的、持久的(即“获得修改后的副本便宜”)数据结构。没有合理的默认选择,因为它们可能适用于可变数据结构,但这取决于您的算法正在做什么。List 是一个单向链表,而 Vector 是一个 base-32 整数 trie,即它是一种具有 32 度节点的搜索树。使用这种结构,Vector 可以相当快地提供最常见的操作,即在 O(log_32( n))。这适用于头/尾中的前置、附加、更新、随机访问、分解。按顺序进行的迭代是线性的。另一方面,List 仅提供线性迭代和恒定时间前置,头/尾分解。

This might look like as if Vector was a good replacement for List in almost all cases, but prepend, decomposition and iteration are often the crucial operations on sequences in a functional program, and the constants of these operations are (much) higher for vector due to its more complicated structure. I made a few measurements, so iteration is about twice as fast for list, prepend is about 100 times faster on lists, decomposition in head/tail is about 10 times faster on lists and generation from a traversable is about 2 times faster for vectors. (This is probably, because Vector can allocate arrays of 32 elements at once when you build it up using a builder instead of prepending or appending elements one by one). Of course all operations that take linear time on lists but effectively constant time on vectors (as random access or append) will be prohibitively slow on large lists.

这看起来好像在几乎所有情况下 Vector 都是 List 的良好替代品,但前置、分解和迭代通常是函数式程序中序列的关键操作,并且这些操作的常量对于 vector 来说(高得多)到其更复杂的结构。我做了一些测量,所以列表的迭代速度大约是原来的两倍,列表的 prepend 速度大约快 100 倍,列表的头/尾分解速度大约快 10 倍,而从可遍历的生成速度大约快 2 倍的向量。(这可能是因为 Vector 可以在使用构建器构建时一次分配 32 个元素的数组,而不是一个一个地添加或添加元素)。

So which data structure should we use? Basically, there are four common cases:

那么我们应该使用哪种数据结构呢?基本上,有四种常见情况:

  • We only need to transform sequences by operations like map, filter, fold etc: basically it does not matter, we should program our algorithm generically and might even benefit from accepting parallel sequences. For sequential operations List is probably a bit faster. But you should benchmark it if you have to optimize.
  • We need a lot of random access and different updates, so we should use vector, list will be prohibitively slow.
  • We operate on lists in a classical functional way, building them by prepending and iterating by recursive decomposition: use list, vector will be slower by a factor 10-100 or more.
  • We have an performance critical algorithm that is basically imperative and does a lot of random access on a list, something like in place quick-sort: use an imperative data structure, e.g. ArrayBuffer, locally and copy your data from and to it.
  • 我们只需要通过 map、filter、fold 等操作来转换序列:基本上没关系,我们应该对我们的算法进行通用编程,甚至可能从接受并行序列中受益。对于顺序操作 List 可能要快一点。但是如果你必须优化,你应该对它进行基准测试。
  • 我们需要大量的随机访问和不同的更新,所以我们应该使用向量,列表会非常慢。
  • 我们以经典的函数方式对列表进行操作,通过递归分解的前置和迭代来构建它们:使用列表,向量将慢 10-100 倍或更多。
  • 我们有一个性能关键算法,它基本上是命令式的,可以对列表进行大量随机访问,就像就地快速排序一样:在本地使用命令式数据结构,例如 ArrayBuffer,并从中复制您的数据。

回答by Luigi Plinge

For immutable collections, if you want a sequence, your main decision is whether to use an IndexedSeqor a LinearSeq, which give different guarantees for performance. An IndexedSeq provides fast random-access of elements and a fast length operation. A LinearSeq provides fast access only to the first element via head, but also has a fast tailoperation. (Taken from the Seq documentation.)

对于不可变集合,如果您想要一个序列,您的主要决定是使用 anIndexedSeq还是 a LinearSeq,它们为性能提供不同的保证。IndexedSeq 提供元素的快速随机访问和快速长度操作。LinearSeq 仅通过 提供对第一个元素的快速访问head,但也具有快速tail操作。(取自 Seq 文档。)

For an IndexedSeqyou would normally choose a Vector. Ranges and WrappedStrings are also IndexedSeqs.

对于 an,IndexedSeq您通常会选择Vector. Ranges 和WrappedStrings 也是 IndexedSeqs。

For a LinearSeqyou would normally choose a Listor its lazy equivalent Stream. Other examples are Queues and Stacks.

对于 a,LinearSeq您通常会选择 aList或其惰性等效项Stream。其他示例是Queues 和Stacks。

So in Java terms, ArrayListused similarly to Scala's Vector, and LinkedListsimilarly to Scala's List. But in Scala I would tend to use List more often than Vector, because Scala has much better support for functions that include traversal of the sequence, like mapping, folding, iterating etc. You will tend to use these functions to manipulate the list as a whole, rather than randomly accessing individual elements.

因此,在Java术语,ArrayList使用同样Scala的Vector,并且LinkedList同样Scala的List。但在 Scala 中,我倾向于使用 List 比 Vector 更频繁,因为 Scala 对包含序列遍历的函数有更好的支持,如映射、折叠、迭代等。您将倾向于使用这些函数来操作列表作为整体,而不是随机访问单个元素。

回答by Debilski

In situations which involve a lot random access and random mutation, a Vector(or – as the docssay – a Seq) seems to be a good compromise. This is also what the performance characteristicssuggest.

在涉及大量随机访问和随机突变的情况下, a Vector(或 - 正如文档所说 - a Seq)似乎是一个很好的折衷方案。这也是性能特征所暗示的。

Also, the Vectorclass seems to play nicely in distributed environments without much data duplication because there is no need to do a copy-on-write for the complete object. (See: http://akka.io/docs/akka/1.1.3/scala/stm.html#persistent-datastructures)

此外,Vector该类似乎在分布式环境中运行良好,没有太多数据重复,因为不需要对完整对象进行写时复制。(参见:http: //akka.io/docs/akka/1.1.3/scala/stm.html#persistent-datastructures

回答by Joshua Hartman

If you're programming immutably and need random access, Seq is the way to go (unless you want a Set, which you often actually do). Otherwise List works well, except it's operations can't be parallelized.

如果您的编程是一成不变的并且需要随机访问,那么 Seq 就是您要走的路(除非您想要一个 Set,而您实际上经常这样做)。否则 List 运行良好,除了它的操作不能并行化。

If you don't need immutable data structures, stick with ArrayBuffer since it's the Scala equivalent to ArrayList.

如果您不需要不可变的数据结构,请坚持使用 ArrayBuffer,因为它是 Scala 等价于 ArrayList。