Ruby on Rails: array merge (union)

Question

Asked by Rabbott

I have two arrays I need to merge, and using the union (|) operator is PAINFULLY slow. Are there any other ways to accomplish an array merge?

Also, the arrays are filled with objects, not strings.

An example of the objects within the array:

#<Article 
 id: 1, 
 xml_document_id: 1, 
 source: "<article><domain>events.waikato.ac</domain><excerpt...", 
 created_at: "2010-02-11 01:32:46", 
 updated_at: "2010-02-11 01:41:28"
>

Where source is a short piece of XML.

EDIT

Sorry! By 'merge' I mean I need to not insert duplicates.

A => [1, 2, 3, 4, 5]
B => [3, 4, 5, 6, 7]
A.magic_merge(B) #=> [1, 2, 3, 4, 5, 6, 7]

Bear in mind that the integers are actually Article objects, and that the union operator appears to take forever on them.

Answer 1

Answered by Alex Reisner

Here's a script which benchmarks two merge techniques: using the pipe operator (a1 | a2), and using concatenate-and-uniq ((a1 + a2).uniq). Two additional benchmarks give the time of concatenate and uniq individually.

require 'benchmark'

a1 = []; a2 = []
[a1, a2].each do |a|
  1000000.times { a << rand(999999) }
end

puts "Merge with pipe:"
puts Benchmark.measure { a1 | a2 }

puts "Merge with concat and uniq:"
puts Benchmark.measure { (a1 + a2).uniq }

puts "Concat only:"
puts Benchmark.measure { a1 + a2 }

puts "Uniq only:"
b = a1 + a2
puts Benchmark.measure { b.uniq }

On my machine (Ubuntu Karmic, Ruby 1.8.7), I get output like this:

Merge with pipe:
  1.000000   0.030000   1.030000 (  1.020562)
Merge with concat and uniq:
  1.070000   0.000000   1.070000 (  1.071448)
Concat only:
  0.010000   0.000000   0.010000 (  0.005888)
Uniq only:
  0.980000   0.000000   0.980000 (  0.981700)

This shows that the two techniques are very similar in speed, and that uniq is the larger component of the operation. That makes sense intuitively: uniq has to hash and compare every element, so it is O(n) at best, whereas plain concatenation is little more than a block copy of the two arrays.

So, if you really want to speed this up, you need to look at how the hash and eql? methods are implemented for the objects in your arrays, since those (rather than <=>) are what | and uniq use to detect duplicates. I believe that most of the time is being spent hashing and comparing objects to ensure that no two elements in the final array are equal.
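
As a rough illustration of that point: Array#| and Array#uniq decide whether two elements are duplicates by calling hash and eql? on them. The sketch below uses a plain Ruby class named Article as a hypothetical stand-in for the model in the question (it is not ActiveRecord), and assumes two articles should count as the same element when their ids match.

```ruby
# Hypothetical stand-in for the Article model in the question (not ActiveRecord).
class Article
  attr_reader :id, :source

  def initialize(id, source)
    @id = id
    @source = source
  end

  # Assumption: two articles are the same element when their ids match.
  def eql?(other)
    other.is_a?(Article) && id == other.id
  end
  alias == eql?

  # Must agree with eql?: objects that are eql? need equal hash values.
  # Hashing only the id keeps this cheap compared to hashing the XML source.
  def hash
    id.hash
  end
end

a = [Article.new(1, "<article>one</article>"), Article.new(2, "<article>two</article>")]
b = [Article.new(2, "<article>two again</article>"), Article.new(3, "<article>three</article>")]

# The union keeps the first occurrence of each id, in order.
merged = a | b
p merged.map(&:id)
```

If hash had to walk a large attribute such as source, every union would pay that cost once per element, which is one plausible reason for a slow merge.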

Answer 2

Answered by Andrew Grimm

Do you need the items to be in a specific order within the arrays? If not, you may want to check whether using Sets makes it faster.

Update

Adding to another answerer's code:

require "set"
require "benchmark"

a1 = []; a2 = []
[a1, a2].each do |a|
  1000000.times { a << rand(999999) }
end

s1, s2 = Set.new, Set.new

[s1, s2].each do |s|
  1000000.times { s << rand(999999) }
end

puts "Merge with pipe:"
puts Benchmark.measure { a1 | a2 }

puts "Merge with concat and uniq:"
puts Benchmark.measure { (a1 + a2).uniq }

puts "Concat only:"
puts Benchmark.measure { a1 + a2 }

puts "Uniq only:"
b = a1 + a2
puts Benchmark.measure { b.uniq }

puts "Using sets"
puts Benchmark.measure {s1 + s2}

puts "Starting with arrays, but using sets"
puts Benchmark.measure {s3, s4 = [a1, a2].map{|a| Set.new(a)} ; (s3 + s4)}

This gives (for ruby 1.8.7 (2008-08-11 patchlevel 72) [universal-darwin10.0]):

Merge with pipe:
  1.320000   0.040000   1.360000 (  1.349563)
Merge with concat and uniq:
  1.480000   0.030000   1.510000 (  1.512295)
Concat only:
  0.010000   0.000000   0.010000 (  0.019812)
Uniq only:
  1.460000   0.020000   1.480000 (  1.486857)
Using sets
  0.310000   0.010000   0.320000 (  0.321982)
Starting with arrays, but using sets
  2.340000   0.050000   2.390000 (  2.384066)

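This suggests that sets may or may not be faster, depending on your circumstances. Merging two existing sets was about four times faster than the array union here, but converting the arrays to sets first cost more than the array union itself, so sets only pay off if you do lots of merges on data you already keep as sets.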
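
A minimal sketch of the "lots of merges" case, assuming the data can live in a Set for its whole lifetime so the conversion cost is paid only once. The integer ids and the names seen and batches are made up for illustration.

```ruby
require "set"

# Hypothetical data: integers standing in for the Article objects.
seen = Set.new([1, 2, 3, 4, 5])

# Each incoming batch is merged in place; duplicates are dropped on insert.
batches = [[3, 4, 5, 6, 7], [7, 8, 1]]
batches.each { |batch| seen.merge(batch) }

# Convert back to an array only when one is actually needed.
result = seen.to_a
p result.sort
```

Set#merge mutates the receiver, so no intermediate set is allocated per batch, unlike s1 + s2, which builds a new set each time.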

Answer 3

Answered by Josh Delsman

Using the Array#concat method will likely be a lot faster, according to my initial benchmarks using Ruby 1.8.7. Note that concat appends in place and, unlike |, does not remove duplicates:

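require 'benchmark'
require 'securerandom' # stdlib; provides SecureRandom.hex without loading ActiveSupport

# Rebuild both arrays with fresh random hex strings before each measurement.
def reset_arrays!
  @array1 = []
  @array2 = []

  [@array1, @array2].each do |array|
    10000.times { array << SecureRandom.hex }
  end
end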

reset_arrays! && puts(Benchmark.measure { @array1 | @array2 })
# => 0.030000   0.000000   0.030000 (  0.026677)

reset_arrays! && puts(Benchmark.measure { @array1.concat(@array2) })
# => 0.000000   0.000000   0.000000 (  0.000122)
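
Since concat on its own leaves duplicates in place, a fairer comparison with | would add a de-duplication step. A small sketch of that combination, using the integer arrays from the question (the variable names are illustrative only):

```ruby
array1 = [1, 2, 3, 4, 5]
array2 = [3, 4, 5, 6, 7]

# concat appends array2 onto array1 in place and keeps duplicates.
array1.concat(array2)
appended_size = array1.size

# uniq! then removes the duplicates in place.
# It returns nil when nothing was removed, so avoid chaining on its result.
array1.uniq!

p array1
```

This avoids allocating the intermediate array that (a1 + a2).uniq creates, but the uniq! pass still has to hash every element, so the saving is likely to be small.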

Answer 4

Answered by nas

Try this and see if it is any faster:

a = [1,2,3,3,2]
b = [1,2,3,4,3,2,5,7]
(a+b).uniq
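
On Ruby 1.9.2 or later, uniq also accepts a block, which allows de-duplicating on a cheap key instead of on the whole object. The sketch below assumes that id alone identifies an article; ArticleRow is a hypothetical stand-in for the Article model in the question.

```ruby
# Hypothetical stand-in for the Article model; only id and source are modelled.
ArticleRow = Struct.new(:id, :source)

a = [ArticleRow.new(1, "<article>one</article>"), ArticleRow.new(2, "<article>two</article>")]
b = [ArticleRow.new(2, "<article>two (other copy)</article>"), ArticleRow.new(3, "<article>three</article>")]

# Only the block's return value (the integer id) is hashed and compared,
# and the first element seen for each id is kept.
merged = (a + b).uniq { |article| article.id }

p merged.map(&:id)
```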
