Original question: http://stackoverflow.com/questions/23252811/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How can I run updates in batches in Rails 3/4?
Asked by MothOnMars
I need to mass-update many thousands of records, and I would like to process the updates in batches. First, I tried:
Foo.where(bar: 'bar').find_in_batches.update_all(bar: 'baz')
...which I was hoping would generate SQL such as:
"UPDATE foo SET bar = 'baz' where bar='bar' AND id > (whatever id is passed in by find_in_batches)"
That doesn't work because find_in_batches returns an array, while update_all needs an ActiveRecord relation.
This is what I tried next:
Foo.where(bar: 'bar').select('id').find_in_batches do |foos|
  ids = foos.map(&:id)
  Foo.where(id: ids).update_all(bar: 'baz')
end
That works, but it obviously runs a select followed by the update, rather than a single update based on my 'where' conditions. Is there any way to clean this up, so that the select and update don't have to be separate queries?
Answered by dlackty
In Rails 5, there's a handy new method, ActiveRecord::Relation#in_batches, to solve this problem:
Foo.in_batches.update_all(bar: 'baz')
Check the documentation for details.
Answered by pdobb
I'm surprised, too, that there isn't an easier way to do this... but I did come up with this approach:
batch_size = 1000
0.step(Foo.count, batch_size).each do |offset|
  Foo.where(bar: 'bar').order(:id)
     .offset(offset)
     .limit(batch_size)
     .update_all(bar: 'baz')
end
Basically this will:
- Create an array of offsets between 0 and Foo.count, stepping by batch_size each time. For example, if Foo.count == 10500 you'd get: [0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000]
- Loop through these numbers and use them as an OFFSET in the SQL query, being sure to order by id, and limiting to the batch_size.
- Update at most batch_size records whose "index" is greater than offset.
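The offsets from the first step can be checked in plain Ruby, without a database. This is only a minimal sketch: record_count is a hypothetical stand-in for Foo.count, using the 10500 figure from the example above.

```ruby
# Hypothetical stand-ins for Foo.count and the batch size used above.
record_count = 10_500
batch_size = 1000

# Integer#step with a limit and a step counts up from 0 in increments of
# batch_size and stops before exceeding the limit.
offsets = 0.step(record_count, batch_size).to_a

puts offsets.inspect
```

The last offset (10000) covers the final, partial batch of 500 records.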
This is basically the manual way to perform what you said you were hoping for in the generated SQL. Too bad it can't just be done this way already by a standard library method... though I'm sure you could create one of your own.
Answered by Faisal
This is 2 years late, but the answers here a) are very slow for large data sets and b) ignore the built-in Rails capabilities (http://api.rubyonrails.org/classes/ActiveRecord/Batches.html).
As the offset value increases, depending on your DB server, it will do a sequential scan until it reaches your block, and then fetch the data for processing. As your offset gets into the millions, this will be extremely slow.
Use the "find_each" iterator method:
Foo.where(a: b).find_each do |bar|
  bar.x = y
  bar.save
end
This has the added benefit of running the model callbacks with each save. If you don't care for the callbacks, then try:
Foo.where(a: b).find_in_batches do |array_of_foo|
  ids = array_of_foo.collect(&:id)
  Foo.where(id: ids).update_all(x: y)
end
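The reason find_each and find_in_batches avoid the OFFSET slowdown is that each batch starts after the last id of the previous batch, rather than skipping a growing number of rows. Here is a rough plain-Ruby sketch of that idea. The rows array and the each_keyset_batch helper are hypothetical stand-ins for a table and for the batching logic, not Rails APIs, and the sketch assumes positive integer ids.

```ruby
# Hypothetical in-memory stand-in for a table, sorted by id (note the gaps).
rows = [1, 2, 5, 8, 9, 12, 15].map { |id| { id: id } }

# Yields successive batches, each starting after the last id already seen.
def each_keyset_batch(rows, batch_size)
  return enum_for(:each_keyset_batch, rows, batch_size) unless block_given?

  last_id = 0 # assumes ids are positive integers
  loop do
    # Similar in spirit to: WHERE id > last_id ORDER BY id LIMIT batch_size
    batch = rows.select { |row| row[:id] > last_id }.first(batch_size)
    break if batch.empty?

    yield batch
    last_id = batch.last[:id]
  end
end

batches = each_keyset_batch(rows, 3).map { |batch| batch.map { |row| row[:id] } }
puts batches.inspect
```

In a real database the `id > last_id` condition can typically be answered from the primary key index, so the cost of fetching a batch does not grow with how far into the table it is.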
Answered by Charlie Tran
pdobb's answer is on the right track, but didn't work for me in Rails 3.2.21 because of this issue of ActiveRecord not parsing OFFSET with UPDATE calls:
https://github.com/rails/rails/issues/10849
I modified the code accordingly and it worked fine for concurrently setting the default value on my Postgres table:
batch_size = 1000
0.step(Foo.count, batch_size).each do |offset|
  Foo.where('id > ? AND id <= ?', offset, offset + batch_size).
      order(:id).
      update_all(foo: 'bar')
end
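One caveat with the id ranges: stepping up to Foo.count only reaches every row when ids run contiguously from 1. If rows have been deleted, the largest id can exceed the count. The sketch below shows the range arithmetic in plain Ruby, assuming a hypothetical max_id (in Rails this could come from something like Foo.maximum(:id)) as the upper bound. The id_ranges helper is illustrative and not part of the answer.

```ruby
# Returns [lower, upper] pairs intended for a condition such as
# "id > lower AND id <= upper".
# max_id is a hypothetical stand-in for the largest id in the table.
def id_ranges(max_id, batch_size)
  # Stepping to max_id - 1 avoids a trailing empty range when max_id is an
  # exact multiple of batch_size.
  0.step(max_id - 1, batch_size).map { |lower| [lower, lower + batch_size] }
end

ranges = id_ranges(2500, 1000)
puts ranges.inspect
```

Each pair maps directly onto the two placeholders in the where clause above.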
Answered by Varun Natraaj
I've written a small method to invoke update_all in batches:
https://gist.github.com/VarunNatraaj/420c638d544be59eef85
Hope it is useful! :)
Answered by Paul Alexander
Haven't had a chance to test this yet but you might be able to use ARel and a sub query.
Foo.where(bar: 'bar').select('id').find_in_batches do |foos|
  # Untested, as noted above: foos is an Array here, and Array does not respond to to_arel.
  Foo.where( Foo.arel_table[ :id ].in( foos.to_arel ) ).update_all(bar: 'baz')
end