How do I perform a batch insert in Django?
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/2655670/
Asked by Thierry Lam
In MySQL, you can insert multiple rows into a table with a single query, for any n > 0:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9), ..., (n-2, n-1, n);
Is there a way to achieve the above with Django queryset methods? Here's an example:
values = [(1, 2, 3), (4, 5, 6), ...]
for value in values:
    SomeModel.objects.create(first=value[0], second=value[1], third=value[2])
I believe the above issues a separate INSERT query on each iteration of the for loop. I'm looking for a single query; is that possible in Django?
Accepted answer by Jarret Hardie
I recently looked for such a thing myself (inspired by QuerySet.update(), as I imagine you are too). To my knowledge, no bulk create exists in the current production framework (1.1.1 as of today). We ended up creating a custom manager for the model that needed bulk-create, and created a function on that manager to build an appropriate SQL statement with the sequence of VALUES parameters.
Something like (apologies if this does not work... hopefully I've adapted this runnably from our code):
from django.db import models, connection

class MyManager(models.Manager):

    def create_in_bulk(self, values):
        # NOTE: the table and column names are hardcoded placeholders; adjust
        # them to your model's table. Each row must supply one value per column.
        base_sql = "INSERT INTO tbl_name (a,b,c) VALUES "
        values_sql = []
        values_data = []

        for value_list in values:
            # One "%s" placeholder per value in this row
            placeholders = ['%s' for i in range(len(value_list))]
            values_sql.append("(%s)" % ','.join(placeholders))
            values_data.extend(value_list)

        # Join the per-row groups into a single multi-row VALUES clause
        sql = '%s%s' % (base_sql, ', '.join(values_sql))

        curs = connection.cursor()
        curs.execute(sql, values_data)

class MyObject(models.Model):
    # model definition as usual... assume:
    foo = models.CharField(max_length=128)

    # custom manager
    objects = MyManager()

MyObject.objects.create_in_bulk( [('hello',), ('bye',), ('c', )] )
This approach does run the risk of being very specific to a particular database. In our case, we wanted the function to return the IDs just created, so we had a postgres-specific query in the function to generate the requisite number of IDs from the primary key sequence for the table that represents the object. That said, it does perform significantly better in tests versus iterating over the data and issuing separate QuerySet.create() statements.
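To see the placeholder-building idea from this answer in isolation, here is a minimal sketch that runs against the standard library's sqlite3 module instead of a Django connection. The helper name build_bulk_insert is hypothetical, and sqlite3 uses "?" placeholders where the answer's code uses "%s"; treat it as an illustration under those assumptions, not as the answer author's actual implementation.

```python
import sqlite3


def build_bulk_insert(table, columns, rows):
    """Return (sql, params) for a single multi-row INSERT statement.

    `table` and `columns` are interpolated directly into the SQL, so they
    must come from trusted code, never from user input.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    # One "(?, ?, ?)" group per row.
    group = "(%s)" % ", ".join("?" for _ in columns)
    sql = "INSERT INTO %s (%s) VALUES %s" % (
        table,
        ", ".join(columns),
        ", ".join(group for _ in rows),
    )
    # Flatten the rows into one parameter list, in placeholder order.
    params = [value for row in rows for value in row]
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_name (a INTEGER, b INTEGER, c INTEGER)")

sql, params = build_bulk_insert(
    "tbl_name", ["a", "b", "c"], [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
)
conn.execute(sql, params)  # a single statement inserts all three rows
conn.commit()
row_count = conn.execute("SELECT COUNT(*) FROM tbl_name").fetchone()[0]
```

Keeping the values in a separate parameter list, rather than formatting them into the SQL string, leaves escaping to the database driver.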
Answer by Fred
These answers are outdated. bulk_create was introduced in Django 1.4:
https://docs.djangoproject.com/en/dev/ref/models/querysets/#bulk-create
Answer by Brian from QuantRocket
Here is a way to do batch inserts that still goes through Django's ORM (and thus retains the many benefits the ORM provides). This approach involves subclassing the InsertQuery class and creating a custom manager that prepares model instances for insertion into the database in much the same way that Django's save() method does. Most of the code for the BatchInsertQuery class below is straight from the InsertQuery class, with just a few key lines added or modified. To use the batch_insert method, pass in the model instances that you want to insert into the database. This approach frees the code in your views from having to translate model instances into valid SQL values; the manager class, in conjunction with the BatchInsertQuery class, handles that.
from django.db import models, connection
from django.db.models.sql import InsertQuery

class BatchInsertQuery( InsertQuery ):

    ####################################################################

    def as_sql(self):
        """
        Constructs a SQL statement for inserting all of the model instances
        into the database.

        Differences from base class method:

        - The VALUES clause is constructed differently to account for the
        grouping of the values (actually, placeholders) into
        parenthetically-enclosed groups. I.e., VALUES (a,b,c),(d,e,f)
        """
        qn = self.connection.ops.quote_name
        opts = self.model._meta
        result = ['INSERT INTO %s' % qn(opts.db_table)]
        result.append('(%s)' % ', '.join([qn(c) for c in self.columns]))
        result.append( 'VALUES %s' % ', '.join( '(%s)' % ', '.join(
            values_group ) for values_group in self.values ) ) # This line is different
        params = self.params
        if self.return_id and self.connection.features.can_return_id_from_insert:
            col = "%s.%s" % (qn(opts.db_table), qn(opts.pk.column))
            r_fmt, r_params = self.connection.ops.return_insert_id()
            result.append(r_fmt % col)
            params = params + r_params
        return ' '.join(result), params

    ####################################################################

    def insert_values( self, insert_values ):
        """
        Adds the insert values to the instance. Can be called multiple times
        for multiple instances of the same model class.

        Differences from base class method:

        -Clears self.columns so that self.columns won't be duplicated for each
        set of inserted_values.
        -appends the insert_values to self.values instead of extends so that
        the values (actually the placeholders) remain grouped separately for
        the VALUES clause of the SQL statement. I.e., VALUES (a,b,c),(d,e,f)
        -Removes inapplicable code
        """
        self.columns = [] # This line is new

        placeholders, values = [], []
        for field, val in insert_values:
            placeholders.append('%s')

            self.columns.append(field.column)
            values.append(val)

        self.params += tuple(values)
        self.values.append( placeholders ) # This line is different

########################################################################

class ManagerEx( models.Manager ):
    """
    Extended model manager class.
    """
    def batch_insert( self, *instances ):
        """
        Issues a batch INSERT using the specified model instances.
        """
        cls = instances[0].__class__
        query = BatchInsertQuery( cls, connection )
        for instance in instances:

            values = [ (f, f.get_db_prep_save( f.pre_save( instance, True ) ) ) \
                for f in cls._meta.local_fields ]
            query.insert_values( values )

        return query.execute_sql()

########################################################################

class MyModel( models.Model ):
    myfield = models.CharField(max_length=255)
    objects = ManagerEx()

########################################################################

# USAGE:
object1 = MyModel(myfield="foo")
object2 = MyModel(myfield="bar")
object3 = MyModel(myfield="bam")
MyModel.objects.batch_insert(object1, object2, object3)
Answer by Justin Abrahms
You might get the performance you need by managing transactions manually. That lets you run all the inserts inside one transaction and then commit them all at once. Hopefully this will help you: http://docs.djangoproject.com/en/dev/topics/db/transactions/
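The idea of one commit for many inserts can be illustrated without Django. The sketch below swaps in the standard library's sqlite3 module, so it is not the Django transaction API the answer links to; the table tbl_name and its columns are borrowed from the question. In Django 1.6 and later, the corresponding wrapper would be a `with transaction.atomic():` block around the loop of create() calls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_name (a INTEGER, b INTEGER, c INTEGER)")

values = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# Using the connection as a context manager commits once when the block
# succeeds and rolls back if an exception escapes it.
with conn:
    for a, b, c in values:
        conn.execute(
            "INSERT INTO tbl_name (a, b, c) VALUES (?, ?, ?)", (a, b, c)
        )

inserted = conn.execute("SELECT COUNT(*) FROM tbl_name").fetchone()[0]
```

This still sends one INSERT statement per row; what it saves is the cost of committing after every statement, so it complements a true multi-row insert rather than replacing it.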
Answer by Mohamed
No, it is not possible, because Django models are objects rather than tables, so table-level actions do not apply to them. Django creates an object and then inserts its data into the table; therefore you can't create multiple objects at once.