Original question: http://stackoverflow.com/questions/31237042/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
What's the difference between select_related and prefetch_related in Django ORM?
Asked by NeoWang
In the Django docs:
select_related() "follows" foreign-key relationships, selecting additional related-object data when it executes its query.
prefetch_related() does a separate lookup for each relationship, and does the "joining" in Python.
What does "doing the joining in Python" mean? Can someone illustrate with an example?
My understanding is that for a foreign key relationship you use select_related, and for an M2M relationship you use prefetch_related. Is this correct?
Accepted answer by CrazyCasta
Your understanding is mostly correct. You use select_related when the object that you're going to be selecting is a single object, so a OneToOneField or a ForeignKey. You use prefetch_related when you're going to get a "set" of things, so ManyToManyFields as you stated, or reverse ForeignKeys. Just to clarify what I mean by "reverse ForeignKeys", here's an example:
from django.db import models

class ModelA(models.Model):
    pass

class ModelB(models.Model):
    a = models.ForeignKey(ModelA, on_delete=models.CASCADE)

ModelB.objects.select_related('a').all()  # Forward ForeignKey relationship
ModelA.objects.prefetch_related('modelb_set').all()  # Reverse ForeignKey relationship
The difference is that select_related does an SQL join and therefore gets the results back as part of the table from the SQL server. prefetch_related, on the other hand, executes another query and therefore reduces the redundant columns in the original object (ModelA in the above example). You may use prefetch_related for anything that you can use select_related for.
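To make the two strategies concrete, here is a minimal sketch that uses the standard-library sqlite3 module instead of Django. The table names model_a and model_b and their contents are assumptions made up for illustration, and the queries only approximate what the ORM would generate.

```python
import sqlite3

# Hypothetical tables standing in for ModelA / ModelB (not Django-generated).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE model_a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE model_b (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES model_a(id));
    INSERT INTO model_a VALUES (1, 'first'), (2, 'second');
    INSERT INTO model_b VALUES (10, 1), (11, 1), (12, 2);
""")

# select_related style: one query, related columns come back in the same row.
joined = conn.execute(
    "SELECT b.id, a.id, a.name FROM model_b AS b "
    "JOIN model_a AS a ON a.id = b.a_id ORDER BY b.id"
).fetchall()

# prefetch_related style: query 1 fetches the main rows ...
b_rows = conn.execute("SELECT id, a_id FROM model_b ORDER BY id").fetchall()
# ... query 2 fetches the related rows for the distinct keys seen in query 1 ...
a_ids = sorted({a_id for _, a_id in b_rows})
placeholders = ", ".join("?" for _ in a_ids)
a_rows = conn.execute(
    f"SELECT id, name FROM model_a WHERE id IN ({placeholders})", a_ids
).fetchall()
# ... and the "join" happens in Python with a dictionary lookup.
a_by_id = {a_id: name for a_id, name in a_rows}
joined_in_python = [(b_id, a_id, a_by_id[a_id]) for b_id, a_id in b_rows]
```

Both approaches end up with the same pairing of rows. The join repeats the model_a columns on every model_b row, while the two-query version transfers each model_a row once.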
The tradeoff is that prefetch_related has to create a list of IDs and send it back to the server, which can take a while. I'm not sure if there's a nice way of doing this in a transaction, but my understanding is that Django always just sends a list and basically says SELECT ... WHERE pk IN (...,...,...). If the prefetched data is sparse (say, U.S. State objects linked to people's addresses), this can be very good. However, if the relationship is closer to one-to-one, it can waste a lot of communication. If in doubt, try both and see which performs better.
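The sparse versus near one-to-one tradeoff can be sketched in plain Python. The helper build_in_query below is hypothetical (it is not a Django function), and the table names and row counts are invented. It only shows how the size of the ID list depends on how many distinct related rows there are.

```python
def build_in_query(table, ids):
    """Build a parameterized 'WHERE id IN (...)' query for the distinct ids."""
    distinct = sorted(set(ids))
    placeholders = ", ".join("?" for _ in distinct)
    return f'SELECT * FROM "{table}" WHERE "id" IN ({placeholders})', distinct

# Sparse case: 1000 addresses that point at only 3 states.
state_ids = [i % 3 + 1 for i in range(1000)]
sparse_sql, sparse_params = build_in_query("state", state_ids)

# Near one-to-one case: 1000 rows that each point at a different row.
profile_ids = list(range(1, 1001))
dense_sql, dense_params = build_in_query("profile", profile_ids)

print(len(sparse_params), len(dense_params))
```

In the sparse case the second query carries three IDs. In the near one-to-one case it carries as many IDs as there are rows, which is the communication overhead described above.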
Everything discussed above is basically about communication with the database. On the Python side, however, prefetch_related has the extra benefit that a single object is used to represent each object in the database. With select_related, duplicate objects will be created in Python for each "parent" object. Since objects in Python have a decent bit of memory overhead, this can also be a consideration.
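The memory point can be illustrated without Django. This is a sketch of the idea only, using a hypothetical State dataclass and made-up rows. It does not reproduce Django's internals.

```python
from dataclasses import dataclass

@dataclass
class State:
    id: int
    name: str

# Rows as a JOIN would return them: (person_id, state_id, state_name).
joined_rows = [(1, 5, "Ohio"), (2, 5, "Ohio"), (3, 5, "Ohio")]

# select_related style: a related object is built from every joined row.
per_row = [State(state_id, name) for _, state_id, name in joined_rows]

# prefetch_related style: related rows are fetched once, then shared by key.
states_by_id = {5: State(5, "Ohio")}
shared = [states_by_id[state_id] for _, state_id, _ in joined_rows]

print(len({id(s) for s in per_row}))  # number of distinct instances built per row
print(len({id(s) for s in shared}))   # number of distinct instances when shared
```

The per-row objects compare equal but are separate instances, whereas the shared list points at the same instance three times.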
Answer by cdosborn
Both methods achieve the same purpose: to forgo unnecessary db queries. But they use different approaches for efficiency.
The only reason to use either of these methods is when a single large query is preferable to many small queries. Django uses the large query to create models in memory preemptively rather than performing on-demand queries against the database.
select_related performs a join with each lookup, but extends the select to include the columns of all joined tables. However, this approach has a caveat.
Joins have the potential to multiply the number of rows in a query. When you perform a join over a foreign key or one-to-one field, the number of rows won't increase. However, many-to-many joins do not have this guarantee. So, Django restricts select_related to relations that won't unexpectedly result in a massive join.
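The row-multiplication effect is easy to check with the standard-library sqlite3 module. The author, book, tag and book_tag tables below are assumptions invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER);
    CREATE TABLE tag (id INTEGER PRIMARY KEY);
    CREATE TABLE book_tag (book_id INTEGER, tag_id INTEGER);
    INSERT INTO author VALUES (1);
    INSERT INTO book VALUES (1, 1), (2, 1);
    INSERT INTO tag VALUES (1), (2), (3);
    INSERT INTO book_tag VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3);
""")

def count(sql):
    """Return the single COUNT(*) value produced by the query."""
    return conn.execute(sql).fetchone()[0]

base_rows = count("SELECT COUNT(*) FROM book")
# Joining over a foreign key keeps one row per book.
fk_rows = count(
    "SELECT COUNT(*) FROM book JOIN author ON author.id = book.author_id"
)
# Joining across a many-to-many link table yields one row per (book, tag) pair.
m2m_rows = count(
    "SELECT COUNT(*) FROM book "
    "JOIN book_tag ON book_tag.book_id = book.id "
    "JOIN tag ON tag.id = book_tag.tag_id"
)
print(base_rows, fk_rows, m2m_rows)
```

With two books and three tags each, the foreign-key join still returns two rows while the many-to-many join returns six.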
The "join in Python" for prefetch_related sounds a little more alarming than it should. It creates a separate query for each table to be joined. It filters each of these tables with a WHERE IN clause, like:
SELECT "credential"."id",
"credential"."uuid",
"credential"."identity_id"
FROM "credential"
WHERE "credential"."identity_id" IN
(84706, 48746, 871441, 84713, 76492, 84621, 51472);
Rather than performing a single join with potentially too many rows, each table is split into a separate query.
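Once the separate query has returned, the remaining "join" is ordinary grouping in Python. The sketch below assumes the credential rows have already been fetched. The identity IDs are taken from the query above, but the credential IDs and the pairing of rows are made up for illustration.

```python
from collections import defaultdict

# IDs of the parent objects returned by the first query.
identity_ids = [84706, 48746, 871441]

# Rows as the WHERE ... IN query might return them: (credential id, identity_id).
credential_rows = [(1, 84706), (2, 84706), (3, 871441)]

# Group the related rows by the foreign key they point at.
credentials_by_identity = defaultdict(list)
for credential_id, identity_id in credential_rows:
    credentials_by_identity[identity_id].append(credential_id)

# Attach each group to its parent; parents without credentials get an empty list.
prefetched = {i: credentials_by_identity.get(i, []) for i in identity_ids}
```

Each parent ends up with its own list of related rows, and no further queries are needed when those lists are read.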
Answer by Amin.B
As the Django documentation says:
prefetch_related()
Returns a QuerySet that will automatically retrieve, in a single batch, related objects for each of the specified lookups.
This has a similar purpose to select_related, in that both are designed to stop the deluge of database queries that is caused by accessing related objects, but the strategy is quite different.
select_related works by creating an SQL join and including the fields of the related object in the SELECT statement. For this reason, select_related gets the related objects in the same database query. However, to avoid the much larger result set that would result from joining across a 'many' relationship, select_related is limited to single-valued relationships - foreign key and one-to-one.
prefetch_related, on the other hand, does a separate lookup for each relationship, and does the 'joining' in Python. This allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related, in addition to the foreign key and one-to-one relationships that are supported by select_related. It also supports prefetching of GenericRelation and GenericForeignKey, however, it must be restricted to a homogeneous set of results. For example, prefetching objects referenced by a GenericForeignKey is only supported if the query is restricted to one ContentType.
More information about this: https://docs.djangoproject.com/en/2.2/ref/models/querysets/#prefetch-related
Answer by Jarvis
I went through the already posted answers and thought it would be better to add an answer with an actual example.
Let's say you have 3 Django models which are related.
from django.db import models

class M1(models.Model):
    name = models.CharField(max_length=10)

class M2(models.Model):
    name = models.CharField(max_length=10)
    select_relation = models.ForeignKey(M1, on_delete=models.CASCADE)
    prefetch_relation = models.ManyToManyField(to='M3')

class M3(models.Model):
    name = models.CharField(max_length=10)
Here you can query the M2 model and its related M1 objects using the select_relation field, and its related M3 objects using the prefetch_relation field.
However, as we've mentioned, M1's relation from M2 is a ForeignKey, so it returns only 1 record for any M2 object. The same applies to OneToOneField as well.
But M3's relation from M2 is a ManyToManyField, which might return any number of M3 objects.
Consider a case where you have 2 M2 objects, m21 and m22, which share the same 5 associated M3 objects with IDs 1,2,3,4,5. Note that select_related cannot follow a ManyToManyField, so without prefetch_related the related M3 objects are loaded lazily, one query per M2 object. When you fetch the associated M3 objects for each of those M2 objects this way, this is how it's going to work.
Steps:
- Find the m21 object.
- Query all the M3 objects related to the m21 object, whose IDs are 1,2,3,4,5.
- Repeat the same thing for the m22 object and all other M2 objects.
As both m21 and m22 have the same IDs 1,2,3,4,5, loading the related objects per M2 object without prefetch_related queries the DB twice for the same IDs, which were already fetched.
Instead, if you use prefetch_related, when you try to get M2 objects, it will make a note of all the IDs that your objects returned (note: only the IDs) while querying the M2 table. As a last step, Django is going to make a query to the M3 table with the set of all IDs that your M2 objects have returned, and join them to the M2 objects using Python instead of the database.
This way you're querying all the M3 objects only once, which improves performance.
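The difference in query counts can be simulated with the standard-library sqlite3 module. This is a rough sketch, not Django's generated SQL: the tables m2, m3 and the link table m2_m3, as well as the run helper that records queries, are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE m2 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE m3 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE m2_m3 (m2_id INTEGER, m3_id INTEGER);
    INSERT INTO m2 VALUES (1, 'm21'), (2, 'm22');
    INSERT INTO m3 VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e');
    INSERT INTO m2_m3 SELECT m2.id, m3.id FROM m2, m3;
""")

queries = []  # every SQL string passed to run() is recorded here

def run(sql, params=()):
    """Execute a query and record it, so the two strategies can be compared."""
    queries.append(sql)
    return conn.execute(sql, params).fetchall()

LINK_SQL = ("SELECT m3.id FROM m3 JOIN m2_m3 ON m2_m3.m3_id = m3.id "
            "WHERE m2_m3.m2_id = ? ORDER BY m3.id")

# Without prefetching: one query for M2, then one more per M2 object.
queries.clear()
lazy = {m2_id: [row[0] for row in run(LINK_SQL, (m2_id,))]
        for (m2_id,) in run("SELECT id FROM m2 ORDER BY id")}
lazy_queries = len(queries)

# Prefetch style: one query for M2, one for all related M3 rows, join in Python.
queries.clear()
m2_ids = [m2_id for (m2_id,) in run("SELECT id FROM m2 ORDER BY id")]
marks = ", ".join("?" for _ in m2_ids)
links = run(
    "SELECT m2_m3.m2_id, m3.id FROM m3 JOIN m2_m3 ON m2_m3.m3_id = m3.id "
    f"WHERE m2_m3.m2_id IN ({marks}) ORDER BY m3.id", m2_ids)
prefetched = {m2_id: [] for m2_id in m2_ids}
for m2_id, m3_id in links:
    prefetched[m2_id].append(m3_id)
prefetch_queries = len(queries)

print(lazy_queries, prefetch_queries)
```

Both strategies produce the same mapping from each M2 row to its M3 IDs. The lazy version needs one extra query per M2 row, while the prefetch version stays at two queries no matter how many M2 rows there are.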