Python 在 Django ORM 中何时使用或不使用 iterator()
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/12681653/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
When to use or not use iterator() in the django ORM
提问by Lucas Ou-Yang
This is from the django docs on the queryset iterator()method:
这是来自有关 querysetiterator()方法的Django 文档:
A QuerySet typically caches its results internally so that repeated evaluations do not result in additional queries. In contrast, iterator() will read results directly, without doing any caching at the QuerySet level (internally, the default iterator calls iterator() and caches the return value). For a QuerySet which returns a large number of objects that you only need to access once, this can results in better performance and a significant reduction in memory.
QuerySet 通常在内部缓存其结果,以便重复评估不会导致额外的查询。相比之下,iterator() 将直接读取结果,而不会在 QuerySet 级别做任何缓存(在内部,默认迭代器调用 iterator() 并缓存返回值)。对于返回大量只需要访问一次的对象的 QuerySet,这可以带来更好的性能并显着减少内存。
After reading, I'm still confused: The line about increased performance and memory reduction suggests we should just use the iterator()method. Can someone give some examples of good and bad cases iterator()usage?
读完后,我还是很困惑:关于提高性能和减少内存的那句话表明我们应该只使用该iterator()方法。有人可以举一些好的和坏的案例iterator()用法的例子吗?
Even if the query results are not cached, if they really wanted to access the models more than once, can't someone just do the following?
即使查询结果没有被缓存,如果他们真的想多次访问模型,难道有人不能只做以下事情吗?
saved_queries = list(Model.objects.all().iterator())
回答by Steven
Note the first part of the sentence you call out:
For a QuerySet which returns a large number of objects that you only need to access once
请注意您呼出的句子的第一部分:
For a QuerySet which returns a large number of objects that you only need to access once
So the converse of this is: if you need to re-use a set of results, and they are not so numerous as to cause a memory problem then you should not use iterator. Because the extra database round trip is alwaysgoing to reduce your performance vs. using the cached result.
所以反过来说:如果你需要重用一组结果,而且它们的数量不会多到导致内存问题,那么你不应该使用iterator. 因为与使用缓存结果相比,额外的数据库往返总是会降低您的性能。
You could force your QuerySet to be evaluated into a list but:
您可以强制您的 QuerySet 被评估为一个列表,但是:
- it requires more typing than just
saved_queries = Model.objects.all() - say you are paginating results on a web page: you will have forced all results into memory (back to possible memory problems) rather than allowing the subsequent paginator to select the slice of 20 results it needs
QuerySets are lazy, so you can have a context processor, for instance, that puts a QuerySet into the context of every request but only gets evaluated when you access it on certain requests but if you've forced evaluation that database hit happens every request
- 它需要的不仅仅是打字
saved_queries = Model.objects.all() - 假设您正在对网页上的结果进行分页:您将强制所有结果进入内存(回到可能的内存问题),而不是允许后续分页器选择它需要的 20 个结果片段
QuerySets 是惰性的,因此您可以拥有一个上下文处理器,例如,将 QuerySet 放入每个请求的上下文中,但仅在您对某些请求访问它时才进行评估,但如果您强制评估,则每次请求都会发生数据库命中
The typical web app case is for relatively small result sets (they have to be delivered to a browser in a timely fashion, so pagination or a similar technique is employed to decrease the data volume if required) so generally the standard QuerySetbehaviour is what you want. As you are no doubt aware, you must store the QuerySet in a variableto get the benefit of the caching.
典型的 Web 应用程序案例是针对相对较小的结果集(它们必须及时传送到浏览器,因此如果需要,可以使用分页或类似技术来减少数据量),因此通常标准QuerySet行为是您想要的. 您肯定知道,您必须将 QuerySet 存储在一个变量中才能获得缓存的好处。
Good use of iterator: processing results that take up a large amount of available memory (lots of small objects or fewer large objects). In my experience this is often in management commands when doing heavy data processing.
善用迭代器:处理占用大量可用内存(大量小对象或较少大对象)的结果。根据我的经验,这通常是在进行大量数据处理时的管理命令中。
回答by Tiago Silva
I agree with Steven and I would like to had an observation:
我同意史蒂文,我想有一个观察:
"it requires more typing than just saved_queries = Model.objects.all()". Yes it does but there is a major difference why you should use list(Model.objcts.all()). Let me give you an example, if you put the that assigned to a variable, it will execute the query and than save it there, let's imagine you have +1M records, so that means, you will have +1M records in a list taht you may or may not use immediately after, so I would recommend only using as Steven said, only using Model.objects.all(), because this assigned to a variable, it won't execute until you call the variable, saving you DB calls.
You should use the prefetch_related() to save you from doing to many calls into a DB and therefore, it will use the django reverse lookup to help you and save you tons of time.
“它需要更多的输入,而不仅仅是saved_queries = Model.objects.all()”。是的,但有一个主要区别,为什么你应该使用 list(Model.objcts.all())。让我举个例子,如果你把那个分配给一个变量,它会执行查询并将其保存在那里,假设你有 +1M 条记录,这意味着你将有 +1M 条记录在一个列表中您可能会也可能不会立即使用,所以我建议只使用史蒂文所说的,只使用Model.objects.all(),因为这分配给一个变量,它不会执行,直到您调用该变量,从而节省您的数据库调用。
您应该使用 prefetch_related() 来避免对数据库进行多次调用,因此,它将使用 django 反向查找来帮助您并节省大量时间。

