Python Django: Count vs len on a QuerySet

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/14327036/

Date: 2020-08-18 11:06:14  Source: igfitidea

Count vs len on a Django QuerySet

python django performance

Asked by antonagestam

In Django, given that I have a QuerySet that I am going to iterate over and print the results of, what is the best option for counting the objects: len(qs) or qs.count()?

(Also given that counting the objects in the same iteration is not an option.)

Accepted answer by Andy Hayden

Although the Django docs recommend using count rather than len:

Note: Don't use len() on QuerySets if all you want to do is determine the number of records in the set. It's much more efficient to handle a count at the database level, using SQL's SELECT COUNT(*), and Django provides a count() method for precisely this reason.

Since you are iterating over this QuerySet anyway, the result will be cached (unless you are using iterator), so it is preferable to use len: this avoids hitting the database again, and also avoids the possibility of retrieving a different number of results.
If you are using iterator, then I would suggest keeping a counting variable as you iterate (rather than using count), for the same reasons.
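
A minimal sketch of the counting-variable approach, using a plain generator as a stand-in for qs.iterator(). The stream_rows and process_and_count helpers are hypothetical names for illustration, not part of Django:

```python
def stream_rows():
    """Hypothetical stand-in for qs.iterator(): yields rows once and caches nothing."""
    for name in ("alice", "bob", "carol"):
        yield name


def process_and_count(rows):
    """Process each row and count in the same pass, so no second query is needed."""
    count = 0  # stays 0 if the iterable is empty
    processed = []
    for count, row in enumerate(rows, start=1):
        processed.append(row.upper())  # placeholder for the real per-row work
    return processed, count


processed, total = process_and_count(stream_rows())
print(total)
```

Because the count comes from the same pass that reads the rows, it always matches what was actually processed.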

Answer by Rohan

I think using len(qs) makes more sense here, as you need to iterate over the results. qs.count() is a better option if all you want to do is print the count and not iterate over the results.

len(qs) will hit the database with select * from table, whereas qs.count() will hit the db with select count(*) from table.

Also, qs.count() returns an integer, which you cannot iterate over.
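
The difference between the two statements can be sketched outside Django with the standard-library sqlite3 module. The person table and its contents are assumptions made up for this illustration:

```python
import sqlite3

# In-memory database with a hypothetical "person" table of 1000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO person (name) VALUES (?)",
    [(f"p{i}",) for i in range(1000)],
)

# Roughly what len(qs) amounts to on an unevaluated queryset:
# fetch every row into Python, then count the resulting list.
rows = conn.execute("SELECT * FROM person").fetchall()
count_via_len = len(rows)

# Roughly what qs.count() amounts to: the database counts and
# sends back a single integer.
(count_via_sql,) = conn.execute("SELECT COUNT(*) FROM person").fetchone()

print(count_via_len, count_via_sql)
```

Both approaches yield the same number; the first leaves all rows in memory for later use, while the second transfers only one value.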

Answer by Krzysiek

Choosing between len() and count() depends on the situation, and it's worth understanding in depth how they work in order to use them correctly.

Let me provide you with a few scenarios:

  1. (most crucial) When you only want to know the number of elements and you do not plan to process them in any way, it's crucial to use count():

    DO: queryset.count() - this will perform a single SELECT COUNT(*) FROM some_table query; all computation is carried out on the RDBMS side, and Python just needs to retrieve the resulting number at a fixed cost of O(1)

    DON'T: len(queryset) - this will perform a SELECT * FROM some_table query, fetching the whole table in O(N) and requiring additional O(N) memory to store it. This is the worst that can be done

  2. When you intend to fetch the queryset anyway, it's slightly better to use len(), which won't cause an extra database query as count() would:

    len(queryset) # fetching all the data - NO extra cost - data would be fetched anyway in the for loop
    
    for obj in queryset: # data is already fetched by len() - using cache
        pass
    

    With count() instead:

    queryset.count() # this will perform an extra db query - len() did not
    
    for obj in queryset: # fetching data
        pass
    
  3. Reversed 2nd case (when the queryset has already been fetched):

    for obj in queryset: # iteration fetches the data
        len(queryset) # using already cached data - O(1) no extra cost
        queryset.count() # using cache - O(1) no extra db query
    
    len(queryset) # the same O(1)
    queryset.count() # the same: no query, O(1)
    
Everything will be clear once you take a glance "under the hood":

class QuerySet(object):

    def __init__(self, model=None, query=None, using=None, hints=None):
        # (...)
        self._result_cache = None

    def __len__(self):
        self._fetch_all()
        return len(self._result_cache)

    def _fetch_all(self):
        if self._result_cache is None:
            self._result_cache = list(self.iterator())
        if self._prefetch_related_lookups and not self._prefetch_done:
            self._prefetch_related_objects()

    def count(self):
        if self._result_cache is not None:
            return len(self._result_cache)

        return self.query.get_count(using=self.db)
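
To see the effect of that caching logic on the number of queries, here is a runnable toy imitation built on sqlite3. ToyQuerySet, its queries counter, and the item table are all hypothetical; this is a simplified sketch of the behaviour in the excerpt above, not Django code:

```python
import sqlite3


class ToyQuerySet:
    """Hypothetical, simplified imitation of the caching logic (not Django code)."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table
        self._result_cache = None
        self.queries = 0  # number of SQL statements this object has issued

    def _fetch_all(self):
        # Only hit the database the first time; afterwards reuse the cache.
        if self._result_cache is None:
            self.queries += 1
            self._result_cache = self.conn.execute(
                f"SELECT * FROM {self.table}"
            ).fetchall()

    def __iter__(self):
        self._fetch_all()
        return iter(self._result_cache)

    def __len__(self):
        self._fetch_all()
        return len(self._result_cache)

    def count(self):
        if self._result_cache is not None:
            return len(self._result_cache)  # served from the cache, no query
        self.queries += 1
        return self.conn.execute(
            f"SELECT COUNT(*) FROM {self.table}"
        ).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO item (id) VALUES (?)", [(i,) for i in range(5)])

# Scenario 2 with len(): the fetch done by len() is reused by the loop.
a = ToyQuerySet(conn, "item")
n_a = len(a)
for _ in a:
    pass

# Scenario 2 with count(): COUNT(*) first, then a separate fetch for the loop.
b = ToyQuerySet(conn, "item")
n_b = b.count()
for _ in b:
    pass

# Scenario 3: count() after the data has been fetched adds no query.
c = ToyQuerySet(conn, "item")
for _ in c:
    pass
n_c = c.count()

print(a.queries, b.queries, c.queries)
```

Under these assumptions the three objects issue 1, 2 and 1 statements respectively, which matches scenarios 2 and 3 above.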

Good references in Django docs:

Answer by funnydman

For people who prefer test measurements (PostgreSQL):

If we have a simple Person model and 1000 instances of it:

class Person(models.Model):
    name = models.CharField(max_length=100)
    age = models.SmallIntegerField()

    def __str__(self):
        return self.name

In the average case it gives:

In [1]: persons = Person.objects.all()

In [2]: %timeit len(persons)
325 ns ± 3.09 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [3]: %timeit persons.count()
170 ns ± 0.572 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

So, as you can see, count() is almost 2x faster than len() in this particular test case.
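
Note that %timeit reruns the statement many times on the same persons object, and the count() source quoted in an earlier answer returns len(self._result_cache) once the cache is populated, so it may be worth also timing work that issues a fresh query on every run. A rough sketch with the standard-library sqlite3 and timeit modules follows; the table layout and the two helper functions are assumptions for illustration, and the absolute numbers will differ from PostgreSQL:

```python
import sqlite3
import timeit

# In-memory table loosely modelled on the Person model above (1000 rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person (name, age) VALUES (?, ?)",
    [(f"person{i}", i % 90) for i in range(1000)],
)


def count_by_fetching():
    """Fetch every row and count in Python (what an uncached len() has to do)."""
    return len(conn.execute("SELECT * FROM person").fetchall())


def count_in_database():
    """Let the database count (what an uncached count() does)."""
    return conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]


# Each call issues a fresh query, so nothing is served from a cache.
fetch_seconds = timeit.timeit(count_by_fetching, number=200)
count_seconds = timeit.timeit(count_in_database, number=200)
print(f"fetch all: {fetch_seconds:.4f}s  COUNT(*): {count_seconds:.4f}s")
```

The results depend on the database, the driver, and the row size, so treat them as indicative only.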

Answer by Pablo Guerrero

Summarizing what others have already answered:

  • len() will fetch all the records and iterate over them.
  • count() will perform an SQL COUNT operation (much faster when dealing with a big queryset).

It is also true that if the whole queryset is going to be iterated after this operation, then as a whole it could be slightly more efficient to use len().

However

In some cases, for instance when there are memory limitations, it could be convenient (when possible) to split the operation performed over the records. That can be achieved using Django pagination.

Then, count() would be the choice, and you could avoid having to fetch the entire queryset at once.
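
The idea of counting first and then processing in pieces can be sketched with plain SQL through sqlite3. PAGE_SIZE, iter_pages, and the person table are hypothetical; Django's own pagination would be used differently, and this only illustrates the COUNT(*) plus LIMIT/OFFSET pattern behind it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO person (name) VALUES (?)",
    [(f"p{i}",) for i in range(25)],
)

PAGE_SIZE = 10  # hypothetical page size


def iter_pages(conn, page_size):
    """Yield the table one page at a time, using COUNT(*) to size the loop."""
    total = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
    num_pages = (total + page_size - 1) // page_size  # ceiling division
    for page in range(num_pages):
        # Only page_size rows are held in memory at any one time.
        yield conn.execute(
            "SELECT id, name FROM person ORDER BY id LIMIT ? OFFSET ?",
            (page_size, page * page_size),
        ).fetchall()


page_sizes = [len(page) for page in iter_pages(conn, PAGE_SIZE)]
print(page_sizes)
```

Here the count is needed up front to know how many pages exist, which is why count() rather than len() fits this case.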
