Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/33775011/

Time: 2020-08-19 13:58:04  Source: igfitidea

How to annotate Count with a condition in a Django queryset

Tags: python, django, django-queryset

Asked by Hassan Baig

Using Django ORM, can one do something like queryset.objects.annotate(Count('queryset_objects', gte=VALUE)). Catch my drift?

Here's a quick example to use for illustrating a possible answer:

In a Django website, content creators submit articles, and regular users view (i.e. read) the said articles. Articles can either be published (i.e. available for all to read), or in draft mode. The models depicting these requirements are:

class Article(models.Model):
    author = models.ForeignKey(User)
    published = models.BooleanField(default=False)

class Readership(models.Model):
    reader = models.ForeignKey(User)
    which_article = models.ForeignKey(Article)
    what_time = models.DateTimeField(auto_now_add=True)

My question is: How can I get all published articles, sorted by unique readership from the last 30 minutes? That is, I want to count how many distinct (unique) views each published article got in the last half hour, and then produce a list of articles sorted by these distinct views.

I tried:

date = datetime.now()-timedelta(minutes=30)
articles = Article.objects.filter(published=True).extra(select = {
  "views" : """
  SELECT COUNT(*)
  FROM myapp_readership
    JOIN myapp_article on myapp_readership.which_article_id = myapp_article.id
  WHERE myapp_readership.reader_id = myapp_user.id
  AND myapp_readership.what_time > %s """ % date,
}).order_by("-views")

This raised the error: syntax error at or near "01" (where "01" was part of the datetime object inside extra). It's not much to go on.

Accepted answer by GwynBleidD

For django >= 1.8

Use Conditional Aggregation:

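from datetime import timedelta
from django.db.models import Count, Case, When, IntegerField
from django.utils import timezone

threshold = timezone.now() - timedelta(minutes=30)  # cut-off time; 30 minutes is taken from the question
Article.objects.annotate(
    numviews=Count(Case(
        # `__lt` counts views before the cut-off; use `__gt` to count views from the last 30 minutes
        When(readership__what_time__lt=threshold, then=1),
        output_field=IntegerField(),
    ))
)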

Explanation: a normal query through your articles will be annotated with a numviews field. That field will be constructed as a CASE/WHEN expression, wrapped by Count, that returns 1 for readership rows matching the criteria and NULL for rows that do not. Count ignores NULLs and counts only values.

You will get zeros for articles that haven't been viewed recently, and you can use the numviews field for sorting and filtering.
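The way Count skips NULL can be checked outside Django. The sketch below is only an illustration: it uses Python's built-in sqlite3 module with made-up table names (article, readership) and sample rows, not the tables Django would generate, and runs the same COUNT(CASE WHEN ...) pattern over a LEFT OUTER JOIN.

```python
import sqlite3

# In-memory database with hypothetical tables and sample rows (assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY);
    CREATE TABLE readership (
        id INTEGER PRIMARY KEY,
        article_id INTEGER REFERENCES article(id),
        reader_id INTEGER,
        what_time TEXT
    );
    INSERT INTO article (id) VALUES (1), (2), (3);
    INSERT INTO readership (article_id, reader_id, what_time) VALUES
        (1, 10, '2015-11-18 10:50:00'),
        (1, 11, '2015-11-18 10:55:00'),
        (1, 10, '2015-11-18 11:30:00'),
        (2, 12, '2015-11-18 11:45:00');
""")

threshold = "2015-11-18 11:04:00"  # ISO timestamps compare correctly as text

# CASE yields 1 for matching rows and NULL otherwise; COUNT skips the NULLs.
rows = conn.execute("""
    SELECT article.id,
           COUNT(CASE WHEN readership.what_time < ? THEN 1 ELSE NULL END) AS numviews
    FROM article
    LEFT OUTER JOIN readership ON article.id = readership.article_id
    GROUP BY article.id
    ORDER BY article.id
""", (threshold,)).fetchall()

print(rows)
```

Article 2 has a view, but not one matching the condition, and article 3 has no views at all; both come back with a count of 0 rather than disappearing from the result.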

Query behind this for PostgreSQL will be:

SELECT
    "app_article"."id",
    "app_article"."author_id",
    "app_article"."published",
    COUNT(
        CASE WHEN "app_readership"."what_time" < 2015-11-18 11:04:00.000000+01:00 THEN 1
        ELSE NULL END
    ) as "numviews"
FROM "app_article" LEFT OUTER JOIN "app_readership"
    ON ("app_article"."id" = "app_readership"."which_article_id")
GROUP BY "app_article"."id", "app_article"."author_id", "app_article"."published"

If we want to count only unique readers, we can pass distinct=True to Count and make the When clause return the value we want to be distinct on.

from django.db.models import Count, Case, When, CharField, F
# threshold is the datetime cut-off, e.g. timezone.now() - timedelta(minutes=30)
Article.objects.annotate(
    numviews=Count(Case(
        When(readership__what_time__lt=threshold, then=F('readership__reader')),  # `readership__reader_id` also works
        output_field=CharField(),
    ), distinct=True)
)

That will produce:

SELECT
    "app_article"."id",
    "app_article"."author_id",
    "app_article"."published",
    COUNT(
        DISTINCT CASE WHEN "app_readership"."what_time" < 2015-11-18 11:04:00.000000+01:00 THEN "app_readership"."reader_id"
        ELSE NULL END
    ) as "numviews"
FROM "app_article" LEFT OUTER JOIN "app_readership"
    ON ("app_article"."id" = "app_readership"."which_article_id")
GROUP BY "app_article"."id", "app_article"."author_id", "app_article"."published"
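To see what DISTINCT changes, here is a small sketch that runs both forms of the count side by side. It is an illustration only, using Python's built-in sqlite3 module and a hypothetical readership table with invented rows, where reader 10 views the same article twice before the cut-off.

```python
import sqlite3

# Hypothetical table and rows (assumptions made for this illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readership (article_id INTEGER, reader_id INTEGER, what_time TEXT);
    INSERT INTO readership VALUES
        (1, 10, '2015-11-18 10:40:00'),
        (1, 10, '2015-11-18 10:50:00'),
        (1, 11, '2015-11-18 10:55:00'),
        (1, 12, '2015-11-18 11:30:00');
""")

threshold = "2015-11-18 11:04:00"

# First column counts every matching view; second counts each reader once.
total, unique = conn.execute("""
    SELECT COUNT(CASE WHEN what_time < ? THEN reader_id END),
           COUNT(DISTINCT CASE WHEN what_time < ? THEN reader_id END)
    FROM readership
    WHERE article_id = 1
""", (threshold, threshold)).fetchone()

print(total, unique)
```

Returning reader_id from the CASE (instead of a constant 1) is what gives DISTINCT something meaningful to deduplicate; with a constant, every matching row would collapse into a single value.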

For django < 1.8 and PostgreSQL

You can just use raw to execute the SQL statement created by newer versions of django. Apparently there is no simple and optimized method for querying that data without using raw (even with extra there are some problems with injecting the required JOIN clause).

# threshold is the datetime cut-off; it is passed as a query parameter (%s)
# so that the database driver quotes it, instead of being pasted into the SQL.
Article.objects.raw('''
    SELECT
        "app_article"."id",
        "app_article"."author_id",
        "app_article"."published",
        COUNT(
            DISTINCT CASE WHEN "app_readership"."what_time" < %s THEN "app_readership"."reader_id"
            ELSE NULL END
        ) AS "numviews"
    FROM "app_article" LEFT OUTER JOIN "app_readership"
        ON ("app_article"."id" = "app_readership"."which_article_id")
    GROUP BY "app_article"."id", "app_article"."author_id", "app_article"."published"
''', [threshold])
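Whenever a cut-off time goes into hand-written SQL, as with raw here or the extra attempt in the question, formatting the datetime into the string leaves it unquoted, which is the likely source of the syntax error reported in the question. The sketch below reproduces that with Python's built-in sqlite3 module and a hypothetical readership table (both are assumptions for illustration; PostgreSQL words its error differently), then shows the same query with a bound parameter. Django's raw() accepts a params list for the same purpose.

```python
import sqlite3
from datetime import datetime

# Hypothetical table and rows (assumptions made for this illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readership (reader_id INTEGER, what_time TEXT);
    INSERT INTO readership VALUES
        (10, '2015-11-18 10:50:00'),
        (11, '2015-11-18 11:30:00');
""")

cutoff = datetime(2015, 11, 18, 11, 4, 0)

# String formatting pastes the bare timestamp into the statement, unquoted.
formatted = "SELECT COUNT(*) FROM readership WHERE what_time > %s" % cutoff
print(formatted)

try:
    conn.execute(formatted)
    formatted_ok = True
except sqlite3.OperationalError as exc:
    # The parser trips over the time portion of the unquoted timestamp.
    formatted_ok = False
    print("error:", exc)

# A bound parameter lets the driver handle quoting.
bound_count = conn.execute(
    "SELECT COUNT(*) FROM readership WHERE what_time > ?",
    (cutoff.isoformat(sep=" "),),
).fetchone()[0]
print(bound_count)
```

Binding the value also avoids SQL injection when the cut-off or any other value comes from user input.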

Answer by dtatarkin

For django >= 2.0 you can use conditional aggregation with a filter argument in the aggregate functions:

from datetime import timedelta
from django.utils import timezone
from django.db.models import Count, Q # need import

Article.objects.annotate(
    numviews=Count(
        'readership__reader__id', 
        filter=Q(readership__what_time__gt=timezone.now() - timedelta(minutes=30)), 
        distinct=True
    )
)
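To sanity-check what such a query should return for the original question (published articles only, distinct readers in the last 30 minutes, sorted by that count), the same computation can be written in plain Python. This is a reference sketch over invented sample data, with no Django involved; the dictionaries and tuples are hypothetical stand-ins for the Article and Readership rows. In Django itself, the published restriction and the ordering would come from chaining filter(published=True) and order_by('-numviews') onto the annotated queryset.

```python
from datetime import datetime, timedelta

# Fixed "now" so the example is reproducible (an assumption for illustration).
now = datetime(2015, 11, 18, 12, 0, 0)
cutoff = now - timedelta(minutes=30)

published = {1: True, 2: True, 3: False}  # article id -> published flag

# (article id, reader id, view time), standing in for Readership rows
views = [
    (1, 10, now - timedelta(minutes=5)),
    (1, 10, now - timedelta(minutes=10)),  # same reader again, counted once
    (1, 11, now - timedelta(minutes=45)),  # older than the cut-off, ignored
    (2, 10, now - timedelta(minutes=1)),
    (2, 11, now - timedelta(minutes=2)),
    (3, 12, now - timedelta(minutes=3)),   # article is unpublished, ignored
]

# One set of reader ids per published article; sets deduplicate readers.
readers = {article_id: set() for article_id, is_pub in published.items() if is_pub}
for article_id, reader_id, what_time in views:
    if article_id in readers and what_time > cutoff:
        readers[article_id].add(reader_id)

# Sort by number of distinct recent readers, highest first.
ranking = sorted(
    ((article_id, len(ids)) for article_id, ids in readers.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranking)
```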