postgresql 在 postgres 的聚合函数中 DISTINCT ON

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/30077639/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-10-21 01:54:06  来源:igfitidea点击:

DISTINCT ON in an aggregate function in postgres

sqljsonpostgresqldistinctaggregate-functions

提问by Migwell

For my problem, we have a schema whereby one photo has many tags and also many comments. So if I have a query where I want all the comments and tags, it will multiply the rows together. So if one photo has 2 tags and 13 comments, I get 26 rows for that one photo:

对于我的问题,我们有一个架构,其中一张照片有很多标签和很多评论。因此,如果我有一个需要所有评论和标签的查询,它会将行相乘。因此,如果一张照片有 2 个标签和 13 条评论,我会为那张照片获得 26 行:

SELECT
        tag.name, 
        comment.comment_id
FROM
        photo
        LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
        LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id
        LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id

enter image description here

在此处输入图片说明

That's fine for most things, but it means that if I GROUP BYand then json_agg(tag.*), I get 13 copies of the first tag, and 13 copies of the second tag.

这对大多数事情都没有问题,但这意味着如果 IGROUP BY和 then json_agg(tag.*),我会得到第一个标签的 13 个副本,以及第二个标签的 13 个副本。

SELECT json_agg(tag.name) as tags
FROM
        photo
        LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
        LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id
        LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
GROUP BY photo.photo_id

enter image description here

在此处输入图片说明

Instead I want an array that is only 'suburban' and 'city', like this:

相反,我想要一个只有“郊区”和“城市”的数组,如下所示:

 [
      {"tag_id":1,"name":"suburban"}, 
      {"tag_id":2,"name":"city"}
 ]

I could json_agg(DISTINCT tag.name), but this will only make an array of tag names, when I want the entire row as json. I would like to json_agg(DISTINCT ON(tag.name) tag.*), but that's not valid SQL apparently.

我可以json_agg(DISTINCT tag.name),但是当我想要整行作为 json 时,这只会生成一个标签名称数组。我想要json_agg(DISTINCT ON(tag.name) tag.*),但这显然不是有效的 SQL。

How then can I simulate DISTINCT ONinside an aggregate function in Postgres?

那么如何DISTINCT ON在 Postgres 中的聚合函数内部进行模拟?

回答by Paul A Jungwirth

Whenever you have a central table and want to left-join it to many rows in table A and also left-join it to many rows in table B, you get these problems of duplicating rows. It can especially throw off aggregrate functions like COUNTand SUMif you're not careful! So I think you need to construct your tags-for-each-photo and comments-for-each-photo separately, and then join them together:

每当您有一个中央表并希望将其左连接到表 A 中的多行,并将其左连接到表 B 中的多行时,就会遇到重复行的这些问题。如果您不小心,它尤其可以抛出聚合函数,例如COUNTSUM如果您不小心!所以我认为你需要分别构建你的标签为每张照片和评论为每张照片,然后将它们连接在一起:

WITH tags AS (
  SELECT  photo.photo_id, json_agg(row_to_json(tag.*)) AS tags
  FROM    photo
  LEFT OUTER JOIN photo_tag on photo_tag.photo_id = photo.photo_id
  LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
  GROUP BY photo.photo_id
),
comments AS (
  SELECT  photo.photo_id, json_agg(row_to_json(comment.*)) AS comments
  FROM    photo
  LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
  GROUP BY photo.photo_id
)
SELECT  COALESCE(tags.photo_id, comments.photo_id) AS photo_id,
        tags.tags,
        comments.comments
FROM    tags
FULL OUTER JOIN comments
ON      tags.photo_id = comments.photo_id

EDIT:If you really want to join everything together without CTEs, this looks like it gives correct results:

编辑:如果你真的想在没有 CTE 的情况下将所有内容连接在一起,这看起来会给出正确的结果:

SELECT  photo.photo_id,
        to_json(array_agg(DISTINCT tag.*)) AS tags,
        to_json(array_agg(DISTINCT comment.*)) AS comments
FROM    photo
LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
LEFT OUTER JOIN photo_tag on photo_tag.photo_id = photo.photo_id
LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
GROUP BY photo.photo_id

回答by Erwin Brandstetter

The cheapest and simplest DISTINCToperation is .. not to multiply rows in a "proxy cross join" in the first place. Aggregate first, thenjoin. See:

最便宜和最简单的DISTINCT操作是..首先不要在“代理交叉连接”中乘以行。先聚合,加入。看:

Best for returning few selected rows

最适合返回几个选定的行

Assumingyou actually don't want to retrieve the whole table, but just one or few selected photos at a time, with aggregated details, the most elegant and probably fastest way is with LATERALsubqueries:

假设您实际上不想检索整个表格,而是一次只检索一张或几张选定的照片,并带有汇总的详细信息,那么最优雅且可能最快的方法是使用LATERAL子查询

SELECT *
FROM   photo p
CROSS  JOIN LATERAL (
   SELECT json_agg(c) AS comments
   FROM   comment c
   WHERE  photo_id = p.photo_id
   ) c1
CROSS  JOIN LATERAL (
   SELECT json_agg(t) AS tags
   FROM   photo_tag pt
   JOIN   tag       t USING (tag_id)
   WHERE  pt.photo_id = p.photo_id
   ) t
WHERE  p.photo_id = 2;  -- arbitrary selection

This returns whole rowsfrom commentand tag, aggregated into JSON arrays separately. Rows are not multiplies like in your attempt, but they are only as "distinct" as they are in your base tables.

这将从和返回整行,分别聚合到 JSON 数组中。行与您尝试中的乘法不同,但它们仅与基表中的“不同”。commenttag

To additionally fold duplicates in the base data, see below.

要额外折叠基础数据中的重复项,请参见下文。

Notes:

笔记:

  • LATERALand json_agg()require Postgres 9.3or later.

  • json_agg(c)is short for json_agg(c.*).

  • We do not need to LEFT JOINbecause an aggregate function like json_agg()always returns a row.

  • LATERAL并且json_agg()需要 Postgres 9.3或更高版本。

  • json_agg(c)是 的缩写json_agg(c.*)

  • 我们不需要,LEFT JOIN因为像这样的聚合函数json_agg()总是返回一行。

Typically, you'd only want a subsetof columns - at leastexcluding the redundant photo_id:

通常,您只需要列的子集-至少排除冗余photo_id

SELECT *
FROM   photo p
CROSS  JOIN LATERAL (
   SELECT json_agg(json_build_object('comment_id', comment_id
                                   , 'comment', comment)) AS comments
   FROM   comment
   WHERE  photo_id = p.photo_id
   ) c
CROSS  JOIN LATERAL (
   SELECT json_agg(t) AS tags
   FROM   photo_tag pt
   JOIN   tag       t USING (tag_id)
   WHERE  pt.photo_id = p.photo_id
   ) t
WHERE  p.photo_id = 2;

json_build_object()was introduced with Postgres 9.4. Used to be cumbersome in older versions because a ROWconstructor doesn't preserve column names. But there are generic workarounds:

json_build_object()是随 Postgres 9.4引入的。过去在旧版本中很麻烦,因为ROW构造函数不保留列名。但是有一些通用的解决方法:

Also allows to choose JSON key names freely, you don't have to stick to column names.

还允许自由选择 JSON 键名,您不必拘泥于列名。

Best for returning the whole table

最适合返回整个表

To return all rows, this is more efficient:

要返回所有行,这更有效:

SELECT p.*
     , COALESCE(c1.comments, '[]') AS comments
     , COALESCE(t.tags, '[]') AS tags
FROM   photo p
LEFT   JOIN (
   SELECT photo_id
        , json_agg(json_build_object('comment_id', comment_id
                                   , 'comment', comment)) AS comments
   FROM   comment c
   GROUP  BY 1
   ) c1 USING (photo_id)
LEFT  JOIN LATERAL (
   SELECT photo_id , json_agg(t) AS tags
   FROM   photo_tag pt
   JOIN   tag       t USING (tag_id)
   GROUP  BY 1
   ) t USING (photo_id);

Once we retrieve enough rows, this gets cheaper than LATERALsubqueries. Works for Postgres 9.3+.

一旦我们检索到足够多的行,这将比LATERAL子查询便宜。适用于 Postgres 9.3+

Note the USINGclause in the join condition. This way we can conveniently use SELECT *in the outer query without getting duplicate columns for photo_id. I didn't use SELECT *here because your deleted answer indicates you want empty JSON arraysinstead of NULLfor no tags / no comments.

注意USING连接条件中的子句。这样我们就可以方便地SELECT *在外部查询中使用,而不会为photo_id. 我没有SELECT *在这里使用,因为你删除的答案表明你想要空的 JSON 数组而不是NULL没有标签/没有评论。

Also remove existing duplicates in base tables

同时删除基表中的现有重复项

You can't just json_agg(DISTINCT json_build_object(...))because there is no equality operator for the data type json. See:

您不能仅仅json_agg(DISTINCT json_build_object(...))因为数据类型没有相等运算符json。看:

There are various better ways:

有各种更好的方法:

SELECT *
FROM   photo p
CROSS  JOIN LATERAL (
   SELECT json_agg(to_json(c1.comment)) AS comments1
        , json_agg(json_build_object('comment', c1.comment)) AS comments2
        , json_agg(to_json(c1)) AS comments3
   FROM  (
      SELECT DISTINCT c.comment  -- folding dupes here
      FROM   comment c
      WHERE  c.photo_id = p.photo_id
   -- ORDER  BY comment --  any particular order?
      ) c1
   ) c2
CROSS  JOIN LATERAL (
   SELECT jsonb_agg(DISTINCT t) AS tags  -- demonstrating jsonb_agg
   FROM   photo_tag pt
   JOIN   tag       t USING (tag_id)
   WHERE  pt.photo_id = p.photo_id
   ) t
WHERE  p.photo_id = 2;

Demonstrating 4 different techniques in comments1, comments2, comments3(redundantly) and tags.

comments1comments2comments3(冗余)和 中演示 4 种不同的技术tags

db<>fiddle here
Old SQL Fiddlebackpatched to Postgres 9.3
Old SQL Fiddlefor Postgres 9.6

db<>fiddle here
Old SQL Fiddlebackpatched to Postgres 9.3
Old SQL Fiddlefor Postgres 9.6

回答by Eugene Kovalev

The most simple thing I discovered is to use DISTINCTover jsonb(not json!). (jsonb_build_objectcreates jsonb objects)

我发现的最简单的事情是使用DISTINCTover jsonb不是 json!)。(jsonb_build_object创建 jsonb 对象)

SELECT 
   JSON_AGG(
       DISTINCT jsonb_build_object('tag_id', photo_tag.tag_id, 
                                  'name', tag.name)) AS tags
FROM photo
    LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
    LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id
    LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
GROUP BY photo.photo_id

回答by PinnyM

As stated in comments, json_agg does not serialize a row as an object, but builds a JSON array of the values that you pass it. You'll need row_to_jsonto turn your row into a JSON object, and then json_aggto perform the aggregation to an array:

如评论中所述, json_agg 不会将行序列化为对象,而是构建您传递给它的 JSON 数组。您需要row_to_json将行转换为 JSON 对象,然后json_agg执行对数组的聚合:

SELECT json_agg(DISTINCT row_to_json(comment)) as tags
FROM
    photo
    LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
    LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id
    LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
GROUP BY photo.photo_id