postgresql 在 postgres 的聚合函数中 DISTINCT ON
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/30077639/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
DISTINCT ON in an aggregate function in postgres
提问by Migwell
For my problem, we have a schema whereby one photo has many tags and also many comments. So if I have a query where I want all the comments and tags, it will multiply the rows together. So if one photo has 2 tags and 13 comments, I get 26 rows for that one photo:
对于我的问题,我们有一个架构,其中一张照片有很多标签和很多评论。因此,如果我有一个需要所有评论和标签的查询,它会将行相乘。因此,如果一张照片有 2 个标签和 13 条评论,我会为那张照片获得 26 行:
SELECT
tag.name,
comment.comment_id
FROM
photo
LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id
LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
That's fine for most things, but it means that if I GROUP BY
and then json_agg(tag.*)
, I get 13 copies of the first tag, and 13 copies of the second tag.
这对大多数事情都没有问题,但这意味着如果 IGROUP BY
和 then json_agg(tag.*)
,我会得到第一个标签的 13 个副本,以及第二个标签的 13 个副本。
SELECT json_agg(tag.name) as tags
FROM
photo
LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id
LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
GROUP BY photo.photo_id
Instead I want an array that is only 'suburban' and 'city', like this:
相反,我想要一个只有“郊区”和“城市”的数组,如下所示:
[
{"tag_id":1,"name":"suburban"},
{"tag_id":2,"name":"city"}
]
I could json_agg(DISTINCT tag.name)
, but this will only make an array of tag names, when I want the entire row as json. I would like to json_agg(DISTINCT ON(tag.name) tag.*)
, but that's not valid SQL apparently.
我可以json_agg(DISTINCT tag.name)
,但是当我想要整行作为 json 时,这只会生成一个标签名称数组。我想要json_agg(DISTINCT ON(tag.name) tag.*)
,但这显然不是有效的 SQL。
How then can I simulate DISTINCT ON
inside an aggregate function in Postgres?
那么如何DISTINCT ON
在 Postgres 中的聚合函数内部进行模拟?
回答by Paul A Jungwirth
Whenever you have a central table and want to left-join it to many rows in table A and also left-join it to many rows in table B, you get these problems of duplicating rows. It can especially throw off aggregrate functions like COUNT
and SUM
if you're not careful! So I think you need to construct your tags-for-each-photo and comments-for-each-photo separately, and then join them together:
每当您有一个中央表并希望将其左连接到表 A 中的多行,并将其左连接到表 B 中的多行时,就会遇到重复行的这些问题。如果您不小心,它尤其可以抛出聚合函数,例如COUNT
,SUM
如果您不小心!所以我认为你需要分别构建你的标签为每张照片和评论为每张照片,然后将它们连接在一起:
WITH tags AS (
SELECT photo.photo_id, json_agg(row_to_json(tag.*)) AS tags
FROM photo
LEFT OUTER JOIN photo_tag on photo_tag.photo_id = photo.photo_id
LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
GROUP BY photo.photo_id
),
comments AS (
SELECT photo.photo_id, json_agg(row_to_json(comment.*)) AS comments
FROM photo
LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
GROUP BY photo.photo_id
)
SELECT COALESCE(tags.photo_id, comments.photo_id) AS photo_id,
tags.tags,
comments.comments
FROM tags
FULL OUTER JOIN comments
ON tags.photo_id = comments.photo_id
EDIT:If you really want to join everything together without CTEs, this looks like it gives correct results:
编辑:如果你真的想在没有 CTE 的情况下将所有内容连接在一起,这看起来会给出正确的结果:
SELECT photo.photo_id,
to_json(array_agg(DISTINCT tag.*)) AS tags,
to_json(array_agg(DISTINCT comment.*)) AS comments
FROM photo
LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
LEFT OUTER JOIN photo_tag on photo_tag.photo_id = photo.photo_id
LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
GROUP BY photo.photo_id
回答by Erwin Brandstetter
The cheapest and simplest DISTINCT
operation is .. not to multiply rows in a "proxy cross join" in the first place. Aggregate first, thenjoin. See:
最便宜和最简单的DISTINCT
操作是..首先不要在“代理交叉连接”中乘以行。先聚合,后加入。看:
Best for returning few selected rows
最适合返回几个选定的行
Assumingyou actually don't want to retrieve the whole table, but just one or few selected photos at a time, with aggregated details, the most elegant and probably fastest way is with LATERAL
subqueries:
假设您实际上不想检索整个表格,而是一次只检索一张或几张选定的照片,并带有汇总的详细信息,那么最优雅且可能最快的方法是使用LATERAL
子查询:
SELECT *
FROM photo p
CROSS JOIN LATERAL (
SELECT json_agg(c) AS comments
FROM comment c
WHERE photo_id = p.photo_id
) c1
CROSS JOIN LATERAL (
SELECT json_agg(t) AS tags
FROM photo_tag pt
JOIN tag t USING (tag_id)
WHERE pt.photo_id = p.photo_id
) t
WHERE p.photo_id = 2; -- arbitrary selection
This returns whole rowsfrom comment
and tag
, aggregated into JSON arrays separately. Rows are not multiplies like in your attempt, but they are only as "distinct" as they are in your base tables.
这将从和返回整行,分别聚合到 JSON 数组中。行与您尝试中的乘法不同,但它们仅与基表中的“不同”。comment
tag
To additionally fold duplicates in the base data, see below.
要额外折叠基础数据中的重复项,请参见下文。
Notes:
笔记:
LATERAL
andjson_agg()
require Postgres 9.3or later.json_agg(c)
is short forjson_agg(c.*)
.We do not need to
LEFT JOIN
because an aggregate function likejson_agg()
always returns a row.
LATERAL
并且json_agg()
需要 Postgres 9.3或更高版本。json_agg(c)
是 的缩写json_agg(c.*)
。我们不需要,
LEFT JOIN
因为像这样的聚合函数json_agg()
总是返回一行。
Typically, you'd only want a subsetof columns - at leastexcluding the redundant photo_id
:
通常,您只需要列的子集-至少排除冗余photo_id
:
SELECT *
FROM photo p
CROSS JOIN LATERAL (
SELECT json_agg(json_build_object('comment_id', comment_id
, 'comment', comment)) AS comments
FROM comment
WHERE photo_id = p.photo_id
) c
CROSS JOIN LATERAL (
SELECT json_agg(t) AS tags
FROM photo_tag pt
JOIN tag t USING (tag_id)
WHERE pt.photo_id = p.photo_id
) t
WHERE p.photo_id = 2;
json_build_object()
was introduced with Postgres 9.4. Used to be cumbersome in older versions because a ROW
constructor doesn't preserve column names. But there are generic workarounds:
json_build_object()
是随 Postgres 9.4引入的。过去在旧版本中很麻烦,因为ROW
构造函数不保留列名。但是有一些通用的解决方法:
Also allows to choose JSON key names freely, you don't have to stick to column names.
还允许自由选择 JSON 键名,您不必拘泥于列名。
Best for returning the whole table
最适合返回整个表
To return all rows, this is more efficient:
要返回所有行,这更有效:
SELECT p.*
, COALESCE(c1.comments, '[]') AS comments
, COALESCE(t.tags, '[]') AS tags
FROM photo p
LEFT JOIN (
SELECT photo_id
, json_agg(json_build_object('comment_id', comment_id
, 'comment', comment)) AS comments
FROM comment c
GROUP BY 1
) c1 USING (photo_id)
LEFT JOIN LATERAL (
SELECT photo_id , json_agg(t) AS tags
FROM photo_tag pt
JOIN tag t USING (tag_id)
GROUP BY 1
) t USING (photo_id);
Once we retrieve enough rows, this gets cheaper than LATERAL
subqueries. Works for Postgres 9.3+.
一旦我们检索到足够多的行,这将比LATERAL
子查询便宜。适用于 Postgres 9.3+。
Note the USING
clause in the join condition. This way we can conveniently use SELECT *
in the outer query without getting duplicate columns for photo_id
. I didn't use SELECT *
here because your deleted answer indicates you want empty JSON arraysinstead of NULLfor no tags / no comments.
注意USING
连接条件中的子句。这样我们就可以方便地SELECT *
在外部查询中使用,而不会为photo_id
. 我没有SELECT *
在这里使用,因为你删除的答案表明你想要空的 JSON 数组而不是NULL没有标签/没有评论。
Also remove existing duplicates in base tables
同时删除基表中的现有重复项
You can't just json_agg(DISTINCT json_build_object(...))
because there is no equality operator for the data type json
. See:
您不能仅仅json_agg(DISTINCT json_build_object(...))
因为数据类型没有相等运算符json
。看:
There are various better ways:
有各种更好的方法:
SELECT *
FROM photo p
CROSS JOIN LATERAL (
SELECT json_agg(to_json(c1.comment)) AS comments1
, json_agg(json_build_object('comment', c1.comment)) AS comments2
, json_agg(to_json(c1)) AS comments3
FROM (
SELECT DISTINCT c.comment -- folding dupes here
FROM comment c
WHERE c.photo_id = p.photo_id
-- ORDER BY comment -- any particular order?
) c1
) c2
CROSS JOIN LATERAL (
SELECT jsonb_agg(DISTINCT t) AS tags -- demonstrating jsonb_agg
FROM photo_tag pt
JOIN tag t USING (tag_id)
WHERE pt.photo_id = p.photo_id
) t
WHERE p.photo_id = 2;
Demonstrating 4 different techniques in comments1
, comments2
, comments3
(redundantly) and tags
.
在comments1
、comments2
、comments3
(冗余)和 中演示 4 种不同的技术tags
。
db<>fiddle here
Old SQL Fiddlebackpatched to Postgres 9.3
Old SQL Fiddlefor Postgres 9.6
db<>fiddle here
Old SQL Fiddlebackpatched to Postgres 9.3
Old SQL Fiddlefor Postgres 9.6
回答by Eugene Kovalev
The most simple thing I discovered is to use DISTINCT
over jsonb
(not json!).
(jsonb_build_object
creates jsonb objects)
我发现的最简单的事情是使用DISTINCT
over jsonb
(不是 json!)。(jsonb_build_object
创建 jsonb 对象)
SELECT
JSON_AGG(
DISTINCT jsonb_build_object('tag_id', photo_tag.tag_id,
'name', tag.name)) AS tags
FROM photo
LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id
LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
GROUP BY photo.photo_id
回答by PinnyM
As stated in comments, json_agg does not serialize a row as an object, but builds a JSON array of the values that you pass it. You'll need row_to_json
to turn your row into a JSON object, and then json_agg
to perform the aggregation to an array:
如评论中所述, json_agg 不会将行序列化为对象,而是构建您传递给它的值的 JSON 数组。您需要row_to_json
将行转换为 JSON 对象,然后json_agg
执行对数组的聚合:
SELECT json_agg(DISTINCT row_to_json(comment)) as tags
FROM
photo
LEFT OUTER JOIN comment ON comment.photo_id = photo.photo_id
LEFT OUTER JOIN photo_tag ON photo_tag.photo_id = photo.photo_id
LEFT OUTER JOIN tag ON photo_tag.tag_id = tag.tag_id
GROUP BY photo.photo_id