PostgreSQL DISTINCT ON with different ORDER BY
Original question: http://stackoverflow.com/questions/9795660/
Warning: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not to this page) and keep the same license.
Asked by sl_bug
I want to run this query:
SELECT DISTINCT ON (address_id) purchases.address_id, purchases.*
FROM purchases
WHERE purchases.product_id = 1
ORDER BY purchases.purchased_at DESC
But I get this error:
PG::Error: ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
Adding address_id as the first ORDER BY expression silences the error, but I really don't want to sort by address_id. Is it possible to do this without ordering by address_id?
Accepted answer by Mosty Mostacho
Documentation says:
DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. [...] Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. [...] The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s).
So you'll have to add address_id to the ORDER BY.
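To see why the leading ORDER BY expression has to match, it can help to model what DISTINCT ON does. The following is a minimal Python sketch, not PostgreSQL's actual implementation: it assumes the rows are sorted first and that the first row of each run of equal keys is kept. The sample rows and the helper name distinct_on are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data: (id, address_id, purchased_at)
purchases = [
    (1, 10, "2012-03-01"),
    (2, 10, "2012-03-05"),
    (3, 20, "2012-03-03"),
    (4, 20, "2012-03-02"),
]

def distinct_on(rows, key):
    """Keep the first row of each run of consecutive rows with an equal key."""
    return [next(group) for _, group in groupby(rows, key=key)]

# Model of ORDER BY address_id, purchased_at DESC.
# Python's sort is stable, so sort by the secondary key first.
ordered = sorted(purchases, key=itemgetter(2), reverse=True)  # purchased_at DESC
ordered = sorted(ordered, key=itemgetter(1))                  # address_id
latest = distinct_on(ordered, key=itemgetter(1))

# Model of the outer ORDER BY purchased_at DESC on the reduced rows.
result = sorted(latest, key=itemgetter(2), reverse=True)
print(result)

# Without address_id leading the sort, equal keys are not adjacent,
# so this model would keep address 10 twice.
runs_without_leading_sort = distinct_on(
    sorted(purchases, key=itemgetter(2), reverse=True), key=itemgetter(1)
)
print(len(runs_without_leading_sort))
```

In this model the deduplication only works when equal address_id values sit next to each other, which is what a leading ORDER BY address_id guarantees.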
Alternatively, if you're looking for the full row that contains the most recently purchased product for each address_id, with the result sorted by purchased_at, then you're trying to solve a greatest-n-per-group problem, which can be solved by the following approaches:
The general solution that should work in most DBMSs:
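SELECT t1.*
FROM purchases t1
JOIN (
    -- latest purchase time per address for this product
    SELECT address_id, max(purchased_at) AS max_purchased_at
    FROM purchases
    WHERE product_id = 1
    GROUP BY address_id
) t2
  ON t1.address_id = t2.address_id
 AND t1.purchased_at = t2.max_purchased_at
WHERE t1.product_id = 1  -- keep other products with the same timestamp out
ORDER BY t1.purchased_at DESC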
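As a quick way to try the join-based approach without a PostgreSQL server, here is a small sketch that runs it on an in-memory SQLite database through Python's sqlite3 module. The table layout and sample rows are assumptions made for illustration; the outer WHERE t1.product_id = 1 is included so that a purchase of another product with the same timestamp cannot slip into the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases ("
    "id INTEGER PRIMARY KEY, address_id INTEGER, "
    "product_id INTEGER, purchased_at TEXT)"
)
# Hypothetical sample rows
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        (1, 10, 1, "2012-03-01"),
        (2, 10, 1, "2012-03-05"),
        (3, 20, 1, "2012-03-03"),
        (4, 20, 2, "2012-03-09"),  # different product, must be ignored
    ],
)

query = """
SELECT t1.id, t1.address_id, t1.purchased_at
FROM purchases t1
JOIN (
    SELECT address_id, max(purchased_at) AS max_purchased_at
    FROM purchases
    WHERE product_id = 1
    GROUP BY address_id
) t2
  ON t1.address_id = t2.address_id
 AND t1.purchased_at = t2.max_purchased_at
WHERE t1.product_id = 1
ORDER BY t1.purchased_at DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Note that if two purchases for the same address share the maximum purchased_at, this approach returns both rows, whereas DISTINCT ON keeps exactly one.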
A more PostgreSQL-oriented solution based on @hkf's answer:
SELECT * FROM (
    SELECT DISTINCT ON (address_id) *
    FROM purchases
    WHERE product_id = 1
    ORDER BY address_id, purchased_at DESC
) t
ORDER BY purchased_at DESC
Problem clarified, extended and solved here: Selecting rows ordered by some column and distinct on another
Answer by hkf
You can order by address_id in a subquery, then order by whatever you want in the outer query.
SELECT * FROM
  (SELECT DISTINCT ON (address_id) purchases.*
   FROM "purchases"
   WHERE "purchases"."product_id" = 1
   ORDER BY address_id DESC) AS p  -- alias is required before PostgreSQL 16
ORDER BY purchased_at DESC
Answer by Erwin Brandstetter
A subquery can solve it:
SELECT *
FROM  (
   SELECT DISTINCT ON (address_id) *
   FROM   purchases
   WHERE  product_id = 1
   ) p
ORDER  BY purchased_at DESC;
Leading expressions in ORDER BY have to agree with the columns in DISTINCT ON, so you can't order by different columns in the same SELECT.
Only use an additional ORDER BY in the subquery if you want to pick a particular row from each set:
SELECT *
FROM  (
   SELECT DISTINCT ON (address_id) *
   FROM   purchases
   WHERE  product_id = 1
   ORDER  BY address_id, purchased_at DESC  -- get "latest" row per address_id
   ) p
ORDER  BY purchased_at DESC;
If purchased_at can be NULL, consider DESC NULLS LAST. But make sure to match your index if you intend to use it. See:
- PostgreSQL sort by datetime asc, null first?
- Why does ORDER BY NULLS LAST affect the query plan on a primary key?
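The reason NULLS LAST matters: PostgreSQL treats NULL as larger than any non-null value when sorting, so a plain DESC puts NULL rows first, and DISTINCT ON would then pick a NULL row as the "latest" one. Below is a rough Python model of the two orderings for a single address_id. The sample rows and helper names are hypothetical, and None stands in for NULL.

```python
# Hypothetical rows for one address_id: (id, purchased_at)
rows = [(1, "2012-03-01"), (2, None), (3, "2012-03-05")]

def desc_nulls_first(rows):
    """Model of ORDER BY purchased_at DESC (PostgreSQL default: NULLs first)."""
    return sorted(rows, key=lambda r: (r[1] is None, r[1] or ""), reverse=True)

def desc_nulls_last(rows):
    """Model of ORDER BY purchased_at DESC NULLS LAST."""
    return sorted(rows, key=lambda r: (r[1] is not None, r[1] or ""), reverse=True)

# DISTINCT ON keeps the first row of the sorted group.
first_default = desc_nulls_first(rows)[0]
first_nulls_last = desc_nulls_last(rows)[0]
print(first_default, first_nulls_last)
```

With the default ordering the row without a timestamp wins; with NULLS LAST the row with the most recent real timestamp does.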
Related, with more explanation:
Answer by savenkov
A window function may solve that in one pass:
SELECT DISTINCT ON (address_id)
  LAST_VALUE(purchases.address_id) OVER wnd AS address_id
FROM "purchases"
WHERE "purchases"."product_id" = 1
WINDOW wnd AS (
  PARTITION BY address_id ORDER BY purchases.purchased_at DESC
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
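If the full row is needed rather than just address_id, a related window-function pattern is to number the rows within each address and keep number 1. Note this swaps in ROW_NUMBER() instead of the LAST_VALUE() used in the answer, so treat it as a sketch of a neighbouring technique, not of the answer's own query. It is run here on an in-memory SQLite database (window functions need SQLite 3.25 or newer); the table layout and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases ("
    "id INTEGER PRIMARY KEY, address_id INTEGER, "
    "product_id INTEGER, purchased_at TEXT)"
)
# Hypothetical sample rows
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        (1, 10, 1, "2012-03-01"),
        (2, 10, 1, "2012-03-05"),
        (3, 20, 1, "2012-03-03"),
        (4, 20, 2, "2012-03-09"),  # different product, filtered out
    ],
)

query = """
SELECT id, address_id, purchased_at
FROM (
    SELECT p.*,
           ROW_NUMBER() OVER (
               PARTITION BY address_id
               ORDER BY purchased_at DESC
           ) AS rn
    FROM purchases p
    WHERE product_id = 1
) ranked
WHERE rn = 1                -- latest row per address_id
ORDER BY purchased_at DESC  -- final ordering is independent of the partition
"""
rows = conn.execute(query).fetchall()
print(rows)
```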
Answer by reubano
For anyone using Flask-SQLAlchemy, this worked for me:
from sqlalchemy import desc
from sqlalchemy.orm import aliased

from app import db
from app.models import Purchases

# Subquery with one row per address_id (rendered as DISTINCT ON on PostgreSQL)
stmt = Purchases.query.distinct(Purchases.address_id).subquery('purchases')
alias = aliased(Purchases, stmt)
# order_by() returns a new query object, so keep the result
query = db.session.query(alias).order_by(desc(alias.purchased_at))
Answer by vaishali
You can also do this using a GROUP BY clause:
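-- Only grouped columns are selected: PostgreSQL rejects purchases.* here
-- unless the GROUP BY covers the table's primary key.
SELECT purchases.address_id, purchases.purchased_at
FROM "purchases"
WHERE "purchases"."product_id" = 1
GROUP BY purchases.address_id, purchases.purchased_at
ORDER BY purchases.purchased_at DESC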