PostgreSQL SQL: What does "DISTINCT ON (expression)" do?
Original question: http://stackoverflow.com/questions/46566602/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by JohnSmithy1266
I understand how DISTINCT works, but I don't understand DISTINCT ON (expression).
Take the first example from this screenshot:
How does the (a % 2) part affect everything? Is it saying that if a % 2 evaluates to true, then return it, then continue doing so for all other tuples but only return if the returned value is distinct?
Answered by MatBailie
While the previous answer appears correct, I don't feel that it is particularly clear.
The snippet from the official documentation for PostgreSQL is as follows...
DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. [...] Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. [...] The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s).
The first point is that whatever you put in the ON () must come first in the ORDER BY, for reasons that will hopefully become clear shortly...
SELECT DISTINCT ON (a) a, b, c FROM a_table ORDER BY a, b
The results are then filtered so that, for each distinct value of the ON () expression, only the first row is actually returned.
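The ordering rule from the documentation can be seen in a minimal sketch like the one below. The table definition and sample rows are assumptions made up for illustration (the original query does not define a_table), and the rejected query is left commented out so the script runs as a whole.

```sql
-- Hypothetical table; the column types and rows are assumptions for illustration.
CREATE TABLE a_table (a INT, b INT, c INT);

INSERT INTO a_table (a, b, c)
VALUES
    (1, 10, 100),
    (1, 20, 200),
    (2, 30, 300);

-- Valid: the DISTINCT ON expression (a) is the leftmost ORDER BY expression.
-- For each value of a, the row with the smallest b is kept.
SELECT DISTINCT ON (a) a, b, c
FROM a_table
ORDER BY a, b;

-- Invalid: ORDER BY starts with b rather than a. PostgreSQL rejects this with
-- "SELECT DISTINCT ON expressions must match initial ORDER BY expressions".
-- SELECT DISTINCT ON (a) a, b, c
-- FROM a_table
-- ORDER BY b, a;
```

With these rows, the valid query should return (1, 10, 100) and (2, 30, 300).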
For example...
CREATE TABLE example (
    id             INT,
    person_id      INT,
    address_id     INT,
    effective_date DATE
);

INSERT INTO
    example (id, person_id, address_id, effective_date)
VALUES
    (1, 2, 1, '2000-01-01'),  -- Moved to first house
    (5, 2, 2, '2004-08-19'),  -- Went to uni
    (9, 2, 1, '2007-06-12'),  -- Moved back home
    (2, 4, 3, '2007-05-18'),  -- Moved to first house
    (3, 4, 4, '2016-02-09')   -- Moved to new house
;

SELECT DISTINCT ON (person_id)
    *
FROM
    example
ORDER BY
    person_id,
    effective_date DESC
;
This orders the results so that all the records for each person are contiguous, ordered from the most recent record to the oldest. Then, for each person, only the first record is returned, which gives the most recent address for each person.
Step 1 : Apply the ORDER BY...
 id | person_id | address_id | effective_date
----+-----------+------------+----------------
  9 |         2 |          1 | '2007-06-12'
  5 |         2 |          2 | '2004-08-19'
  1 |         2 |          1 | '2000-01-01'
  3 |         4 |          4 | '2016-02-09'
  2 |         4 |          3 | '2007-05-18'
Step 2 : Filter to just the first row per person_id
 id | person_id | address_id | effective_date
----+-----------+------------+----------------
  9 |         2 |          1 | '2007-06-12'
  3 |         4 |          4 | '2016-02-09'
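The documentation's warning about an unpredictable "first row" also applies when the ORDER BY leaves ties within a group. A possible way to handle that is to add a tie-breaking column, sketched here with a hypothetical table (example_ties) and a made-up extra row that shares a date with id 9.

```sql
-- Hypothetical copy of the example table with a deliberate tie:
-- person 2 has two rows dated 2007-06-12.
CREATE TABLE example_ties (
    id             INT,
    person_id      INT,
    address_id     INT,
    effective_date DATE
);

INSERT INTO example_ties (id, person_id, address_id, effective_date)
VALUES
    (9,  2, 1, '2007-06-12'),
    (10, 2, 5, '2007-06-12'),  -- made-up row, same date as id 9
    (3,  4, 4, '2016-02-09');

-- id DESC breaks the tie, so the row with id 10 is kept for person 2.
SELECT DISTINCT ON (person_id)
    *
FROM
    example_ties
ORDER BY
    person_id,
    effective_date DESC,
    id DESC
;
```

Without the final id DESC, either of the two tied rows could be returned for person 2.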
It is broadly equivalent to the following...
SELECT
    *
FROM
(
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY person_id
                               ORDER BY effective_date DESC) AS person_address_ordinal
    FROM
        example
)
    AS sorted_example
WHERE
    person_address_ordinal = 1
;
As for the question about what (a % 2) does, it's just the mathematical calculation MOD(a, 2), so you could do the following...
DROP TABLE IF EXISTS example;  -- the name was already used by the previous example

CREATE TABLE example (
    id    INT,
    score INT
);

INSERT INTO
    example (id, score)
VALUES
    (1, 2),
    (2, 6),
    (3, 5),
    (4, 3),
    (5, 4)
;

SELECT DISTINCT ON (id % 2)
    *
FROM
    example
ORDER BY
    id % 2,
    score DESC
;
That would give the highest score for the even ids (where id % 2 equals 0), then the highest score for the odd ids (where id % 2 equals 1).
Step 1 : Apply the ORDER BY...
 id | score
----+-------
  2 |     6   -- id % 2 = 0
  4 |     3   -- id % 2 = 0
  3 |     5   -- id % 2 = 1
  5 |     4   -- id % 2 = 1
  1 |     2   -- id % 2 = 1
Step 2 : Filter to just the first row per `id % 2`
 id | score
----+-------
  2 |     6   -- id % 2 = 0
  3 |     5   -- id % 2 = 1
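To see the grouping value directly, one option is to select the ON () expression as a column of its own. This is only a sketch: the parity alias is an illustrative name, and the rows are supplied inline with VALUES so that the query does not depend on any existing table.

```sql
-- Same (id, score) rows as the example above, supplied inline.
SELECT DISTINCT ON (id % 2)
    id % 2 AS parity,  -- illustrative alias: 0 for even ids, 1 for odd ids
    id,
    score
FROM
    (VALUES (1, 2), (2, 6), (3, 5), (4, 3), (5, 4)) AS example (id, score)
ORDER BY
    id % 2,
    score DESC
;
--  parity | id | score
-- --------+----+-------
--       0 |  2 |     6
--       1 |  3 |     5
```

One row comes back per distinct value of id % 2, which is the whole effect of the expression inside ON ().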
Answered by Lukasz Szozda
a % 2 uses the modulo operator. For non-negative integers you can get only 0 or 1 (or NULL if the column is nullable and a is NULL); for negative odd values PostgreSQL returns -1.
For example:
 i | a  | a%2
---+----+-----
 1 | 10 |   0
 2 | 11 |   1
 3 | 12 |   0
 4 | 13 |   1
Code:
CREATE TABLE r(i INT, a INT);
INSERT INTO r(i, a) VALUES (1,10), (2,11),(3,12),(4,13);
SELECT DISTINCT ON (a%2) a
FROM r;
Output (without an ORDER BY, which row is kept for each group is not guaranteed):
10
11
SELECT DISTINCT ON (a%2) a
FROM r
ORDER BY a%2,i DESC;
Output:
12
13
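For comparison, ordering by i ascending instead should pin down the earliest row of each group. The sketch below supplies the same four rows inline via VALUES so it runs on its own; the remainder alias is an illustrative name that is not in the original answer.

```sql
-- Same rows as table r, supplied inline so the query is self-contained.
SELECT DISTINCT ON (a % 2)
    a % 2 AS remainder,  -- illustrative alias for the grouping value
    i,
    a
FROM
    (VALUES (1, 10), (2, 11), (3, 12), (4, 13)) AS r (i, a)
ORDER BY
    a % 2,
    i
;
--  remainder | i | a
-- -----------+---+----
--          0 | 1 | 10
--          1 | 2 | 11
```

The only difference from the ORDER BY a%2, i DESC query is the sort direction within each group, which is what decides whether 10 and 11 or 12 and 13 are returned.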