PostgreSQL: What does "DISTINCT ON (expression)" do?

Question

Asked by JohnSmithy1266

I understand how DISTINCT works, but I don't understand DISTINCT ON (expression).

Take the first example from this screenshot:

How does the (a % 2) part affect everything? Is it saying that if a % 2 evaluates to true, then return it, and continue doing so for all other tuples, but only return a row if the returned value is distinct?

Answer 1

Answered by MatBailie

While the previous answer appears correct, I don't feel that it is particularly clear.

The snippet from the official PostgreSQL documentation is as follows...

DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. [...] Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. [...] The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s).

The first point is that whatever you put in the ON () must come first in the ORDER BY, for reasons that will hopefully become clear shortly...

SELECT DISTINCT ON (a) a, b, c FROM a_table ORDER BY a, b

The results are then filtered, so that for each of the distinct entities, only the first row is actually returned.
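
To make the ORDER BY requirement concrete, here is a small sketch. The definition of a_table is hypothetical (the answer only names its columns); the second query shows what PostgreSQL reports when the DISTINCT ON expression is not the leftmost ORDER BY expression.

```sql
-- Hypothetical table matching the column names used in the query above.
CREATE TABLE a_table (a INT, b INT, c INT);

-- Accepted: the DISTINCT ON expression (a) is the leftmost ORDER BY expression.
SELECT DISTINCT ON (a) a, b, c FROM a_table ORDER BY a, b;

-- Rejected: ORDER BY starts with b instead of a.
SELECT DISTINCT ON (a) a, b, c FROM a_table ORDER BY b, a;
-- ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY expressions
```

Any ORDER BY expressions after the DISTINCT ON ones (here, b) only decide which row counts as "first" within each group.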

For example...

CREATE TABLE example (
    id               INT,
    person_id        INT,
    address_id       INT,
    effective_date   DATE
);

INSERT INTO
    example (id, person_id, address_id, effective_date)
VALUES
    (1, 2, 1, '2000-01-01'),  -- Moved to first house
    (5, 2, 2, '2004-08-19'),  -- Went to uni
    (9, 2, 1, '2007-06-12'),  -- Moved back home

    (2, 4, 3, '2007-05-18'),  -- Moved to first house
    (3, 4, 4, '2016-02-09')   -- Moved to new house
;

SELECT DISTINCT ON (person_id)
    *
FROM
    example
ORDER BY
    person_id,
    effective_date DESC
;

This orders the results so that all the records for each person are contiguous, ordered from the most recent record to the oldest. Then, for each person, only the first record is returned, which gives the most recent address for each person.

Step 1 : Apply the ORDER BY...

 id | person_id | address_id | effective_date
----+-----------+------------+----------------
  9 |      2    |      1     |  '2007-06-12'
  5 |      2    |      2     |  '2004-08-19'
  1 |      2    |      1     |  '2000-01-01'
  3 |      4    |      4     |  '2016-02-09'
  2 |      4    |      3     |  '2007-05-18'

Step 2 : filter to just the first row per person_id

 id | person_id | address_id | effective_date
----+-----------+------------+----------------
  9 |      2    |      1     |  '2007-06-12'
  3 |      4    |      4     |  '2016-02-09'

It is broadly equivalent to the following...

SELECT
    *
FROM
(
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY person_id
                               ORDER BY effective_date DESC)  AS person_address_ordinal
    FROM
        example
)
    AS sorted_example
WHERE
    person_address_ordinal = 1
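
One caveat applies to both forms: if a person has two rows with the same effective_date, the ordering does not say which of them comes first, so the row returned is not guaranteed. A minimal sketch of one way to pin it down, assuming id is unique and is an acceptable tie-breaker (that choice is an assumption, not part of the original answer):

```sql
SELECT DISTINCT ON (person_id)
    *
FROM
    example
ORDER BY
    person_id,
    effective_date DESC,
    id DESC   -- assumed tie-breaker: prefer the higher id when dates are equal
;
```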

As for the question about what (a % 2) does, it is just the arithmetic expression MOD(a, 2), so you could do the following...

CREATE TABLE example (
    id               INT,
    score            INT
);

INSERT INTO
    example (id, score)
VALUES
    (1, 2),
    (2, 6),
    (3, 5),
    (4, 3),
    (5, 4)
;

SELECT DISTINCT ON (id % 2)
    *
FROM
    example
ORDER BY
    id % 2,
    score DESC
;

That would give the highest score for the even ids (where id % 2 equals 0), then the highest score for the odd ids (where id % 2 equals 1).

Step 1 : Apply the ORDER BY...

 id | score
----+-------

  2 |   6     -- id % 2 = 0
  4 |   3     -- id % 2 = 0

  3 |   5     -- id % 2 = 1
  5 |   4     -- id % 2 = 1
  1 |   2     -- id % 2 = 1

Step 2 : filter to just the first row per `id % 2`

 id | score
----+-------
  2 |   6     -- id % 2 = 0
  3 |   5     -- id % 2 = 1

Answer 2

Answered by Lukasz Szozda

% is the modulo operator. For non-negative a, a % 2 can only be 0 or 1 (or NULL if the column is nullable and a is NULL); for a negative odd a, PostgreSQL returns -1.
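
A quick way to check which values a % 2 can take is to run it over a few inline values. The VALUES list and the alias t below are made up for illustration.

```sql
SELECT a, a % 2 AS remainder
FROM (VALUES (3), (-3), (4), (NULL)) AS t(a);
-- In PostgreSQL the remainder takes the sign of the dividend,
-- so -3 % 2 is -1, and NULL % 2 is NULL.
```

For DISTINCT ON (a % 2), each distinct remainder (including NULL) forms its own group.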

For example:

 i | a  | a%2
---+----+-----
 1 | 10 |  0
 2 | 11 |  1
 3 | 12 |  0
 4 | 13 |  1

Code:

CREATE TABLE r(i INT, a INT);
INSERT INTO r(i, a)  VALUES (1,10), (2,11),(3,12),(4,13);

SELECT DISTINCT ON (a%2) a
FROM r;

Output (one possible result; without ORDER BY, which row is kept from each group is not guaranteed):

10
11

SELECT DISTINCT ON (a%2) a
FROM r
ORDER BY a%2,i DESC;

Output:

12
13
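
To see why those two rows are the ones kept, it can help to select the grouping expression and i alongside a. This is just the query above with two extra output columns; the alias parity is introduced here for readability.

```sql
SELECT DISTINCT ON (a % 2)
    a % 2 AS parity, i, a
FROM r
ORDER BY a % 2, i DESC;
--  parity | i | a
-- --------+---+----
--       0 | 3 | 12
--       1 | 4 | 13
```

Within each parity group, the row with the highest i sorts first, so it is the one DISTINCT ON keeps.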

Rextester Demo
