SQL 选择每个 GROUP BY 组中的第一行？

Question

提问by David Wolever

As the title suggests, I'd like to select the first row of each set of rows grouped with a GROUP BY.

正如标题所暗示的那样，我想选择每组以GROUP BY.

Specifically, if I've got a purchasestable that looks like this:

具体来说，如果我有一个purchases看起来像这样的表：

SELECT * FROM purchases;

My Output:

我的输出：

id | customer | total
---+----------+------
 1 | Joe      | 5
 2 | Sally    | 3
 3 | Joe      | 2
 4 | Sally    | 1

I'd like to query for the idof the largest purchase (total) made by each customer. Something like this:

我想查询的id最大购买（的total）每个制造customer。像这样的东西：

SELECT FIRST(id), customer, FIRST(total)
FROM  purchases
GROUP BY customer
ORDER BY total DESC;

Expected Output:

预期输出：

FIRST(id) | customer | FIRST(total)
----------+----------+-------------
        1 | Joe      | 5
        2 | Sally    | 3

Answer 1

采纳答案by OMG Ponies

On Oracle 9.2+ (not 8i+ as originally stated), SQL Server 2005+, PostgreSQL 8.4+, DB2, Firebird 3.0+, Teradata, Sybase, Vertica:

在 Oracle 9.2+（不是最初声明的 8i+）、SQL Server 2005+、PostgreSQL 8.4+、DB2、Firebird 3.0+、Teradata、Sybase、Vertica 上：

WITH summary AS (
    SELECT p.id, 
           p.customer, 
           p.total, 
           ROW_NUMBER() OVER(PARTITION BY p.customer 
                                 ORDER BY p.total DESC) AS rk
      FROM PURCHASES p)
SELECT s.*
  FROM summary s
 WHERE s.rk = 1

Supported by any database:

任何数据库都支持：

But you need to add logic to break ties:

但是你需要添加逻辑来打破关系：

  SELECT MIN(x.id),  -- change to MAX if you want the highest
         x.customer, 
         x.total
    FROM PURCHASES x
    JOIN (SELECT p.customer,
                 MAX(total) AS max_total
            FROM PURCHASES p
        GROUP BY p.customer) y ON y.customer = x.customer
                              AND y.max_total = x.total
GROUP BY x.customer, x.total

Answer 2

回答by Erwin Brandstetter

In PostgreSQLthis is typically simpler and faster(more performance optimization below):

在PostgreSQL 中，这通常更简单、更快（下面有更多的性能优化）：

SELECT DISTINCT ON (customer)
       id, customer, total
FROM   purchases
ORDER  BY customer, total DESC, id;

Or shorter (if not as clear) with ordinal numbers of output columns:

或者更短（如果不是很清楚）输出列的序号：

SELECT DISTINCT ON (2)
       id, customer, total
FROM   purchases
ORDER  BY 2, 3 DESC, 1;

If totalcan be NULL (won't hurt either way, but you'll want to match existing indexes):

Iftotal可以是 NULL （不会有任何伤害，但你会想要匹配现有的索引）：

...
ORDER  BY customer, total DESC NULLS LAST, id;

Major points

要点

DISTINCT ONis a PostgreSQL extension of the standard (where only DISTINCTon the whole SELECTlist is defined).
List any number of expressions in the DISTINCT ONclause, the combined row value defines duplicates. The manual:
Obviously, two rows are considered distinct if they differ in at least one column value. Null values are considered equal in this comparison.
Bold emphasis mine.
DISTINCT ONcan be combined with ORDER BY. Leading expressions in ORDER BYmust be in the set of expressions in DISTINCT ON, but you can rearrange order among those freely. Example.You can add additionalexpressions to ORDER BYto pick a particular row from each group of peers. Or, as the manual puts it:
The DISTINCT ONexpression(s) must match the leftmost ORDER BYexpression(s). The ORDER BYclause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ONgroup.
I added idas last item to break ties:
"Pick the row with the smallest idfrom each group sharing the highest total."
To order results in a way that disagrees with the sort order determining the first per group, you can nest above query in an outer query with another ORDER BY. Example.
If totalcan be NULL, you most probablywant the row with the greatest non-null value. Add NULLS LASTlike demonstrated. See:
- PostgreSQL sort by datetime asc, null first?
The SELECTlistis not constrained by expressions in DISTINCT ONor ORDER BYin any way. (Not needed in the simple case above):
- You don't have toinclude any of the expressions in DISTINCT ONor ORDER BY.
- You caninclude any other expression in the SELECTlist. This is instrumental for replacing much more complex queries with subqueries and aggregate / window functions.
I tested with Postgres versions 8.3 – 12. But the feature has been there at least since version 7.1, so basically always.

DISTINCT ON是标准的 PostgreSQL 扩展（仅定义DISTINCT了整个SELECT列表）。
列出DISTINCT ON子句中任意数量的表达式，组合的行值定义重复项。手册：
显然，如果两行至少有一个列值不同，则它们被认为是不同的。在此比较中，空值被视为相等。
大胆强调我的。
DISTINCT ON可以与ORDER BY. in 中的前导表达式ORDER BY必须在中的表达式集中DISTINCT ON，但您可以自由地重新排列它们之间的顺序。例子。您可以添加其他表达式以ORDER BY从每组对等项中选择特定行。或者，正如手册所说：
的DISTINCT ON表达式（一个或多个）必须最左边的匹配ORDER BY表达式（一个或多个）。该ORDER BY子句通常包含附加表达式，用于确定每个DISTINCT ON组中行的所需优先级。
我添加了id作为打破平局的最后一项：
“id从每个组中选择最小的行共享最高的total。”
要以与确定每组第一个的排序顺序不一致的方式对结果进行排序，您可以将上面的查询嵌套在外部查询中与另一个ORDER BY. 例子。
如果total可以为 NULL，则您很可能想要具有最大非空值的行。添加NULLS LAST如演示。看：
- PostgreSQL 按日期时间 asc 排序，先为空？
该SELECT名单不是由表达式约束DISTINCT ON或ORDER BY以任何方式。（在上面的简单情况下不需要）：
- 您不必在DISTINCT ON或中包含任何表达式ORDER BY。
- 您可以在SELECT列表中包含任何其他表达式。这有助于用子查询和聚合/窗口函数替换更复杂的查询。
我使用 Postgres 8.3 – 12 版进行了测试。但该功能至少从 7.1 版开始就已经存在，所以基本上总是如此。

Index

指数

The perfectindex for the above query would be a multi-column indexspanning all three columns in matching sequence and with matching sort order:

上述查询的完美索引应该是一个多列索引，它以匹配的顺序和匹配的排序顺序跨越所有三列：

CREATE INDEX purchases_3c_idx ON purchases (customer, total DESC, id);

May be too specialized. But use it if read performance for the particular query is crucial. If you have DESC NULLS LASTin the query, use the same in the index so that sort order matches and the index is applicable.

可能太专业了。但是如果特定查询的读取性能至关重要，请使用它。如果DESC NULLS LAST在查询中有，请在索引中使用相同的内容，以便排序顺序匹配并且索引适用。

Effectiveness / Performance optimization

有效性/性能优化

Weigh cost and benefit before creating tailored indexes for each query. The potential of above index largely depends on data distribution.

在为每个查询创建定制索引之前权衡成本和收益。上述指标的潜力很大程度上取决于数据分布。

The index is used because it delivers pre-sorted data. In Postgres 9.2 or later the query can also benefit from an index only scanif the index is smaller than the underlying table. The index has to be scanned in its entirety, though.

使用索引是因为它提供预先排序的数据。在 Postgres 9.2 或更高版本中，如果索引小于基础表，查询也可以从仅索引扫描中受益。但是，必须完整扫描索引。

For fewrows per customer(high cardinality in column customer), this is very efficient. Even more so if you need sorted output anyway. The benefit shrinks with a growing number of rows per customer.
Ideally, you have enough work_memto process the involved sort step in RAM and not spill to disk. But generally setting work_memtoohigh can have adverse effects. Consider SET LOCALfor exceptionally big queries. Find how much you need with EXPLAIN ANALYZE. Mention of "Disk:" in the sort step indicates the need for more:
- Configuration parameter work_mem in PostgreSQL on Linux
- Optimize simple query using ORDER BY date and text
For manyrows per customer(low cardinality in column customer), a loose index scan(a.k.a. "skip scan") would be (much) more efficient, but that's not implemented up to Postgres 12. (An implementation for index-only scans is in development for Postgres 13. See hereand here.)
For now, there are faster query techniquesto substitute for this. In particular if you have a separate table holding unique customers, which is the typical use case. But also if you don't:

对于每个客户的几行（列中的高基数customer），这是非常有效的。如果您无论如何都需要排序输出，则更是如此。随着每位客户的行数越来越多，收益会减少。
理想情况下，您有足够的work_mem时间在 RAM 中处理所涉及的排序步骤，而不会溢出到磁盘。但通常设置work_mem太高会产生不利影响。考虑SET LOCAL特别大的查询。用找到你需要多少EXPLAIN ANALYZE。在排序步骤中提到“磁盘：”表示需要更多：
- Linux 上 PostgreSQL 中的配置参数 work_mem
- 使用 ORDER BY 日期和文本优化简单查询
对于每个客户的多行（列中的基数低customer），松散索引扫描（又名“跳过扫描”）会（更）高效，但直到 Postgres 12 为止还没有实现。（仅索引扫描的实现在Postgres 13 的开发。请参阅此处和此处。）
目前，有更快的查询技术来替代它。特别是如果您有一个单独的表保存唯一客户，这是典型的用例。但如果你不这样做：

Benchmark

基准

I had a simple benchmark here which is outdated by now. I replaced it with a detailed benchmark in this separate answer.

我在这里有一个简单的基准测试，但现在已经过时了。我在这个单独的答案中用详细的基准替换了它。

Answer 3

回答by Erwin Brandstetter

Benchmark

基准

Testing the most interesting candidates with Postgres 9.4and 9.5with a halfway realistic table of 200k rowsin purchasesand 10k distinct customer_id(avg. 20 rows per customer).

测试中最有趣的候选人的Postgres 9.4和9.5用的中途现实表20万行中purchases和10K不同customer_id（平均每用户20行）。

For Postgres 9.5 I ran a 2nd test with effectively 86446 distinct customers. See below (avg. 2.3 rows per customer).

对于 Postgres 9.5，我对 86446 个不同的客户进行了第二次测试。见下文（每位客户平均 2.3 行）。

Setup

设置

Main table

主桌

CREATE TABLE purchases (
  id          serial
, customer_id int  -- REFERENCES customer
, total       int  -- could be amount of money in Cent
, some_column text -- to make the row bigger, more realistic
);

I use a serial(PK constraint added below) and an integer customer_idsince that's a more typical setup. Also added some_columnto make up for typically more columns.

我使用一个serial（下面添加了PK 约束）和一个整数，customer_id因为这是一个更典型的设置。还添加some_column以弥补通常更多的列。

Dummy data, PK, index - a typical table also has some dead tuples:

虚拟数据、PK、索引——一个典型的表也有一些死元组：

INSERT INTO purchases (customer_id, total, some_column)    -- insert 200k rows
SELECT (random() * 10000)::int             AS customer_id  -- 10k customers
     , (random() * random() * 100000)::int AS total     
     , 'note: ' || repeat('x', (random()^2 * random() * random() * 500)::int)
FROM   generate_series(1,200000) g;

ALTER TABLE purchases ADD CONSTRAINT purchases_id_pkey PRIMARY KEY (id);

DELETE FROM purchases WHERE random() > 0.9; -- some dead rows

INSERT INTO purchases (customer_id, total, some_column)
SELECT (random() * 10000)::int             AS customer_id  -- 10k customers
     , (random() * random() * 100000)::int AS total     
     , 'note: ' || repeat('x', (random()^2 * random() * random() * 500)::int)
FROM   generate_series(1,20000) g;  -- add 20k to make it ~ 200k

CREATE INDEX purchases_3c_idx ON purchases (customer_id, total DESC, id);

VACUUM ANALYZE purchases;

customertable - for superior query

customer表 - 用于高级查询

CREATE TABLE customer AS
SELECT customer_id, 'customer_' || customer_id AS customer
FROM   purchases
GROUP  BY 1
ORDER  BY 1;

ALTER TABLE customer ADD CONSTRAINT customer_customer_id_pkey PRIMARY KEY (customer_id);

VACUUM ANALYZE customer;

In my second testfor 9.5 I used the same setup, but with random() * 100000to generate customer_idto get only few rows per customer_id.

在我对 9.5 的第二次测试中，我使用了相同的设置，但使用random() * 100000生成customer_id每个customer_id.

Object sizes for table `purchases`

表的对象大小 `purchases`

Generated with this query.

使用此查询生成。

               what                | bytes/ct | bytes_pretty | bytes_per_row
-----------------------------------+----------+--------------+---------------
 core_relation_size                | 20496384 | 20 MB        |           102
 visibility_map                    |        0 | 0 bytes      |             0
 free_space_map                    |    24576 | 24 kB        |             0
 table_size_incl_toast             | 20529152 | 20 MB        |           102
 indexes_size                      | 10977280 | 10 MB        |            54
 total_size_incl_toast_and_indexes | 31506432 | 30 MB        |           157
 live_rows_in_text_representation  | 13729802 | 13 MB        |            68
 ------------------------------    |          |              |
 row_count                         |   200045 |              |
 live_tuples                       |   200045 |              |
 dead_tuples                       |    19955 |              |

Queries

查询

1. `row_number()`in CTE, (see other answer)

1.`row_number()`在CTE中，（见其他答案）

WITH cte AS (
   SELECT id, customer_id, total
        , row_number() OVER(PARTITION BY customer_id ORDER BY total DESC) AS rn
   FROM   purchases
   )
SELECT id, customer_id, total
FROM   cte
WHERE  rn = 1;

2. `row_number()`in subquery (my optimization)

2.`row_number()`在子查询中（我的优化）

SELECT id, customer_id, total
FROM   (
   SELECT id, customer_id, total
        , row_number() OVER(PARTITION BY customer_id ORDER BY total DESC) AS rn
   FROM   purchases
   ) sub
WHERE  rn = 1;

3. `DISTINCT ON`(see other answer)

3. `DISTINCT ON`（见其他答案）

SELECT DISTINCT ON (customer_id)
       id, customer_id, total
FROM   purchases
ORDER  BY customer_id, total DESC, id;

4. rCTE with `LATERAL`subquery (see here)

4. 带有`LATERAL`子查询的rCTE （见这里）

WITH RECURSIVE cte AS (
   (  -- parentheses required
   SELECT id, customer_id, total
   FROM   purchases
   ORDER  BY customer_id, total DESC
   LIMIT  1
   )
   UNION ALL
   SELECT u.*
   FROM   cte c
   ,      LATERAL (
      SELECT id, customer_id, total
      FROM   purchases
      WHERE  customer_id > c.customer_id  -- lateral reference
      ORDER  BY customer_id, total DESC
      LIMIT  1
      ) u
   )
SELECT id, customer_id, total
FROM   cte
ORDER  BY customer_id;

5. `customer`table with `LATERAL`(see here)

5.`customer`表带`LATERAL`（见这里）

SELECT l.*
FROM   customer c
,      LATERAL (
   SELECT id, customer_id, total
   FROM   purchases
   WHERE  customer_id = c.customer_id  -- lateral reference
   ORDER  BY total DESC
   LIMIT  1
   ) l;

6. `array_agg()`with `ORDER BY`(see other answer)

6.`array_agg()`与`ORDER BY`（见其他答案）

SELECT (array_agg(id ORDER BY total DESC))[1] AS id
     , customer_id
     , max(total) AS total
FROM   purchases
GROUP  BY customer_id;

Results

结果

Execution time for above queries with EXPLAIN ANALYZE(and all options off), best of 5 runs.

上述查询的执行时间EXPLAIN ANALYZE（和所有选项都关闭），最好的 5 次运行。

Allqueries used an Index Only Scanon purchases2_3c_idx(among other steps). Some of them just for the smaller size of the index, others more effectively.

所有查询使用的索引只扫描上purchases2_3c_idx（其他步骤中）。其中一些仅适用于较小的索引大小，而另一些则更有效。

A. Postgres 9.4 with 200k rows and ~ 20 per `customer_id`

A. Postgres 9.4 有 20 万行，每行 ~ 20 `customer_id`

1. 273.274 ms  
2. 194.572 ms  
3. 111.067 ms  
4.  92.922 ms  
5.  37.679 ms  -- winner
6. 189.495 ms

B. The same with Postgres 9.5

B. 与 Postgres 9.5 相同

1. 288.006 ms
2. 223.032 ms  
3. 107.074 ms  
4.  78.032 ms  
5.  33.944 ms  -- winner
6. 211.540 ms

C. Same as B., but with ~ 2.3 rows per `customer_id`

C. 与 B. 相同，但每个约 2.3 行 `customer_id`

1. 381.573 ms
2. 311.976 ms
3. 124.074 ms  -- winner
4. 710.631 ms
5. 311.976 ms
6. 421.679 ms

Related benchmarks

回答by TMS

This is common greatest-n-per-groupproblem, which already has well tested and highly optimized solutions. Personally I prefer the left join solution by Bill Karwin(the original post with lots of other solutions).

这是常见的每组最大 n 个问题，已经有经过充分测试和高度优化的解决方案。就我个人而言，我更喜欢Bill Karwin的left join 解决方案（包含许多其他解决方案的原始帖子）。

Note that bunch of solutions to this common problem can surprisingly be found in the one of most official sources, MySQL manual! See Examples of Common Queries :: The Rows Holding the Group-wise Maximum of a Certain Column.

请注意，在最官方的来源之一MySQL 手册中可以令人惊讶地找到针对此常见问题的大量解决方案！请参阅常见查询示例:: 保留特定列的分组最大值的行。

Answer 5

回答by Paul A Jungwirth

In Postgres you can use array_agglike this:

在 Postgres 中，您可以这样使用array_agg：

SELECT  customer,
        (array_agg(id ORDER BY total DESC))[1],
        max(total)
FROM purchases
GROUP BY customer

This will give you the idof each customer's largest purchase.

这将为您id提供每个客户的最大购买量。

Some things to note:

一些注意事项：

array_aggis an aggregate function, so it works with GROUP BY.
array_agglets you specify an ordering scoped to just itself, so it doesn't constrain the structure of the whole query. There is also syntax for how you sort NULLs, if you need to do something different from the default.
Once we build the array, we take the first element. (Postgres arrays are 1-indexed, not 0-indexed).
You could use array_aggin a similar way for your third output column, but max(total)is simpler.
Unlike DISTINCT ON, using array_agglets you keep your GROUP BY, in case you want that for other reasons.

array_agg是一个聚合函数，所以它适用于GROUP BY.
array_agg允许您指定范围仅限于自身的排序，因此它不会限制整个查询的结构。如果您需要做一些与默认值不同的事情，还有关于如何对 NULL 进行排序的语法。
一旦我们构建了数组，我们就取第一个元素。（Postgres 数组是 1 索引的，而不是 0 索引的）。
您可以array_agg以类似的方式使用第三个输出列，但max(total)更简单。
与不同DISTINCT ON， usingarray_agg可以让您保留您的GROUP BY，以防您出于其他原因需要。

Answer 6

回答by user2407394

The solution is not very efficient as pointed by Erwin, because of presence of SubQs

正如 Erwin 所指出的，该解决方案不是很有效，因为存在 SubQ

select * from purchases p1 where total in
(select max(total) from purchases where p1.customer=customer) order by total desc;

Answer 7

回答by matiu

I use this way (postgresql only): https://wiki.postgresql.org/wiki/First/last_%28aggregate%29

我使用这种方式（仅 postgresql）：https: //wiki.postgresql.org/wiki/First/last_%28aggregate%29

-- Create a function that always returns the first non-NULL item
CREATE OR REPLACE FUNCTION public.first_agg ( anyelement, anyelement )
RETURNS anyelement LANGUAGE sql IMMUTABLE STRICT AS $$
        SELECT ;
$$;

-- And then wrap an aggregate around it
CREATE AGGREGATE public.first (
        sfunc    = public.first_agg,
        basetype = anyelement,
        stype    = anyelement
);

-- Create a function that always returns the last non-NULL item
CREATE OR REPLACE FUNCTION public.last_agg ( anyelement, anyelement )
RETURNS anyelement LANGUAGE sql IMMUTABLE STRICT AS $$
        SELECT ;
$$;

-- And then wrap an aggregate around it
CREATE AGGREGATE public.last (
        sfunc    = public.last_agg,
        basetype = anyelement,
        stype    = anyelement
);

Then your example should work almostas is:

那么您的示例应该几乎按原样工作：

SELECT FIRST(id), customer, FIRST(total)
FROM  purchases
GROUP BY customer
ORDER BY FIRST(total) DESC;

CAVEAT: It ignore's NULL rows

CAVEAT：它忽略 NULL 行

Edit 1 - Use the postgres extension instead

编辑 1 - 改用 postgres 扩展

Now I use this way: http://pgxn.org/dist/first_last_agg/

现在我用这种方式：http: //pgxn.org/dist/first_last_agg/

To install on ubuntu 14.04:

在 ubuntu 14.04 上安装：

apt-get install postgresql-server-dev-9.3 git build-essential -y
git clone git://github.com/wulczer/first_last_agg.git
cd first_last_app
make && sudo make install
psql -c 'create extension first_last_agg'

It's a postgres extension that gives you first and last functions; apparently faster than the above way.

它是一个 postgres 扩展，为您提供 first 和 last 功能；显然比上述方式更快。

Edit 2 - Ordering and filtering

编辑 2 - 排序和过滤

If you use aggregate functions (like these), you can order the results, without the need to have the data already ordered:

如果您使用聚合函数（像这些），您可以对结果进行排序，而无需对数据进行排序：

http://www.postgresql.org/docs/current/static/sql-expressions.html#SYNTAX-AGGREGATES

So the equivalent example, with ordering would be something like:

所以等效的例子，排序将是这样的：

SELECT first(id order by id), customer, first(total order by id)
  FROM purchases
 GROUP BY customer
 ORDER BY first(total);

Of course you can order and filter as you deem fit within the aggregate; it's very powerful syntax.

当然，您可以按照您认为适合的方式进行排序和过滤；这是非常强大的语法。

Answer 8

回答by khaled_gomaa

The Query:

查询：

SELECT purchases.*
FROM purchases
LEFT JOIN purchases as p 
ON 
  p.customer = purchases.customer 
  AND 
  purchases.total < p.total
WHERE p.total IS NULL

HOW DOES THAT WORK!(I've been there)

这是如何运作的！（我去过那儿）

We want to make sure that we only have the highest total for each purchase.

我们希望确保每次购买的总金额最高。

Some Theoretical Stuff(skip this part if you only want to understand the query)

一些理论知识（如果您只想了解查询，请跳过此部分）

Let Total be a function T(customer,id) where it returns a value given the name and id To prove that the given total (T(customer,id)) is the highest we have to prove that We want to prove either

设 Total 是一个函数 T(customer,id)，它返回一个给定名称和 id 的值为了证明给定的总数 (T(customer,id)) 是我们必须证明的最高值我们想要证明

?x T(customer,id) > T(customer,x) (this total is higher than all other total for that customer)

?x T(customer,id) > T(customer,x)（此总数高于该客户的所有其他总数）

OR

或者

??x T(customer, id) < T(customer, x) (there exists no higher total for that customer)

??x T(customer, id) < T(customer, x)（该客户不存在更高的总数）

The first approach will need us to get all the records for that name which I do not really like.

第一种方法需要我们获取我不喜欢的那个名字的所有记录。

The second one will need a smart way to say there can be no record higher than this one.

第二个需要一种聪明的方式来说明没有比这个更高的记录。

Back to SQL

回到 SQL

If we left joins the table on the name and total being less than the joined table:

如果我们在名称上留下连接表，并且总数小于连接表：

      LEFT JOIN purchases as p 
      ON 
      p.customer = purchases.customer 
      AND 
      purchases.total < p.total

we make sure that all records that have another record with the higher total for the same user to be joined:

我们确保所有具有另一条记录且同一用户的总数较高的记录都被加入：

purchases.id, purchases.customer, purchases.total, p.id, p.customer, p.total
1           , Tom           , 200             , 2   , Tom   , 300
2           , Tom           , 300
3           , Bob           , 400             , 4   , Bob   , 500
4           , Bob           , 500
5           , Alice         , 600             , 6   , Alice   , 700
6           , Alice         , 700

That will help us filter for the highest total for each purchase with no grouping needed:

这将帮助我们过滤每次购买的最高总数，而无需分组：

WHERE p.total IS NULL

purchases.id, purchases.name, purchases.total, p.id, p.name, p.total
2           , Tom           , 300
4           , Bob           , 500
6           , Alice         , 700

And that's the answer we need.

这就是我们需要的答案。

Answer 9

回答by Alejandro Salamanca Mazuelo

Very fast solution

非常快速的解决方案

SELECT a.* 
FROM
    purchases a 
    JOIN ( 
        SELECT customer, min( id ) as id 
        FROM purchases 
        GROUP BY customer 
    ) b USING ( id );

and really very fast if table is indexed by id:

如果表按 id 索引，速度真的非常快：

create index purchases_id on purchases (id);

Answer 10

回答by Diwas Poudel

In SQL Server you can do this:

在 SQL Server 中，您可以这样做：

SELECT *
FROM (
SELECT ROW_NUMBER()
OVER(PARTITION BY customer
ORDER BY total DESC) AS StRank, *
FROM Purchases) n
WHERE StRank = 1

Explaination:Here Group byis done on the basis of customer and then order it by total then each such group is given serial number as StRank and we are taking out first 1 customer whose StRank is 1

说明：这里按客户分组，然后按总数排序，然后每个这样的组被赋予序列号作为StRank，我们取出第一个StRank为1的客户

SQL 选择每个 GROUP BY 组中的第一行？

提问by David Wolever

采纳答案by OMG Ponies

On Oracle 9.2+ (not 8i+ as originally stated), SQL Server 2005+, PostgreSQL 8.4+, DB2, Firebird 3.0+, Teradata, Sybase, Vertica:

在 Oracle 9.2+（不是最初声明的 8i+）、SQL Server 2005+、PostgreSQL 8.4+、DB2、Firebird 3.0+、Teradata、Sybase、Vertica 上：

Supported by any database:

任何数据库都支持：

回答by Erwin Brandstetter

Major points

要点

Index

指数

Effectiveness / Performance optimization

有效性/性能优化

Benchmark

基准

回答by Erwin Brandstetter

Benchmark

基准

Setup

设置

Object sizes for table purchases

表的对象大小 purchases

Queries

查询

1. row_number()in CTE, (see other answer)

1.row_number()在CTE中，（见其他答案）

2. row_number()in subquery (my optimization)

2.row_number()在子查询中（我的优化）

3. DISTINCT ON(see other answer)

3. DISTINCT ON（见其他答案）

4. rCTE with LATERALsubquery (see here)

4. 带有LATERAL子查询的rCTE （见这里）

5. customertable with LATERAL(see here)

5.customer表带LATERAL（见这里）

6. array_agg()with ORDER BY(see other answer)

6.array_agg()与ORDER BY（见其他答案）

Results

结果

A. Postgres 9.4 with 200k rows and ~ 20 per customer_id

A. Postgres 9.4 有 20 万行，每行 ~ 20 customer_id

B. The same with Postgres 9.5

B. 与 Postgres 9.5 相同

C. Same as B., but with ~ 2.3 rows per customer_id

C. 与 B. 相同，但每个约 2.3 行 customer_id

Related benchmarks

相关基准

Original (outdated) benchmark from 2011

2011 年的原始（过时）基准

回答by TMS

回答by Paul A Jungwirth

回答by user2407394

回答by matiu

Edit 1 - Use the postgres extension instead

编辑 1 - 改用 postgres 扩展

Edit 2 - Ordering and filtering

编辑 2 - 排序和过滤

回答by khaled_gomaa

回答by Alejandro Salamanca Mazuelo

回答by Diwas Poudel

相关推荐

如何在添加到项目中的 SQL Server Express 数据库中创建用户？

SQL 如何在sql中显示以a开头然后b开头的员工姓名

SQL 选择 BETWEEN 列值

SQL Server 2008 使用联接表中的联接和 Where 子句更新查询

相关推荐

最近更新

标签

Object sizes for table `purchases`

表的对象大小 `purchases`

1. `row_number()`in CTE, (see other answer)

1.`row_number()`在CTE中，（见其他答案）

2. `row_number()`in subquery (my optimization)

2.`row_number()`在子查询中（我的优化）

3. `DISTINCT ON`(see other answer)

3. `DISTINCT ON`（见其他答案）

4. rCTE with `LATERAL`subquery (see here)

4. 带有`LATERAL`子查询的rCTE （见这里）

5. `customer`table with `LATERAL`(see here)

5.`customer`表带`LATERAL`（见这里）

6. `array_agg()`with `ORDER BY`(see other answer)

6.`array_agg()`与`ORDER BY`（见其他答案）

A. Postgres 9.4 with 200k rows and ~ 20 per `customer_id`

A. Postgres 9.4 有 20 万行，每行 ~ 20 `customer_id`

C. Same as B., but with ~ 2.3 rows per `customer_id`

C. 与 B. 相同，但每个约 2.3 行 `customer_id`