SQL 选择每个 GROUP BY 组中的第一行?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/3800551/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 07:41:27  来源:igfitidea点击:

Select first row in each GROUP BY group?

sqlsqlitepostgresqlgroup-bygreatest-n-per-group

提问by David Wolever

As the title suggests, I'd like to select the first row of each set of rows grouped with a GROUP BY.

正如标题所暗示的那样,我想选择每组以GROUP BY.

Specifically, if I've got a purchasestable that looks like this:

具体来说,如果我有一个purchases看起来像这样的表:

SELECT * FROM purchases;

My Output:

我的输出:

id | customer | total
---+----------+------
 1 | Joe      | 5
 2 | Sally    | 3
 3 | Joe      | 2
 4 | Sally    | 1

I'd like to query for the idof the largest purchase (total) made by each customer. Something like this:

我想查询的id最大购买(的total)每个制造customer。像这样的东西:

SELECT FIRST(id), customer, FIRST(total)
FROM  purchases
GROUP BY customer
ORDER BY total DESC;

Expected Output:

预期输出:

FIRST(id) | customer | FIRST(total)
----------+----------+-------------
        1 | Joe      | 5
        2 | Sally    | 3

采纳答案by OMG Ponies

On Oracle 9.2+ (not 8i+ as originally stated), SQL Server 2005+, PostgreSQL 8.4+, DB2, Firebird 3.0+, Teradata, Sybase, Vertica:

在 Oracle 9.2+(不是最初声明的 8i+)、SQL Server 2005+、PostgreSQL 8.4+、DB2、Firebird 3.0+、Teradata、Sybase、Vertica 上:

WITH summary AS (
    SELECT p.id, 
           p.customer, 
           p.total, 
           ROW_NUMBER() OVER(PARTITION BY p.customer 
                                 ORDER BY p.total DESC) AS rk
      FROM PURCHASES p)
SELECT s.*
  FROM summary s
 WHERE s.rk = 1

Supported by any database:

任何数据库都支持:

But you need to add logic to break ties:

但是你需要添加逻辑来打破关系:

  SELECT MIN(x.id),  -- change to MAX if you want the highest
         x.customer, 
         x.total
    FROM PURCHASES x
    JOIN (SELECT p.customer,
                 MAX(total) AS max_total
            FROM PURCHASES p
        GROUP BY p.customer) y ON y.customer = x.customer
                              AND y.max_total = x.total
GROUP BY x.customer, x.total

回答by Erwin Brandstetter

In PostgreSQLthis is typically simpler and faster(more performance optimization below):

PostgreSQL 中,这通常更简单、更快(下面有更多的性能优化):

SELECT DISTINCT ON (customer)
       id, customer, total
FROM   purchases
ORDER  BY customer, total DESC, id;

Or shorter (if not as clear) with ordinal numbers of output columns:

或者更短(如果不是很清楚)输出列的序号:

SELECT DISTINCT ON (2)
       id, customer, total
FROM   purchases
ORDER  BY 2, 3 DESC, 1;

If totalcan be NULL (won't hurt either way, but you'll want to match existing indexes):

Iftotal可以是 NULL (不会有任何伤害,但你会想要匹配现有的索引):

...
ORDER  BY customer, total DESC NULLS LAST, id;

Major points

要点

  • DISTINCT ONis a PostgreSQL extension of the standard (where only DISTINCTon the whole SELECTlist is defined).

  • List any number of expressions in the DISTINCT ONclause, the combined row value defines duplicates. The manual:

    Obviously, two rows are considered distinct if they differ in at least one column value. Null values are considered equal in this comparison.

    Bold emphasis mine.

  • DISTINCT ONcan be combined with ORDER BY. Leading expressions in ORDER BYmust be in the set of expressions in DISTINCT ON, but you can rearrange order among those freely. Example.You can add additionalexpressions to ORDER BYto pick a particular row from each group of peers. Or, as the manual puts it:

    The DISTINCT ONexpression(s) must match the leftmost ORDER BYexpression(s). The ORDER BYclause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ONgroup.

    I added idas last item to break ties:
    "Pick the row with the smallest idfrom each group sharing the highest total."

    To order results in a way that disagrees with the sort order determining the first per group, you can nest above query in an outer query with another ORDER BY. Example.

  • If totalcan be NULL, you most probablywant the row with the greatest non-null value. Add NULLS LASTlike demonstrated. See:

  • The SELECTlistis not constrained by expressions in DISTINCT ONor ORDER BYin any way. (Not needed in the simple case above):

    • You don't have toinclude any of the expressions in DISTINCT ONor ORDER BY.

    • You caninclude any other expression in the SELECTlist. This is instrumental for replacing much more complex queries with subqueries and aggregate / window functions.

  • I tested with Postgres versions 8.3 – 12. But the feature has been there at least since version 7.1, so basically always.

  • DISTINCT ON是标准的 PostgreSQL 扩展(仅定义DISTINCT了整个SELECT列表)。

  • 列出DISTINCT ON子句中任意数量的表达式,组合的行值定义重复项。手册:

    显然,如果两行至少有一个列值不同,则它们被认为是不同的。在此比较中,空值被视为相等。

    大胆强调我的。

  • DISTINCT ON可以与ORDER BY. in 中的前导表达式ORDER BY必须在 中的表达式集中DISTINCT ON,但您可以自由地重新排列它们之间的顺序。例子。您可以添加其他表达式以ORDER BY从每组对等项中选择特定行。或者,正如手册所说

    DISTINCT ON表达式(一个或多个)必须最左边的匹配ORDER BY表达式(一个或多个)。该ORDER BY子句通常包含附加表达式,用于确定每个DISTINCT ON组中行的所需优先级。

    我添加了id作为打破平局的最后一项:
    id从每个组中选择最小的行共享最高的total。”

    要以与确定每组第一个的排序顺序不一致的方式对结果进行排序,您可以将上面的查询嵌套在外部查询中与另一个ORDER BY. 例子。

  • 如果total可以为 NULL,则您很可能想要具有最大非空值的行。添加NULLS LAST如演示。看:

  • SELECT名单不是由表达式约束DISTINCT ONORDER BY以任何方式。(在上面的简单情况下不需要):

    • 不必DISTINCT ON或 中包含任何表达式ORDER BY

    • 可以SELECT列表中包含任何其他表达式。这有助于用子查询和聚合/窗口函数替换更复杂的查询。

  • 我使用 Postgres 8.3 – 12 版进行了测试。但该功能至少从 7.1 版开始就已经存在,所以基本上总是如此。

Index

指数

The perfectindex for the above query would be a multi-column indexspanning all three columns in matching sequence and with matching sort order:

上述查询的完美索引应该是一个多列索引,它以匹配的顺序和匹配的排序顺序跨越所有三列:

CREATE INDEX purchases_3c_idx ON purchases (customer, total DESC, id);

May be too specialized. But use it if read performance for the particular query is crucial. If you have DESC NULLS LASTin the query, use the same in the index so that sort order matches and the index is applicable.

可能太专业了。但是如果特定查询的读取性能至关重要,请使用它。如果DESC NULLS LAST在查询中有,请在索引中使用相同的内容,以便排序顺序匹配并且索引适用。

Effectiveness / Performance optimization

有效性/性能优化

Weigh cost and benefit before creating tailored indexes for each query. The potential of above index largely depends on data distribution.

在为每个查询创建定制索引之前权衡成本和收益。上述指标的潜力很大程度上取决于数据分布

The index is used because it delivers pre-sorted data. In Postgres 9.2 or later the query can also benefit from an index only scanif the index is smaller than the underlying table. The index has to be scanned in its entirety, though.

使用索引是因为它提供预先排序的数据。在 Postgres 9.2 或更高版本中,如果索引小于基础表,查询也可以从仅索引扫描中受益。但是,必须完整扫描索引。

Benchmark

基准

I had a simple benchmark here which is outdated by now. I replaced it with a detailed benchmark in this separate answer.

我在这里有一个简单的基准测试,但现在已经过时了。我在这个单独的答案中用详细的基准替换了它。

回答by Erwin Brandstetter

Benchmark

基准

Testing the most interesting candidates with Postgres 9.4and 9.5with a halfway realistic table of 200k rowsin purchasesand 10k distinct customer_id(avg. 20 rows per customer).

测试中最有趣的候选人的Postgres 9.49.5用的中途现实表20万行purchases10K不同customer_id平均每用户20行)。

For Postgres 9.5 I ran a 2nd test with effectively 86446 distinct customers. See below (avg. 2.3 rows per customer).

对于 Postgres 9.5,我对 86446 个不同的客户进行了第二次测试。见下文(每位客户平均 2.3 行)。

Setup

设置

Main table

主桌

CREATE TABLE purchases (
  id          serial
, customer_id int  -- REFERENCES customer
, total       int  -- could be amount of money in Cent
, some_column text -- to make the row bigger, more realistic
);

I use a serial(PK constraint added below) and an integer customer_idsince that's a more typical setup. Also added some_columnto make up for typically more columns.

我使用一个serial(下面添加了PK 约束)和一个整数,customer_id因为这是一个更典型的设置。还添加some_column以弥补通常更多的列。

Dummy data, PK, index - a typical table also has some dead tuples:

虚拟数据、PK、索引——一个典型的表也有一些死元组:

INSERT INTO purchases (customer_id, total, some_column)    -- insert 200k rows
SELECT (random() * 10000)::int             AS customer_id  -- 10k customers
     , (random() * random() * 100000)::int AS total     
     , 'note: ' || repeat('x', (random()^2 * random() * random() * 500)::int)
FROM   generate_series(1,200000) g;

ALTER TABLE purchases ADD CONSTRAINT purchases_id_pkey PRIMARY KEY (id);

DELETE FROM purchases WHERE random() > 0.9; -- some dead rows

INSERT INTO purchases (customer_id, total, some_column)
SELECT (random() * 10000)::int             AS customer_id  -- 10k customers
     , (random() * random() * 100000)::int AS total     
     , 'note: ' || repeat('x', (random()^2 * random() * random() * 500)::int)
FROM   generate_series(1,20000) g;  -- add 20k to make it ~ 200k

CREATE INDEX purchases_3c_idx ON purchases (customer_id, total DESC, id);

VACUUM ANALYZE purchases;

customertable - for superior query

customer表 - 用于高级查询

CREATE TABLE customer AS
SELECT customer_id, 'customer_' || customer_id AS customer
FROM   purchases
GROUP  BY 1
ORDER  BY 1;

ALTER TABLE customer ADD CONSTRAINT customer_customer_id_pkey PRIMARY KEY (customer_id);

VACUUM ANALYZE customer;

In my second testfor 9.5 I used the same setup, but with random() * 100000to generate customer_idto get only few rows per customer_id.

在我对 9.5 的第二次测试中,我使用了相同的设置,但使用random() * 100000生成customer_id每个customer_id.

Object sizes for table purchases

表的对象大小 purchases

Generated with this query.

使用此查询生成。

               what                | bytes/ct | bytes_pretty | bytes_per_row
-----------------------------------+----------+--------------+---------------
 core_relation_size                | 20496384 | 20 MB        |           102
 visibility_map                    |        0 | 0 bytes      |             0
 free_space_map                    |    24576 | 24 kB        |             0
 table_size_incl_toast             | 20529152 | 20 MB        |           102
 indexes_size                      | 10977280 | 10 MB        |            54
 total_size_incl_toast_and_indexes | 31506432 | 30 MB        |           157
 live_rows_in_text_representation  | 13729802 | 13 MB        |            68
 ------------------------------    |          |              |
 row_count                         |   200045 |              |
 live_tuples                       |   200045 |              |
 dead_tuples                       |    19955 |              |

Queries

查询

1. row_number()in CTE, (see other answer)

1.row_number()在CTE中,(见其他答案

WITH cte AS (
   SELECT id, customer_id, total
        , row_number() OVER(PARTITION BY customer_id ORDER BY total DESC) AS rn
   FROM   purchases
   )
SELECT id, customer_id, total
FROM   cte
WHERE  rn = 1;

2. row_number()in subquery (my optimization)

2.row_number()在子查询中(我的优化)

SELECT id, customer_id, total
FROM   (
   SELECT id, customer_id, total
        , row_number() OVER(PARTITION BY customer_id ORDER BY total DESC) AS rn
   FROM   purchases
   ) sub
WHERE  rn = 1;

3. DISTINCT ON(see other answer)

3. DISTINCT ON见其他答案

SELECT DISTINCT ON (customer_id)
       id, customer_id, total
FROM   purchases
ORDER  BY customer_id, total DESC, id;

4. rCTE with LATERALsubquery (see here)

4. 带有LATERAL子查询的rCTE (见这里

WITH RECURSIVE cte AS (
   (  -- parentheses required
   SELECT id, customer_id, total
   FROM   purchases
   ORDER  BY customer_id, total DESC
   LIMIT  1
   )
   UNION ALL
   SELECT u.*
   FROM   cte c
   ,      LATERAL (
      SELECT id, customer_id, total
      FROM   purchases
      WHERE  customer_id > c.customer_id  -- lateral reference
      ORDER  BY customer_id, total DESC
      LIMIT  1
      ) u
   )
SELECT id, customer_id, total
FROM   cte
ORDER  BY customer_id;

5. customertable with LATERAL(see here)

5.customer表带LATERAL见这里

SELECT l.*
FROM   customer c
,      LATERAL (
   SELECT id, customer_id, total
   FROM   purchases
   WHERE  customer_id = c.customer_id  -- lateral reference
   ORDER  BY total DESC
   LIMIT  1
   ) l;

6. array_agg()with ORDER BY(see other answer)

6.array_agg()ORDER BY见其他答案

SELECT (array_agg(id ORDER BY total DESC))[1] AS id
     , customer_id
     , max(total) AS total
FROM   purchases
GROUP  BY customer_id;

Results

结果

Execution time for above queries with EXPLAIN ANALYZE(and all options off), best of 5 runs.

上述查询的执行时间EXPLAIN ANALYZE(和所有选项都关闭),最好的 5 次运行

Allqueries used an Index Only Scanon purchases2_3c_idx(among other steps). Some of them just for the smaller size of the index, others more effectively.

所有查询使用的索引只扫描purchases2_3c_idx(其他步骤中)。其中一些仅适用于较小的索引大小,而另一些则更有效。

A. Postgres 9.4 with 200k rows and ~ 20 per customer_id

A. Postgres 9.4 有 20 万行,每行 ~ 20 customer_id

1. 273.274 ms  
2. 194.572 ms  
3. 111.067 ms  
4.  92.922 ms  
5.  37.679 ms  -- winner
6. 189.495 ms

B. The same with Postgres 9.5

B. 与 Postgres 9.5 相同

1. 288.006 ms
2. 223.032 ms  
3. 107.074 ms  
4.  78.032 ms  
5.  33.944 ms  -- winner
6. 211.540 ms  

C. Same as B., but with ~ 2.3 rows per customer_id

C. 与 B. 相同,但每个约 2.3 行 customer_id

1. 381.573 ms
2. 311.976 ms
3. 124.074 ms  -- winner
4. 710.631 ms
5. 311.976 ms
6. 421.679 ms


Related benchmarks

相关基准

Here is a new one by "ogr" testing with 10M rows and 60k unique "customers"on Postgres 11.5(current as of Sep. 2019). Results are still in line with what we have seen so far:

这是一个新的“ogr”测试,在Postgres 11.5(截至 2019 年 9 月)上有1000 万行和 6 万个唯一“客户”。结果仍然与我们目前看到的一致:

Original (outdated) benchmark from 2011

2011 年的原始(过时)基准

I ran three tests with PostgreSQL 9.1on a real life table of 65579 rows and single-column btree indexes on each of the three columns involved and took the best execution timeof 5 runs.
Comparing @OMGPonies'first query (A) to the above DISTINCT ONsolution(B):

我使用 PostgreSQL 9.1在真实的 65579 行表和所涉及的三列中的每一列上的单列 btree 索引上运行了三个测试,并获得了5 次运行的最佳执行时间
@OMGPonies 的第一个查询 ( A) 与上述DISTINCT ON解决方案( B) 进行比较:

  1. Select the whole table, results in 5958 rows in this case.

    A: 567.218 ms
    B: 386.673 ms
    
  2. Use condition WHERE customer BETWEEN x AND yresulting in 1000 rows.

    A: 249.136 ms
    B:  55.111 ms
    
  3. Select a single customer with WHERE customer = x.

    A:   0.143 ms
    B:   0.072 ms
    
  1. 选择整个表,在这种情况下会产生 5958 行。

    A: 567.218 ms
    B: 386.673 ms
    
  2. 使用条件WHERE customer BETWEEN x AND y产生 1000 行。

    A: 249.136 ms
    B:  55.111 ms
    
  3. 选择一个带有WHERE customer = x.

    A:   0.143 ms
    B:   0.072 ms
    

Same test repeated with the index described in the other answer

使用另一个答案中描述的索引重复相同的测试

CREATE INDEX purchases_3c_idx ON purchases (customer, total DESC, id);

1A: 277.953 ms  
1B: 193.547 ms

2A: 249.796 ms -- special index not used  
2B:  28.679 ms

3A:   0.120 ms  
3B:   0.048 ms

回答by TMS

This is common greatest-n-per-groupproblem, which already has well tested and highly optimized solutions. Personally I prefer the left join solution by Bill Karwin(the original post with lots of other solutions).

这是常见的每组最大 n 个问题,已经有经过充分测试和高度优化的解决方案。就我个人而言,我更喜欢Bill Karwinleft join 解决方案包含许多其他解决方案原始帖子)。

Note that bunch of solutions to this common problem can surprisingly be found in the one of most official sources, MySQL manual! See Examples of Common Queries :: The Rows Holding the Group-wise Maximum of a Certain Column.

请注意,在最官方的来源之一MySQL 手册中可以令人惊讶地找到针对此常见问题的大量解决方案!请参阅常见查询示例:: 保留特定列的分组最大值的行

回答by Paul A Jungwirth

In Postgres you can use array_agglike this:

在 Postgres 中,您可以这样使用array_agg

SELECT  customer,
        (array_agg(id ORDER BY total DESC))[1],
        max(total)
FROM purchases
GROUP BY customer

This will give you the idof each customer's largest purchase.

这将为您id提供每个客户的最大购买量。

Some things to note:

一些注意事项:

  • array_aggis an aggregate function, so it works with GROUP BY.
  • array_agglets you specify an ordering scoped to just itself, so it doesn't constrain the structure of the whole query. There is also syntax for how you sort NULLs, if you need to do something different from the default.
  • Once we build the array, we take the first element. (Postgres arrays are 1-indexed, not 0-indexed).
  • You could use array_aggin a similar way for your third output column, but max(total)is simpler.
  • Unlike DISTINCT ON, using array_agglets you keep your GROUP BY, in case you want that for other reasons.
  • array_agg是一个聚合函数,所以它适用于GROUP BY.
  • array_agg允许您指定范围仅限于自身的排序,因此它不会限制整个查询的结构。如果您需要做一些与默认值不同的事情,还有关于如何对 NULL 进行排序的语法。
  • 一旦我们构建了数组,我们就取第一个元素。(Postgres 数组是 1 索引的,而不是 0 索引的)。
  • 您可以array_agg以类似的方式使用第三个输出列,但max(total)更简单。
  • 与 不同DISTINCT ON, usingarray_agg可以让您保留您的GROUP BY,以防您出于其他原因需要。

回答by user2407394

The solution is not very efficient as pointed by Erwin, because of presence of SubQs

正如 Erwin 所指出的,该解决方案不是很有效,因为存在 SubQ

select * from purchases p1 where total in
(select max(total) from purchases where p1.customer=customer) order by total desc;

回答by matiu

I use this way (postgresql only): https://wiki.postgresql.org/wiki/First/last_%28aggregate%29

我使用这种方式(仅 postgresql):https: //wiki.postgresql.org/wiki/First/last_%28aggregate%29

-- Create a function that always returns the first non-NULL item
CREATE OR REPLACE FUNCTION public.first_agg ( anyelement, anyelement )
RETURNS anyelement LANGUAGE sql IMMUTABLE STRICT AS $$
        SELECT ;
$$;

-- And then wrap an aggregate around it
CREATE AGGREGATE public.first (
        sfunc    = public.first_agg,
        basetype = anyelement,
        stype    = anyelement
);

-- Create a function that always returns the last non-NULL item
CREATE OR REPLACE FUNCTION public.last_agg ( anyelement, anyelement )
RETURNS anyelement LANGUAGE sql IMMUTABLE STRICT AS $$
        SELECT ;
$$;

-- And then wrap an aggregate around it
CREATE AGGREGATE public.last (
        sfunc    = public.last_agg,
        basetype = anyelement,
        stype    = anyelement
);

Then your example should work almostas is:

那么您的示例应该几乎按原样工作:

SELECT FIRST(id), customer, FIRST(total)
FROM  purchases
GROUP BY customer
ORDER BY FIRST(total) DESC;

CAVEAT: It ignore's NULL rows

CAVEAT:它忽略 NULL 行



Edit 1 - Use the postgres extension instead

编辑 1 - 改用 postgres 扩展

Now I use this way: http://pgxn.org/dist/first_last_agg/

现在我用这种方式:http: //pgxn.org/dist/first_last_agg/

To install on ubuntu 14.04:

在 ubuntu 14.04 上安装:

apt-get install postgresql-server-dev-9.3 git build-essential -y
git clone git://github.com/wulczer/first_last_agg.git
cd first_last_app
make && sudo make install
psql -c 'create extension first_last_agg'

It's a postgres extension that gives you first and last functions; apparently faster than the above way.

它是一个 postgres 扩展,为您提供 first 和 last 功能;显然比上述方式更快。



Edit 2 - Ordering and filtering

编辑 2 - 排序和过滤

If you use aggregate functions (like these), you can order the results, without the need to have the data already ordered:

如果您使用聚合函数(像这些),您可以对结果进行排序,而无需对数据进行排序:

http://www.postgresql.org/docs/current/static/sql-expressions.html#SYNTAX-AGGREGATES

So the equivalent example, with ordering would be something like:

所以等效的例子,排序将是这样的:

SELECT first(id order by id), customer, first(total order by id)
  FROM purchases
 GROUP BY customer
 ORDER BY first(total);

Of course you can order and filter as you deem fit within the aggregate; it's very powerful syntax.

当然,您可以按照您认为适合的方式进行排序和过滤;这是非常强大的语法。

回答by khaled_gomaa

The Query:

查询:

SELECT purchases.*
FROM purchases
LEFT JOIN purchases as p 
ON 
  p.customer = purchases.customer 
  AND 
  purchases.total < p.total
WHERE p.total IS NULL

HOW DOES THAT WORK!(I've been there)

这是如何运作的!(我去过那儿)

We want to make sure that we only have the highest total for each purchase.

我们希望确保每次购买的总金额最高。



Some Theoretical Stuff(skip this part if you only want to understand the query)

一些理论知识(如果您只想了解查询,请跳过此部分)

Let Total be a function T(customer,id) where it returns a value given the name and id To prove that the given total (T(customer,id)) is the highest we have to prove that We want to prove either

设 Total 是一个函数 T(customer,id),它返回一个给定名称和 id 的值为了证明给定的总数 (T(customer,id)) 是我们必须证明的最高值 我们想要证明

  • ?x T(customer,id) > T(customer,x) (this total is higher than all other total for that customer)
  • ?x T(customer,id) > T(customer,x)(此总数高于该客户的所有其他总数)

OR

或者

  • ??x T(customer, id) < T(customer, x) (there exists no higher total for that customer)
  • ??x T(customer, id) < T(customer, x)(该客户不存在更高的总数)

The first approach will need us to get all the records for that name which I do not really like.

第一种方法需要我们获取我不喜欢的那个名字的所有记录。

The second one will need a smart way to say there can be no record higher than this one.

第二个需要一种聪明的方式来说明没有比这个更高的记录。



Back to SQL

回到 SQL

If we left joins the table on the name and total being less than the joined table:

如果我们在名称上留下连接表,并且总数小于连接表:

      LEFT JOIN purchases as p 
      ON 
      p.customer = purchases.customer 
      AND 
      purchases.total < p.total

we make sure that all records that have another record with the higher total for the same user to be joined:

我们确保所有具有另一条记录且同一用户的总数较高的记录都被加入:

purchases.id, purchases.customer, purchases.total, p.id, p.customer, p.total
1           , Tom           , 200             , 2   , Tom   , 300
2           , Tom           , 300
3           , Bob           , 400             , 4   , Bob   , 500
4           , Bob           , 500
5           , Alice         , 600             , 6   , Alice   , 700
6           , Alice         , 700

That will help us filter for the highest total for each purchase with no grouping needed:

这将帮助我们过滤每次购买的最高总数,而无需分组:

WHERE p.total IS NULL

purchases.id, purchases.name, purchases.total, p.id, p.name, p.total
2           , Tom           , 300
4           , Bob           , 500
6           , Alice         , 700

And that's the answer we need.

这就是我们需要的答案。

回答by Alejandro Salamanca Mazuelo

Very fast solution

非常快速的解决方案

SELECT a.* 
FROM
    purchases a 
    JOIN ( 
        SELECT customer, min( id ) as id 
        FROM purchases 
        GROUP BY customer 
    ) b USING ( id );

and really very fast if table is indexed by id:

如果表按 id 索引,速度真的非常快:

create index purchases_id on purchases (id);

回答by Diwas Poudel

In SQL Server you can do this:

在 SQL Server 中,您可以这样做:

SELECT *
FROM (
SELECT ROW_NUMBER()
OVER(PARTITION BY customer
ORDER BY total DESC) AS StRank, *
FROM Purchases) n
WHERE StRank = 1

Explaination:Here Group byis done on the basis of customer and then order it by total then each such group is given serial number as StRank and we are taking out first 1 customer whose StRank is 1

说明:这里 客户分组,然后按总数排序,然后每个这样的组被赋予序列号作为StRank,我们取出第一个StRank为1的客户