SQL 选择每个 GROUP BY 组中的第一行?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/3800551/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Select first row in each GROUP BY group?
提问by David Wolever
As the title suggests, I'd like to select the first row of each set of rows grouped with a GROUP BY
.
正如标题所暗示的那样,我想选择每组以GROUP BY
.
Specifically, if I've got a purchases
table that looks like this:
具体来说,如果我有一个purchases
看起来像这样的表:
SELECT * FROM purchases;
My Output:
我的输出:
id | customer | total ---+----------+------ 1 | Joe | 5 2 | Sally | 3 3 | Joe | 2 4 | Sally | 1
I'd like to query for the id
of the largest purchase (total
) made by each customer
. Something like this:
我想查询的id
最大购买(的total
)每个制造customer
。像这样的东西:
SELECT FIRST(id), customer, FIRST(total)
FROM purchases
GROUP BY customer
ORDER BY total DESC;
Expected Output:
预期输出:
FIRST(id) | customer | FIRST(total) ----------+----------+------------- 1 | Joe | 5 2 | Sally | 3
采纳答案by OMG Ponies
On Oracle 9.2+ (not 8i+ as originally stated), SQL Server 2005+, PostgreSQL 8.4+, DB2, Firebird 3.0+, Teradata, Sybase, Vertica:
在 Oracle 9.2+(不是最初声明的 8i+)、SQL Server 2005+、PostgreSQL 8.4+、DB2、Firebird 3.0+、Teradata、Sybase、Vertica 上:
WITH summary AS (
SELECT p.id,
p.customer,
p.total,
ROW_NUMBER() OVER(PARTITION BY p.customer
ORDER BY p.total DESC) AS rk
FROM PURCHASES p)
SELECT s.*
FROM summary s
WHERE s.rk = 1
Supported by any database:
任何数据库都支持:
But you need to add logic to break ties:
但是你需要添加逻辑来打破关系:
SELECT MIN(x.id), -- change to MAX if you want the highest
x.customer,
x.total
FROM PURCHASES x
JOIN (SELECT p.customer,
MAX(total) AS max_total
FROM PURCHASES p
GROUP BY p.customer) y ON y.customer = x.customer
AND y.max_total = x.total
GROUP BY x.customer, x.total
回答by Erwin Brandstetter
In PostgreSQLthis is typically simpler and faster(more performance optimization below):
在PostgreSQL 中,这通常更简单、更快(下面有更多的性能优化):
SELECT DISTINCT ON (customer)
id, customer, total
FROM purchases
ORDER BY customer, total DESC, id;
Or shorter (if not as clear) with ordinal numbers of output columns:
或者更短(如果不是很清楚)输出列的序号:
SELECT DISTINCT ON (2)
id, customer, total
FROM purchases
ORDER BY 2, 3 DESC, 1;
If total
can be NULL (won't hurt either way, but you'll want to match existing indexes):
Iftotal
可以是 NULL (不会有任何伤害,但你会想要匹配现有的索引):
...
ORDER BY customer, total DESC NULLS LAST, id;
Major points
要点
DISTINCT ON
is a PostgreSQL extension of the standard (where onlyDISTINCT
on the wholeSELECT
list is defined).List any number of expressions in the
DISTINCT ON
clause, the combined row value defines duplicates. The manual:Obviously, two rows are considered distinct if they differ in at least one column value. Null values are considered equal in this comparison.
Bold emphasis mine.
DISTINCT ON
can be combined withORDER BY
. Leading expressions inORDER BY
must be in the set of expressions inDISTINCT ON
, but you can rearrange order among those freely. Example.You can add additionalexpressions toORDER BY
to pick a particular row from each group of peers. Or, as the manual puts it:The
DISTINCT ON
expression(s) must match the leftmostORDER BY
expression(s). TheORDER BY
clause will normally contain additional expression(s) that determine the desired precedence of rows within eachDISTINCT ON
group.I added
id
as last item to break ties:
"Pick the row with the smallestid
from each group sharing the highesttotal
."To order results in a way that disagrees with the sort order determining the first per group, you can nest above query in an outer query with another
ORDER BY
. Example.If
total
can be NULL, you most probablywant the row with the greatest non-null value. AddNULLS LAST
like demonstrated. See:The
SELECT
listis not constrained by expressions inDISTINCT ON
orORDER BY
in any way. (Not needed in the simple case above):You don't have toinclude any of the expressions in
DISTINCT ON
orORDER BY
.You caninclude any other expression in the
SELECT
list. This is instrumental for replacing much more complex queries with subqueries and aggregate / window functions.
I tested with Postgres versions 8.3 – 12. But the feature has been there at least since version 7.1, so basically always.
DISTINCT ON
是标准的 PostgreSQL 扩展(仅定义DISTINCT
了整个SELECT
列表)。列出
DISTINCT ON
子句中任意数量的表达式,组合的行值定义重复项。手册:显然,如果两行至少有一个列值不同,则它们被认为是不同的。在此比较中,空值被视为相等。
大胆强调我的。
DISTINCT ON
可以与ORDER BY
. in 中的前导表达式ORDER BY
必须在 中的表达式集中DISTINCT ON
,但您可以自由地重新排列它们之间的顺序。例子。您可以添加其他表达式以ORDER BY
从每组对等项中选择特定行。或者,正如手册所说:的
DISTINCT ON
表达式(一个或多个)必须最左边的匹配ORDER BY
表达式(一个或多个)。该ORDER BY
子句通常包含附加表达式,用于确定每个DISTINCT ON
组中行的所需优先级。我添加了
id
作为打破平局的最后一项:
“id
从每个组中选择最小的行共享最高的total
。”要以与确定每组第一个的排序顺序不一致的方式对结果进行排序,您可以将上面的查询嵌套在外部查询中与另一个
ORDER BY
. 例子。如果
total
可以为 NULL,则您很可能想要具有最大非空值的行。添加NULLS LAST
如演示。看:该
SELECT
名单不是由表达式约束DISTINCT ON
或ORDER BY
以任何方式。(在上面的简单情况下不需要):您不必在
DISTINCT ON
或 中包含任何表达式ORDER BY
。您可以在
SELECT
列表中包含任何其他表达式。这有助于用子查询和聚合/窗口函数替换更复杂的查询。
我使用 Postgres 8.3 – 12 版进行了测试。但该功能至少从 7.1 版开始就已经存在,所以基本上总是如此。
Index
指数
The perfectindex for the above query would be a multi-column indexspanning all three columns in matching sequence and with matching sort order:
上述查询的完美索引应该是一个多列索引,它以匹配的顺序和匹配的排序顺序跨越所有三列:
CREATE INDEX purchases_3c_idx ON purchases (customer, total DESC, id);
May be too specialized. But use it if read performance for the particular query is crucial. If you have DESC NULLS LAST
in the query, use the same in the index so that sort order matches and the index is applicable.
可能太专业了。但是如果特定查询的读取性能至关重要,请使用它。如果DESC NULLS LAST
在查询中有,请在索引中使用相同的内容,以便排序顺序匹配并且索引适用。
Effectiveness / Performance optimization
有效性/性能优化
Weigh cost and benefit before creating tailored indexes for each query. The potential of above index largely depends on data distribution.
在为每个查询创建定制索引之前权衡成本和收益。上述指标的潜力很大程度上取决于数据分布。
The index is used because it delivers pre-sorted data. In Postgres 9.2 or later the query can also benefit from an index only scanif the index is smaller than the underlying table. The index has to be scanned in its entirety, though.
使用索引是因为它提供预先排序的数据。在 Postgres 9.2 或更高版本中,如果索引小于基础表,查询也可以从仅索引扫描中受益。但是,必须完整扫描索引。
For fewrows per customer(high cardinality in column
customer
), this is very efficient. Even more so if you need sorted output anyway. The benefit shrinks with a growing number of rows per customer.
Ideally, you have enoughwork_mem
to process the involved sort step in RAM and not spill to disk. But generally settingwork_mem
toohigh can have adverse effects. ConsiderSET LOCAL
for exceptionally big queries. Find how much you need withEXPLAIN ANALYZE
. Mention of "Disk:" in the sort step indicates the need for more:For manyrows per customer(low cardinality in column
customer
), a loose index scan(a.k.a. "skip scan") would be (much) more efficient, but that's not implemented up to Postgres 12. (An implementation for index-only scans is in development for Postgres 13. See hereand here.)
For now, there are faster query techniquesto substitute for this. In particular if you have a separate table holding unique customers, which is the typical use case. But also if you don't:
对于每个客户的几行(列中的高基数
customer
),这是非常有效的。如果您无论如何都需要排序输出,则更是如此。随着每位客户的行数越来越多,收益会减少。
理想情况下,您有足够的work_mem
时间在 RAM 中处理所涉及的排序步骤,而不会溢出到磁盘。但通常设置work_mem
太高会产生不利影响。考虑SET LOCAL
特别大的查询。用 找到你需要多少EXPLAIN ANALYZE
。在排序步骤中提到“磁盘:”表示需要更多:对于每个客户的多行(列中的基数低
customer
),松散索引扫描(又名“跳过扫描”)会(更)高效,但直到 Postgres 12 为止还没有实现。(仅索引扫描的实现在Postgres 13 的开发。请参阅此处和此处。)
目前,有更快的查询技术来替代它。特别是如果您有一个单独的表保存唯一客户,这是典型的用例。但如果你不这样做:
Benchmark
基准
I had a simple benchmark here which is outdated by now. I replaced it with a detailed benchmark in this separate answer.
回答by Erwin Brandstetter
Benchmark
基准
Testing the most interesting candidates with Postgres 9.4and 9.5with a halfway realistic table of 200k rowsin purchases
and 10k distinct customer_id
(avg. 20 rows per customer).
测试中最有趣的候选人的Postgres 9.4和9.5用的中途现实表20万行中purchases
和10K不同customer_id
(平均每用户20行)。
For Postgres 9.5 I ran a 2nd test with effectively 86446 distinct customers. See below (avg. 2.3 rows per customer).
对于 Postgres 9.5,我对 86446 个不同的客户进行了第二次测试。见下文(每位客户平均 2.3 行)。
Setup
设置
Main table
主桌
CREATE TABLE purchases (
id serial
, customer_id int -- REFERENCES customer
, total int -- could be amount of money in Cent
, some_column text -- to make the row bigger, more realistic
);
I use a serial
(PK constraint added below) and an integer customer_id
since that's a more typical setup. Also added some_column
to make up for typically more columns.
我使用一个serial
(下面添加了PK 约束)和一个整数,customer_id
因为这是一个更典型的设置。还添加some_column
以弥补通常更多的列。
Dummy data, PK, index - a typical table also has some dead tuples:
虚拟数据、PK、索引——一个典型的表也有一些死元组:
INSERT INTO purchases (customer_id, total, some_column) -- insert 200k rows
SELECT (random() * 10000)::int AS customer_id -- 10k customers
, (random() * random() * 100000)::int AS total
, 'note: ' || repeat('x', (random()^2 * random() * random() * 500)::int)
FROM generate_series(1,200000) g;
ALTER TABLE purchases ADD CONSTRAINT purchases_id_pkey PRIMARY KEY (id);
DELETE FROM purchases WHERE random() > 0.9; -- some dead rows
INSERT INTO purchases (customer_id, total, some_column)
SELECT (random() * 10000)::int AS customer_id -- 10k customers
, (random() * random() * 100000)::int AS total
, 'note: ' || repeat('x', (random()^2 * random() * random() * 500)::int)
FROM generate_series(1,20000) g; -- add 20k to make it ~ 200k
CREATE INDEX purchases_3c_idx ON purchases (customer_id, total DESC, id);
VACUUM ANALYZE purchases;
customer
table - for superior query
customer
表 - 用于高级查询
CREATE TABLE customer AS
SELECT customer_id, 'customer_' || customer_id AS customer
FROM purchases
GROUP BY 1
ORDER BY 1;
ALTER TABLE customer ADD CONSTRAINT customer_customer_id_pkey PRIMARY KEY (customer_id);
VACUUM ANALYZE customer;
In my second testfor 9.5 I used the same setup, but with random() * 100000
to generate customer_id
to get only few rows per customer_id
.
在我对 9.5 的第二次测试中,我使用了相同的设置,但使用random() * 100000
生成customer_id
每个customer_id
.
Object sizes for table purchases
表的对象大小 purchases
Generated with this query.
使用此查询生成。
what | bytes/ct | bytes_pretty | bytes_per_row
-----------------------------------+----------+--------------+---------------
core_relation_size | 20496384 | 20 MB | 102
visibility_map | 0 | 0 bytes | 0
free_space_map | 24576 | 24 kB | 0
table_size_incl_toast | 20529152 | 20 MB | 102
indexes_size | 10977280 | 10 MB | 54
total_size_incl_toast_and_indexes | 31506432 | 30 MB | 157
live_rows_in_text_representation | 13729802 | 13 MB | 68
------------------------------ | | |
row_count | 200045 | |
live_tuples | 200045 | |
dead_tuples | 19955 | |
Queries
查询
1. row_number()
in CTE, (see other answer)
1.row_number()
在CTE中,(见其他答案)
WITH cte AS (
SELECT id, customer_id, total
, row_number() OVER(PARTITION BY customer_id ORDER BY total DESC) AS rn
FROM purchases
)
SELECT id, customer_id, total
FROM cte
WHERE rn = 1;
2. row_number()
in subquery (my optimization)
2.row_number()
在子查询中(我的优化)
SELECT id, customer_id, total
FROM (
SELECT id, customer_id, total
, row_number() OVER(PARTITION BY customer_id ORDER BY total DESC) AS rn
FROM purchases
) sub
WHERE rn = 1;
3. DISTINCT ON
(see other answer)
3. DISTINCT ON
(见其他答案)
SELECT DISTINCT ON (customer_id)
id, customer_id, total
FROM purchases
ORDER BY customer_id, total DESC, id;
4. rCTE with LATERAL
subquery (see here)
4. 带有LATERAL
子查询的rCTE (见这里)
WITH RECURSIVE cte AS (
( -- parentheses required
SELECT id, customer_id, total
FROM purchases
ORDER BY customer_id, total DESC
LIMIT 1
)
UNION ALL
SELECT u.*
FROM cte c
, LATERAL (
SELECT id, customer_id, total
FROM purchases
WHERE customer_id > c.customer_id -- lateral reference
ORDER BY customer_id, total DESC
LIMIT 1
) u
)
SELECT id, customer_id, total
FROM cte
ORDER BY customer_id;
5. customer
table with LATERAL
(see here)
5.customer
表带LATERAL
(见这里)
SELECT l.*
FROM customer c
, LATERAL (
SELECT id, customer_id, total
FROM purchases
WHERE customer_id = c.customer_id -- lateral reference
ORDER BY total DESC
LIMIT 1
) l;
6. array_agg()
with ORDER BY
(see other answer)
6.array_agg()
与ORDER BY
(见其他答案)
SELECT (array_agg(id ORDER BY total DESC))[1] AS id
, customer_id
, max(total) AS total
FROM purchases
GROUP BY customer_id;
Results
结果
Execution time for above queries with EXPLAIN ANALYZE
(and all options off), best of 5 runs.
上述查询的执行时间EXPLAIN ANALYZE
(和所有选项都关闭),最好的 5 次运行。
Allqueries used an Index Only Scanon purchases2_3c_idx
(among other steps). Some of them just for the smaller size of the index, others more effectively.
所有查询使用的索引只扫描上purchases2_3c_idx
(其他步骤中)。其中一些仅适用于较小的索引大小,而另一些则更有效。
A. Postgres 9.4 with 200k rows and ~ 20 per customer_id
A. Postgres 9.4 有 20 万行,每行 ~ 20 customer_id
1. 273.274 ms
2. 194.572 ms
3. 111.067 ms
4. 92.922 ms
5. 37.679 ms -- winner
6. 189.495 ms
B. The same with Postgres 9.5
B. 与 Postgres 9.5 相同
1. 288.006 ms
2. 223.032 ms
3. 107.074 ms
4. 78.032 ms
5. 33.944 ms -- winner
6. 211.540 ms
C. Same as B., but with ~ 2.3 rows per customer_id
C. 与 B. 相同,但每个约 2.3 行 customer_id
1. 381.573 ms
2. 311.976 ms
3. 124.074 ms -- winner
4. 710.631 ms
5. 311.976 ms
6. 421.679 ms
Related benchmarks
相关基准
Here is a new one by "ogr" testing with 10M rows and 60k unique "customers"on Postgres 11.5(current as of Sep. 2019). Results are still in line with what we have seen so far:
这是一个新的“ogr”测试,在Postgres 11.5(截至 2019 年 9 月)上有1000 万行和 6 万个唯一“客户”。结果仍然与我们目前看到的一致:
Original (outdated) benchmark from 2011
2011 年的原始(过时)基准
I ran three tests with PostgreSQL 9.1on a real life table of 65579 rows and single-column btree indexes on each of the three columns involved and took the best execution timeof 5 runs.
Comparing @OMGPonies'first query (A
) to the above DISTINCT ON
solution(B
):
我使用 PostgreSQL 9.1在真实的 65579 行表和所涉及的三列中的每一列上的单列 btree 索引上运行了三个测试,并获得了5 次运行的最佳执行时间。
将@OMGPonies 的第一个查询 ( A
) 与上述DISTINCT ON
解决方案( B
) 进行比较:
Select the whole table, results in 5958 rows in this case.
A: 567.218 ms B: 386.673 ms
Use condition
WHERE customer BETWEEN x AND y
resulting in 1000 rows.A: 249.136 ms B: 55.111 ms
Select a single customer with
WHERE customer = x
.A: 0.143 ms B: 0.072 ms
选择整个表,在这种情况下会产生 5958 行。
A: 567.218 ms B: 386.673 ms
使用条件
WHERE customer BETWEEN x AND y
产生 1000 行。A: 249.136 ms B: 55.111 ms
选择一个带有
WHERE customer = x
.A: 0.143 ms B: 0.072 ms
Same test repeated with the index described in the other answer
使用另一个答案中描述的索引重复相同的测试
CREATE INDEX purchases_3c_idx ON purchases (customer, total DESC, id);
1A: 277.953 ms
1B: 193.547 ms
2A: 249.796 ms -- special index not used
2B: 28.679 ms
3A: 0.120 ms
3B: 0.048 ms
回答by TMS
This is common greatest-n-per-groupproblem, which already has well tested and highly optimized solutions. Personally I prefer the left join solution by Bill Karwin(the original post with lots of other solutions).
这是常见的每组最大 n 个问题,已经有经过充分测试和高度优化的解决方案。就我个人而言,我更喜欢Bill Karwin的left join 解决方案(包含许多其他解决方案的原始帖子)。
Note that bunch of solutions to this common problem can surprisingly be found in the one of most official sources, MySQL manual! See Examples of Common Queries :: The Rows Holding the Group-wise Maximum of a Certain Column.
请注意,在最官方的来源之一MySQL 手册中可以令人惊讶地找到针对此常见问题的大量解决方案!请参阅常见查询示例:: 保留特定列的分组最大值的行。
回答by Paul A Jungwirth
In Postgres you can use array_agg
like this:
在 Postgres 中,您可以这样使用array_agg
:
SELECT customer,
(array_agg(id ORDER BY total DESC))[1],
max(total)
FROM purchases
GROUP BY customer
This will give you the id
of each customer's largest purchase.
这将为您id
提供每个客户的最大购买量。
Some things to note:
一些注意事项:
array_agg
is an aggregate function, so it works withGROUP BY
.array_agg
lets you specify an ordering scoped to just itself, so it doesn't constrain the structure of the whole query. There is also syntax for how you sort NULLs, if you need to do something different from the default.- Once we build the array, we take the first element. (Postgres arrays are 1-indexed, not 0-indexed).
- You could use
array_agg
in a similar way for your third output column, butmax(total)
is simpler. - Unlike
DISTINCT ON
, usingarray_agg
lets you keep yourGROUP BY
, in case you want that for other reasons.
array_agg
是一个聚合函数,所以它适用于GROUP BY
.array_agg
允许您指定范围仅限于自身的排序,因此它不会限制整个查询的结构。如果您需要做一些与默认值不同的事情,还有关于如何对 NULL 进行排序的语法。- 一旦我们构建了数组,我们就取第一个元素。(Postgres 数组是 1 索引的,而不是 0 索引的)。
- 您可以
array_agg
以类似的方式使用第三个输出列,但max(total)
更简单。 - 与 不同
DISTINCT ON
, usingarray_agg
可以让您保留您的GROUP BY
,以防您出于其他原因需要。
回答by user2407394
The solution is not very efficient as pointed by Erwin, because of presence of SubQs
正如 Erwin 所指出的,该解决方案不是很有效,因为存在 SubQ
select * from purchases p1 where total in
(select max(total) from purchases where p1.customer=customer) order by total desc;
回答by matiu
I use this way (postgresql only): https://wiki.postgresql.org/wiki/First/last_%28aggregate%29
我使用这种方式(仅 postgresql):https: //wiki.postgresql.org/wiki/First/last_%28aggregate%29
-- Create a function that always returns the first non-NULL item
CREATE OR REPLACE FUNCTION public.first_agg ( anyelement, anyelement )
RETURNS anyelement LANGUAGE sql IMMUTABLE STRICT AS $$
SELECT ;
$$;
-- And then wrap an aggregate around it
CREATE AGGREGATE public.first (
sfunc = public.first_agg,
basetype = anyelement,
stype = anyelement
);
-- Create a function that always returns the last non-NULL item
CREATE OR REPLACE FUNCTION public.last_agg ( anyelement, anyelement )
RETURNS anyelement LANGUAGE sql IMMUTABLE STRICT AS $$
SELECT ;
$$;
-- And then wrap an aggregate around it
CREATE AGGREGATE public.last (
sfunc = public.last_agg,
basetype = anyelement,
stype = anyelement
);
Then your example should work almostas is:
那么您的示例应该几乎按原样工作:
SELECT FIRST(id), customer, FIRST(total)
FROM purchases
GROUP BY customer
ORDER BY FIRST(total) DESC;
CAVEAT: It ignore's NULL rows
CAVEAT:它忽略 NULL 行
Edit 1 - Use the postgres extension instead
编辑 1 - 改用 postgres 扩展
Now I use this way: http://pgxn.org/dist/first_last_agg/
现在我用这种方式:http: //pgxn.org/dist/first_last_agg/
To install on ubuntu 14.04:
在 ubuntu 14.04 上安装:
apt-get install postgresql-server-dev-9.3 git build-essential -y
git clone git://github.com/wulczer/first_last_agg.git
cd first_last_app
make && sudo make install
psql -c 'create extension first_last_agg'
It's a postgres extension that gives you first and last functions; apparently faster than the above way.
它是一个 postgres 扩展,为您提供 first 和 last 功能;显然比上述方式更快。
Edit 2 - Ordering and filtering
编辑 2 - 排序和过滤
If you use aggregate functions (like these), you can order the results, without the need to have the data already ordered:
如果您使用聚合函数(像这些),您可以对结果进行排序,而无需对数据进行排序:
http://www.postgresql.org/docs/current/static/sql-expressions.html#SYNTAX-AGGREGATES
So the equivalent example, with ordering would be something like:
所以等效的例子,排序将是这样的:
SELECT first(id order by id), customer, first(total order by id)
FROM purchases
GROUP BY customer
ORDER BY first(total);
Of course you can order and filter as you deem fit within the aggregate; it's very powerful syntax.
当然,您可以按照您认为适合的方式进行排序和过滤;这是非常强大的语法。
回答by khaled_gomaa
The Query:
查询:
SELECT purchases.*
FROM purchases
LEFT JOIN purchases as p
ON
p.customer = purchases.customer
AND
purchases.total < p.total
WHERE p.total IS NULL
HOW DOES THAT WORK!(I've been there)
这是如何运作的!(我去过那儿)
We want to make sure that we only have the highest total for each purchase.
我们希望确保每次购买的总金额最高。
Some Theoretical Stuff(skip this part if you only want to understand the query)
一些理论知识(如果您只想了解查询,请跳过此部分)
Let Total be a function T(customer,id) where it returns a value given the name and id To prove that the given total (T(customer,id)) is the highest we have to prove that We want to prove either
设 Total 是一个函数 T(customer,id),它返回一个给定名称和 id 的值为了证明给定的总数 (T(customer,id)) 是我们必须证明的最高值 我们想要证明
- ?x T(customer,id) > T(customer,x) (this total is higher than all other total for that customer)
- ?x T(customer,id) > T(customer,x)(此总数高于该客户的所有其他总数)
OR
或者
- ??x T(customer, id) < T(customer, x) (there exists no higher total for that customer)
- ??x T(customer, id) < T(customer, x)(该客户不存在更高的总数)
The first approach will need us to get all the records for that name which I do not really like.
第一种方法需要我们获取我不喜欢的那个名字的所有记录。
The second one will need a smart way to say there can be no record higher than this one.
第二个需要一种聪明的方式来说明没有比这个更高的记录。
Back to SQL
回到 SQL
If we left joins the table on the name and total being less than the joined table:
如果我们在名称上留下连接表,并且总数小于连接表:
LEFT JOIN purchases as p
ON
p.customer = purchases.customer
AND
purchases.total < p.total
we make sure that all records that have another record with the higher total for the same user to be joined:
我们确保所有具有另一条记录且同一用户的总数较高的记录都被加入:
purchases.id, purchases.customer, purchases.total, p.id, p.customer, p.total
1 , Tom , 200 , 2 , Tom , 300
2 , Tom , 300
3 , Bob , 400 , 4 , Bob , 500
4 , Bob , 500
5 , Alice , 600 , 6 , Alice , 700
6 , Alice , 700
That will help us filter for the highest total for each purchase with no grouping needed:
这将帮助我们过滤每次购买的最高总数,而无需分组:
WHERE p.total IS NULL
purchases.id, purchases.name, purchases.total, p.id, p.name, p.total
2 , Tom , 300
4 , Bob , 500
6 , Alice , 700
And that's the answer we need.
这就是我们需要的答案。
回答by Alejandro Salamanca Mazuelo
Very fast solution
非常快速的解决方案
SELECT a.*
FROM
purchases a
JOIN (
SELECT customer, min( id ) as id
FROM purchases
GROUP BY customer
) b USING ( id );
and really very fast if table is indexed by id:
如果表按 id 索引,速度真的非常快:
create index purchases_id on purchases (id);
回答by Diwas Poudel
In SQL Server you can do this:
在 SQL Server 中,您可以这样做:
SELECT *
FROM (
SELECT ROW_NUMBER()
OVER(PARTITION BY customer
ORDER BY total DESC) AS StRank, *
FROM Purchases) n
WHERE StRank = 1
Explaination:Here Group byis done on the basis of customer and then order it by total then each such group is given serial number as StRank and we are taking out first 1 customer whose StRank is 1
说明:这里 按客户分组,然后按总数排序,然后每个这样的组被赋予序列号作为StRank,我们取出第一个StRank为1的客户