MySQL 获取最后一组不同的记录
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/5554075/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Get last distinct set of records
提问by Obinwanne Hill
I have a database table containing the following columns:
我有一个包含以下列的数据库表:
id code value datetime timestamp
In this table the only unique values reside in id i.e. primary key.
在这个表中,唯一的唯一值存在于 id 中,即主键。
I want to retrieve the last distinct set of records in this table based on the datetime value. For example, let's say below is my table
我想根据日期时间值检索此表中最后一组不同的记录。例如,假设下面是我的桌子
id code value datetime timestamp
1 1023 23.56 2011-04-05 14:54:52 1234223421
2 1024 23.56 2011-04-05 14:55:52 1234223423
3 1025 23.56 2011-04-05 14:56:52 1234223424
4 1023 23.56 2011-04-05 14:57:52 1234223425
5 1025 23.56 2011-04-05 14:58:52 1234223426
6 1025 23.56 2011-04-05 14:59:52 1234223427
7 1024 23.56 2011-04-05 15:00:12 1234223428
8 1026 23.56 2011-04-05 15:01:14 1234223429
9 1025 23.56 2011-04-05 15:02:22 1234223430
I want to retrieve the records with IDs 4, 7, 8, and 9 i.e. the last set of records with distinct codes (based on datetime value). What I have highlighted is simply an example of what I'm trying to achieve, as this table is going to eventually contain millions of records, and hundreds of individual code values.
我想检索 ID 为 4、7、8 和 9 的记录,即具有不同代码(基于日期时间值)的最后一组记录。我强调的只是我试图实现的一个示例,因为该表最终将包含数百万条记录和数百个单独的代码值。
What SQL statement can I use to achieve this? I can't seem to get it done with a single SQL statement. My database is MySQL 5.
我可以使用什么 SQL 语句来实现这一点?我似乎无法使用单个 SQL 语句完成它。我的数据库是 MySQL 5。
回答by smdrager
This should work for you.
这应该对你有用。
SELECT *
FROM [tableName]
WHERE id IN (SELECT MAX(id) FROM [tableName] GROUP BY code)
If id is AUTO_INCREMENT, there's no need to worry about the datetime which is far more expensive to compute, as the most recent datetime will also have the highest id.
如果 id 是 AUTO_INCREMENT,则无需担心计算成本要高得多的日期时间,因为最近的日期时间也将具有最高的 id。
Update:From a performance standpoint, make sure the id
and code
columns are indexed when dealing with a large number of records. If id
is the primary key, this is built in, but you may need to add a non-clustered index covering code
and id
.
更新:从性能的角度来看,确保在处理大量记录时对id
和code
列进行索引。如果id
是主键,这是内置的,但您可能需要添加一个非聚集索引覆盖code
和id
。
回答by Chandu
Try this:
尝试这个:
SELECT *
FROM <YOUR_TABLE>
WHERE (code, datetime, timestamp) IN
(
SELECT code, MAX(datetime), MAX(timestamp)
FROM <YOUR_TABLE>
GROUP BY code
)
回答by educolo
It's and old post, but testing @smdrager answer with large tables was very slow. My fix to this was using "inner join" instead of "where in".
这是旧帖子,但是用大表测试@smdrager 答案非常慢。我对此的解决方法是使用“内部连接”而不是“在何处”。
SELECT *
FROM [tableName] as t1
INNER JOIN (SELECT MAX(id) as id FROM [tableName] GROUP BY code) as t2
ON t1.id = t2.id
This worked really fast.
这工作得非常快。
回答by krtek
I'll try something like this :
我会尝试这样的事情:
select * from table
where id in (
select id
from table
group by code
having datetime = max(datetime)
)
(disclaimer: this is not tested)
(免责声明:这未经测试)
If the row with the bigger datetime also have the bigger id, the solution proposed by smdrager is quicker.
如果datetime大的那一行id也大,smdrager提出的解决方案会更快。
回答by The Godfather
Looks like all existing answers suggest to do GROUP BY code
on the whole table. When it's logically correct, in reality this query will go through the whole(!) table (use EXPLAIN
to make sure). In my case, I have less than 500k of rows in the table and executing ...GROUP BY code
takes 0.3 seconds which is absolutely not acceptable.
看起来所有现有的答案都建议GROUP BY code
在整张桌子上做。当它在逻辑上正确时,实际上这个查询将遍历整个(!)表(用于EXPLAIN
确保)。就我而言,表中的行数少于 500k,执行...GROUP BY code
时间为 0.3 秒,这是绝对不可接受的。
However I can use knowledge of my data here (read as "show last comments for posts"):
但是,我可以在这里使用我的数据知识(读作“显示帖子的最后评论”):
- I need to select just top-20 records
- Amount of records with same code across last X records is relatively small (~uniform distribution of comments across posts, there are no "viral" post which got all the recent comments)
- Total amount of records >> amount of available
code
's >> amount of "top" records you want to get
- 我只需要选择前 20 条记录
- 最后 X 条记录中具有相同代码的记录数量相对较少(〜帖子之间的评论分布均匀,没有“病毒”帖子获得所有最近的评论)
- 记录总数>>可用数量
code
>>您想要获得的“顶级”记录数量
By experimenting with numbers I found out that I can always find 20 different code
if I select just last 50 records. And in this case following query works (keeping in mind @smdrager comment about high probability to use id
instead of datetime
)
通过对数字进行试验,我发现code
如果我只选择最后 50 条记录,我总能找到 20 条不同的记录。在这种情况下,以下查询有效(请记住 @smdrager 关于使用高概率id
而不是 的评论datetime
)
SELECT id, code
FROM tablename
ORDER BY id DESC
LIMIT 50
Selecting just last 50 entries is super quick, because it doesn't need to check the whole table. And the rest is to select top-20 with distinct code
out of those 50 entries.
只选择最后 50 个条目非常快,因为它不需要检查整个表。剩下的就是code
从这 50 个条目中选出前 20名。
Obviously, queries on the set of 50 (100, 500) elements are significantly faster than on the whole table with hundreds of thousands entries.
显然,对 50 (100, 500) 个元素的集合的查询比在具有数十万个条目的整个表上查询要快得多。
Raw SQL "Postprocessing"
原始 SQL“后处理”
SELECT MAX(id) as id, code FROM
(SELECT id, code
FROM tablename
ORDER BY id DESC
LIMIT 50) AS nested
GROUP BY code
ORDER BY id DESC
LIMIT 20
This will give you list of id
's really quick and if you want to perform additional JOINs, put this query as yet another nested query and perform all joins on it.
这将为您提供id
非常快速的's列表,如果您想执行额外的 JOIN,请将此查询作为另一个嵌套查询并在其上执行所有连接。
Backend-side "Postprocessing"
后端“后处理”
And after that you need to process the data in your programming language to include to the final set only the records with distinct code
.
之后,您需要用您的编程语言处理数据,以将仅具有 distinct 的记录包含到最终集合中code
。
Some kind of Python pseudocode:
某种Python伪代码:
records = select_simple_top_records(50)
added_codes = set()
top_records = []
for record in records:
# If record for this code was already found before
# Note: this is not optimal, better to use structure allowing O(1) search and insert
if record['code'] in added_codes:
continue
# Save record
top_records.append(record)
added_codes.add(record['code'])
# If we found all top-20 required, finish
if len(top_records) >= 20:
break