MySQL 获取最后一组不同的记录

Question

提问by Obinwanne Hill

I have a database table containing the following columns:

我有一个包含以下列的数据库表：

id   code   value   datetime   timestamp

In this table the only unique values reside in id i.e. primary key.

在这个表中，唯一的唯一值存在于 id 中，即主键。

I want to retrieve the last distinct set of records in this table based on the datetime value. For example, let's say below is my table

我想根据日期时间值检索此表中最后一组不同的记录。例如，假设下面是我的桌子

id   code   value   datetime               timestamp
1    1023   23.56   2011-04-05 14:54:52    1234223421
2    1024   23.56   2011-04-05 14:55:52    1234223423
3    1025   23.56   2011-04-05 14:56:52    1234223424
4    1023   23.56   2011-04-05 14:57:52    1234223425
5    1025   23.56   2011-04-05 14:58:52    1234223426
6    1025   23.56   2011-04-05 14:59:52    1234223427
7    1024   23.56   2011-04-05 15:00:12    1234223428
8    1026   23.56   2011-04-05 15:01:14    1234223429
9    1025   23.56   2011-04-05 15:02:22    1234223430

I want to retrieve the records with IDs 4, 7, 8, and 9 i.e. the last set of records with distinct codes (based on datetime value). What I have highlighted is simply an example of what I'm trying to achieve, as this table is going to eventually contain millions of records, and hundreds of individual code values.

我想检索 ID 为 4、7、8 和 9 的记录，即具有不同代码（基于日期时间值）的最后一组记录。我强调的只是我试图实现的一个示例，因为该表最终将包含数百万条记录和数百个单独的代码值。

What SQL statement can I use to achieve this? I can't seem to get it done with a single SQL statement. My database is MySQL 5.

我可以使用什么 SQL 语句来实现这一点？我似乎无法使用单个 SQL 语句完成它。我的数据库是 MySQL 5。

Answer 1

回答by smdrager

This should work for you.

这应该对你有用。

 SELECT * 
 FROM [tableName] 
 WHERE id IN (SELECT MAX(id) FROM [tableName] GROUP BY code)

If id is AUTO_INCREMENT, there's no need to worry about the datetime which is far more expensive to compute, as the most recent datetime will also have the highest id.

如果 id 是 AUTO_INCREMENT，则无需担心计算成本要高得多的日期时间，因为最近的日期时间也将具有最高的 id。

Update:From a performance standpoint, make sure the idand codecolumns are indexed when dealing with a large number of records. If idis the primary key, this is built in, but you may need to add a non-clustered index covering codeand id.

更新：从性能的角度来看，确保在处理大量记录时对id和code列进行索引。如果id是主键，这是内置的，但您可能需要添加一个非聚集索引覆盖code和id。

Answer 2

回答by Chandu

Try this:

尝试这个：

SELECT * 
  FROM <YOUR_TABLE>
 WHERE (code, datetime, timestamp) IN
 (
   SELECT code, MAX(datetime), MAX(timestamp)
     FROM <YOUR_TABLE>
    GROUP BY code
 )

Answer 3

回答by educolo

It's and old post, but testing @smdrager answer with large tables was very slow. My fix to this was using "inner join" instead of "where in".

这是旧帖子，但是用大表测试@smdrager 答案非常慢。我对此的解决方法是使用“内部连接”而不是“在何处”。

SELECT * 
 FROM [tableName] as t1
 INNER JOIN (SELECT MAX(id) as id FROM [tableName] GROUP BY code) as t2
 ON t1.id = t2.id

This worked really fast.

这工作得非常快。

Answer 4

回答by krtek

I'll try something like this :

我会尝试这样的事情：

select * from table
where id in (
    select id
    from table
    group by code
    having datetime = max(datetime)
)

(disclaimer: this is not tested)

（免责声明：这未经测试）

If the row with the bigger datetime also have the bigger id, the solution proposed by smdrager is quicker.

如果datetime大的那一行id也大，smdrager提出的解决方案会更快。

Answer 5

回答by The Godfather

Looks like all existing answers suggest to do GROUP BY codeon the whole table. When it's logically correct, in reality this query will go through the whole(!) table (use EXPLAINto make sure). In my case, I have less than 500k of rows in the table and executing ...GROUP BY codetakes 0.3 seconds which is absolutely not acceptable.

看起来所有现有的答案都建议GROUP BY code在整张桌子上做。当它在逻辑上正确时，实际上这个查询将遍历整个（！）表（用于EXPLAIN确保）。就我而言，表中的行数少于 500k，执行...GROUP BY code时间为 0.3 秒，这是绝对不可接受的。

However I can use knowledge of my data here (read as "show last comments for posts"):

但是，我可以在这里使用我的数据知识（读作“显示帖子的最后评论”）：

I need to select just top-20 records
Amount of records with same code across last X records is relatively small (~uniform distribution of comments across posts, there are no "viral" post which got all the recent comments)
Total amount of records >> amount of available code's >> amount of "top" records you want to get

我只需要选择前 20 条记录
最后 X 条记录中具有相同代码的记录数量相对较少（〜帖子之间的评论分布均匀，没有“病毒”帖子获得所有最近的评论）
记录总数>>可用数量code>>您想要获得的“顶级”记录数量

By experimenting with numbers I found out that I can always find 20 different codeif I select just last 50 records. And in this case following query works (keeping in mind @smdrager comment about high probability to use idinstead of datetime)

通过对数字进行试验，我发现code如果我只选择最后 50 条记录，我总能找到 20 条不同的记录。在这种情况下，以下查询有效（请记住 @smdrager 关于使用高概率id而不是的评论datetime）

SELECT id, code
FROM tablename
ORDER BY id DESC 
LIMIT 50

Selecting just last 50 entries is super quick, because it doesn't need to check the whole table. And the rest is to select top-20 with distinct codeout of those 50 entries.

只选择最后 50 个条目非常快，因为它不需要检查整个表。剩下的就是code从这 50 个条目中选出前 20名。

Obviously, queries on the set of 50 (100, 500) elements are significantly faster than on the whole table with hundreds of thousands entries.

显然，对 50 (100, 500) 个元素的集合的查询比在具有数十万个条目的整个表上查询要快得多。

Raw SQL "Postprocessing"

原始 SQL“后处理”

SELECT MAX(id) as id, code FROM 
    (SELECT id, code
     FROM tablename
     ORDER BY id DESC 
     LIMIT 50) AS nested 
GROUP BY code
ORDER BY id DESC 
LIMIT 20

This will give you list of id's really quick and if you want to perform additional JOINs, put this query as yet another nested query and perform all joins on it.

这将为您提供id非常快速的's列表，如果您想执行额外的 JOIN，请将此查询作为另一个嵌套查询并在其上执行所有连接。

Backend-side "Postprocessing"

后端“后处理”

And after that you need to process the data in your programming language to include to the final set only the records with distinct code.

之后，您需要用您的编程语言处理数据，以将仅具有 distinct 的记录包含到最终集合中code。

Some kind of Python pseudocode:

某种Python伪代码：

records = select_simple_top_records(50)
added_codes = set()
top_records = []
for record in records:
    # If record for this code was already found before
    # Note: this is not optimal, better to use structure allowing O(1) search and insert
    if record['code'] in added_codes:
        continue
    # Save record
    top_records.append(record)
    added_codes.add(record['code'])
    # If we found all top-20 required, finish
    if len(top_records) >= 20:
        break

MySQL 获取最后一组不同的记录

提问by Obinwanne Hill

回答by smdrager

回答by Chandu

回答by educolo

回答by krtek

回答by The Godfather

相关推荐

最近更新

标签

MySQL 获取最后一组不同的记录

提问by Obinwanne Hill

回答by smdrager

回答by Chandu

回答by educolo

回答by krtek

回答by The Godfather

相关推荐

如何以永久的方式对 MYSQL 表进行排序？

更改 MySQL 中的连接时区

MySQL：选择前 n 个最大值？

如何修复 mysql 中的 ERROR 29 (HY000)？

相关推荐

最近更新

标签