SQL: Alternative to using GROUP BY without aggregates to retrieve distinct "best" result

Disclaimer: This page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/4710406/

Date: 2020-09-01 08:54:16  Source: igfitidea

Tags: sql, group-by, aggregation

Asked by Tyris

I'm trying to retrieve the "Best" possible entry from an SQL table.

Consider a table of TV shows with the columns id, title, episode, is_hidef and is_verified, e.g.:

id title         ep hidef verified
1  The Simpsons  1  True  False
2  The Simpsons  1  True  True
3  The Simpsons  1  True  True
4  The Simpsons  2  False False
5  The Simpsons  2  True  False

There may be duplicate rows for a single title and episode, which may or may not have different values in the boolean fields. There may be more columns containing additional info, but that's unimportant.

I want a result set that gives me the best row for each episode (i.e. is_hidef and is_verified are both "true" where possible). For rows considered "equal", I want the most recent row (natural ordering, or order by an arbitrary datetime column).

3  The Simpsons  1  True  True
5  The Simpsons  2  True  False
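To pin down the selection rule before writing any SQL, here is a minimal Python sketch of it, assuming the sample rows above and using the highest id as a stand-in for "most recent" (the question also allows natural ordering or a datetime column). Within each (title, episode) group it keeps the row with the largest (is_hidef, is_verified, id) tuple, which works because True sorts above False.

```python
from itertools import groupby

# Sample rows from the table above: (id, title, episode, is_hidef, is_verified)
ROWS = [
    (1, "The Simpsons", 1, True, False),
    (2, "The Simpsons", 1, True, True),
    (3, "The Simpsons", 1, True, True),
    (4, "The Simpsons", 2, False, False),
    (5, "The Simpsons", 2, True, False),
]


def best_per_episode(rows):
    """Return one "best" row per (title, episode), ordered by title and episode."""

    def group_key(row):
        return (row[1], row[2])

    def rank(row):
        # True > False, so hidef beats non-hidef, then verified beats unverified;
        # the id breaks any remaining tie in favour of the newest row (assumption).
        return (row[3], row[4], row[0])

    return [
        max(group, key=rank)
        for _, group in groupby(sorted(rows, key=group_key), key=group_key)
    ]


for row in best_per_episode(ROWS):
    print(row)
# (3, 'The Simpsons', 1, True, True)
# (5, 'The Simpsons', 2, True, False)
```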

In the past I would have used the following query:

SELECT * FROM shows WHERE title='The Simpsons' GROUP BY episode ORDER BY is_hidef, is_verified

This works under MySQL and SQLite, but goes against the SQL spec (GROUP BY requiring aggregates, etc.). I'm not really interested in hearing again why MySQL is so bad for allowing this, but I'm very interested in finding an alternative solution that will work on other engines too (bonus points if you can give me the Django ORM code for it).

Thanks =)

Accepted answer by Andomar

This is basically a form of the groupwise-maximum-with-ties problem. I don't think there is a SQL standard compliant solution. A solution like this would perform nicely:

SELECT  s2.id
,       s2.title
,       s2.episode
,       s2.is_hidef
,       s2.is_verified
FROM    (
        select  distinct title
        ,       episode
        from    shows
        where   title = 'The Simpsons' 
        ) s1
JOIN    shows s2
ON      s2.id = 
        (
        select  id
        from    shows s3
        where   s3.title = s1.title
                and s3.episode = s1.episode
        order by
                s3.is_hidef DESC
,       s3.is_verified DESC
,       s3.id DESC  -- newest row wins remaining ties
        limit   1
        )

But given the cost in readability, I would stick with your original query.
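To try the query without a MySQL server, here is a minimal harness using Python's built-in sqlite3 module and the sample rows from the question. It assumes the boolean columns are stored as 0/1 integers, and it appends s3.id DESC as the last ORDER BY key (not part of the answer as originally posted) so that equally good rows resolve to the highest id, assumed here to be the most recent. Note that LIMIT is not available on every engine; SQL Server, for example, uses TOP 1 instead.

```python
import sqlite3

# Sample rows from the question; the boolean columns are stored as 0/1 integers.
ROWS = [
    (1, "The Simpsons", 1, 1, 0),
    (2, "The Simpsons", 1, 1, 1),
    (3, "The Simpsons", 1, 1, 1),
    (4, "The Simpsons", 2, 0, 0),
    (5, "The Simpsons", 2, 1, 0),
]

# Andomar's correlated-subquery approach. "s3.id DESC" is an added last sort key
# so that equally good rows resolve to the highest (assumed newest) id.
QUERY = """
SELECT  s2.id, s2.title, s2.episode, s2.is_hidef, s2.is_verified
FROM    (SELECT DISTINCT title, episode
         FROM   shows
         WHERE  title = 'The Simpsons') s1
JOIN    shows s2
ON      s2.id = (SELECT   s3.id
                 FROM     shows s3
                 WHERE    s3.title = s1.title AND s3.episode = s1.episode
                 ORDER BY s3.is_hidef DESC, s3.is_verified DESC, s3.id DESC
                 LIMIT    1)
ORDER BY s2.episode
"""


def best_rows():
    """Load the sample rows into an in-memory database and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT, "
            "episode INTEGER, is_hidef INTEGER, is_verified INTEGER)"
        )
        conn.executemany("INSERT INTO shows VALUES (?, ?, ?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(best_rows())
# [(3, 'The Simpsons', 1, 1, 1), (5, 'The Simpsons', 2, 1, 0)]
```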

Answer by RichardTheKiwi

Somewhat similar to Andomar's, but this one actually works.

select C.*
FROM
(
    select min(ID) minid
    from (
        select distinct title, ep, max(hidef*1 + verified*1) ord
        from tbl
        group by title, ep) a
    inner join tbl b on b.title=a.title and b.ep=a.ep and b.hidef*1 + b.verified*1 = a.ord
    group by a.title, a.ep, a.ord
) D inner join tbl C on D.minid = C.id

The first-level tiebreak converts SQL Server bit or MySQL boolean values to integers using *1, and adds the columns to produce a "best" score. You can weight them: e.g. if hidef matters more than verified, use hidef*2 + verified*1, which can produce 3, 2, 1 or 0.
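The weighting can be checked directly in SQL. A minimal sketch, using Python's built-in sqlite3 module only so the expression can be run, with 0/1 integers standing in for the bit or boolean columns; the combos CTE is a hypothetical helper that enumerates all four hidef/verified combinations:

```python
import sqlite3


def weighted_scores():
    """Return (hidef, verified, ord) for every 0/1 combination, best score first."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(
            """
            WITH combos(hidef, verified) AS (
                VALUES (1, 1), (1, 0), (0, 1), (0, 0)
            )
            SELECT hidef,
                   verified,
                   hidef*2 + verified*1 AS ord  -- hidef weighted above verified
            FROM combos
            ORDER BY ord DESC
            """
        ).fetchall()
    finally:
        conn.close()


for hidef, verified, ord_ in weighted_scores():
    print(f"hidef={hidef} verified={verified} -> {ord_}")
# hidef=1 verified=1 -> 3
# hidef=1 verified=0 -> 2
# hidef=0 verified=1 -> 1
# hidef=0 verified=0 -> 0
```

Because every combination gets a distinct score, a hidef-only row always outranks a verified-only row, which a plain hidef + verified sum cannot guarantee.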

The second level looks only at the rows that achieve the "best" score and extracts the minimum ID (or some other tie-break column). This step is essential to reduce multiple matching rows to exactly one record per episode.

With this particular table schema, the outer select joins on the key (id) to retrieve the full matched records.
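Here is a minimal harness for this query using Python's built-in sqlite3 module and the sample rows from the question, stored with 0/1 integers for hidef and verified. Two changes are illustrative assumptions, not part of the answer: DISTINCT is dropped from the innermost select (GROUP BY already yields one row per title and ep), and the tie-break aggregate is a parameter. MIN(id), as in the answer, picks row 2 for episode 1, whereas MAX(id) picks row 3, the "most recent" row shown in the question's expected output, assuming ids grow over time.

```python
import sqlite3

# Sample rows from the question, using this answer's column names;
# the bit/boolean columns are stored as 0/1 integers.
ROWS = [
    (1, "The Simpsons", 1, 1, 0),
    (2, "The Simpsons", 1, 1, 1),
    (3, "The Simpsons", 1, 1, 1),
    (4, "The Simpsons", 2, 0, 0),
    (5, "The Simpsons", 2, 1, 0),
]

# RichardTheKiwi's query with the final tie-break aggregate left as {agg}.
QUERY_TEMPLATE = """
SELECT C.*
FROM (
    SELECT {agg}(b.id) AS pick_id
    FROM (
        SELECT title, ep, MAX(hidef*1 + verified*1) AS ord
        FROM tbl
        GROUP BY title, ep
    ) a
    INNER JOIN tbl b
        ON  b.title = a.title
        AND b.ep = a.ep
        AND b.hidef*1 + b.verified*1 = a.ord
    GROUP BY a.title, a.ep, a.ord
) D
INNER JOIN tbl C ON D.pick_id = C.id
ORDER BY C.ep
"""


def best_rows(agg="MIN"):
    """Run the query with MIN (as in the answer) or MAX as the final tie-break."""
    if agg not in ("MIN", "MAX"):
        raise ValueError("agg must be 'MIN' or 'MAX'")
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE tbl (id INTEGER PRIMARY KEY, title TEXT, "
            "ep INTEGER, hidef INTEGER, verified INTEGER)"
        )
        conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?, ?)", ROWS)
        return conn.execute(QUERY_TEMPLATE.format(agg=agg)).fetchall()
    finally:
        conn.close()


print(best_rows("MIN"))  # [(2, 'The Simpsons', 1, 1, 1), (5, 'The Simpsons', 2, 1, 0)]
print(best_rows("MAX"))  # [(3, 'The Simpsons', 1, 1, 1), (5, 'The Simpsons', 2, 1, 0)]
```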