Disclaimer: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/190702/

Time: 2020-08-31 12:12:10, Source: igfitidea

MySQL: Select N rows, but with only unique values in one column

Tags: sql, mysql, unique, aggregate-functions, min

Question by BlaM

Given this data set:

ID  Name            City            Birthyear
1   Egon Spengler   New York        1957
2   Mac Taylor      New York        1955
3   Sarah Connor    Los Angeles     1959
4   Jean-Luc Picard La Barre        2305
5   Ellen Ripley    Nostromo        2092
6   James T. Kirk   Riverside       2233
7   Henry Jones     Chicago         1899

I need to find the 3 oldest persons, but only one of every city.

If it were just the three oldest, they would be...

  • Henry Jones / Chicago
  • Mac Taylor / New York
  • Egon Spengler / New York

However, since both Egon Spengler and Mac Taylor are located in New York, Egon Spengler would drop out and the next one (Sarah Connor / Los Angeles) would come in instead.
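The rule described above (walk through people oldest first and skip any city that is already represented) can be checked outside the database. A minimal Python sketch, assuming the seven sample rows from the question and that "oldest" means the smallest Birthyear; the helper name `oldest_one_per_city` is hypothetical:

```python
# Sample data from the question: (name, city, birthyear).
PEOPLE = [
    ("Egon Spengler", "New York", 1957),
    ("Mac Taylor", "New York", 1955),
    ("Sarah Connor", "Los Angeles", 1959),
    ("Jean-Luc Picard", "La Barre", 2305),
    ("Ellen Ripley", "Nostromo", 2092),
    ("James T. Kirk", "Riverside", 2233),
    ("Henry Jones", "Chicago", 1899),
]


def oldest_one_per_city(people, n):
    """Return the n oldest people, keeping at most one person per city."""
    seen_cities = set()
    result = []
    # Oldest first means smallest birthyear first (sorted() is stable on ties).
    for name, city, year in sorted(people, key=lambda p: p[2]):
        if city in seen_cities:
            continue  # an older person from this city was already taken
        seen_cities.add(city)
        result.append((name, city, year))
        if len(result) == n:
            break
    return result


for row in oldest_one_per_city(PEOPLE, 3):
    print(row)
```

Egon Spengler is skipped because Mac Taylor already represents New York, so Sarah Connor takes the third place.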

Any elegant solutions?

Update:

Currently, a variation of ConroyP's query is the best/fastest solution:

SELECT P.*, COUNT(*) AS ct
   FROM people P
   JOIN (SELECT MIN(Birthyear) AS Birthyear
              FROM people 
              GROUP by City) P2 ON P2.Birthyear = P.Birthyear
   GROUP BY P.City
   ORDER BY P.Birthyear ASC 
   LIMIT 10;

His original query with IN is extremely slow on big datasets (I aborted it after 5 minutes), but moving the subquery into a JOIN speeds it up a lot: it took about 0.15 seconds for approx. 1 million rows in my test environment. I have an index on (City, Birthyear) and a second one on Birthyear alone.
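One way to see why the (City, Birthyear) index helps is to ask the engine how it would run the grouped subquery. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: MySQL's optimizer and EXPLAIN output differ, although MySQL can also use a loose index scan for a MIN() grouped on the index's leading column). The index name `idx_city_birthyear` is hypothetical. With City as the leading column, each city's rows are already ordered by Birthyear inside the index, so no separate sort of the table is needed:

```python
import sqlite3

# The seven sample rows from the question: (ID, Name, City, Birthyear).
ROWS = [
    (1, "Egon Spengler", "New York", 1957),
    (2, "Mac Taylor", "New York", 1955),
    (3, "Sarah Connor", "Los Angeles", 1959),
    (4, "Jean-Luc Picard", "La Barre", 2305),
    (5, "Ellen Ripley", "Nostromo", 2092),
    (6, "James T. Kirk", "Riverside", 2233),
    (7, "Henry Jones", "Chicago", 1899),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    "ID INTEGER PRIMARY KEY, Name TEXT, City TEXT, Birthyear INTEGER)"
)
conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", ROWS)

# Composite index: City first, Birthyear second (index name is hypothetical).
conn.execute("CREATE INDEX idx_city_birthyear ON people (City, Birthyear)")

# Ask SQLite how it would evaluate the per-city MIN() subquery.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT City, MIN(Birthyear) FROM people GROUP BY City"
).fetchall()

for step in plan:
    print(step[-1])  # the last column holds the human-readable plan detail
```

The exact wording of the plan varies between SQLite versions; in MySQL the equivalent check is EXPLAIN on the same SELECT.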

Note: This is related to...

Accepted answer by ConroyP

Probably not the most elegant of solutions, and the performance of IN may suffer on larger tables.

The nested query gets the minimum Birthyear for each city. Only records that have this Birthyear are matched in the outer query. Ordering by age and then limiting to 3 results gets you the 3 oldest people who are also the oldest in their city (Egon Spengler drops out).

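SELECT Name, City, Birthyear, COUNT(*) AS ct
FROM people
-- match on the (City, Birthyear) pair so a row cannot match another city's minimum
WHERE (City, Birthyear) IN (SELECT City, MIN(Birthyear)
                            FROM people
                            GROUP BY City)
-- Name and Birthyear are not aggregated, so this assumes ONLY_FULL_GROUP_BY is off
GROUP BY City
-- oldest first: the smallest Birthyear belongs to the oldest person
ORDER BY Birthyear ASC LIMIT 3;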

+-----------------+-------------+-----------+----+
| Name            | City        | Birthyear | ct |
+-----------------+-------------+-----------+----+
| Henry Jones     | Chicago     | 1899      | 1  |
| Mac Taylor      | New York    | 1955      | 1  |
| Sarah Connor    | Los Angeles | 1959      | 1  |
+-----------------+-------------+-----------+----+
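The two steps described in this answer (per-city minimum, then filter, sort and limit) can be run separately. A minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; SQLite 3.15 or later is needed for the row-value IN). It matches on the (City, Birthyear) pair rather than on Birthyear alone, so a person cannot be matched by another city's minimum:

```python
import sqlite3

# The seven sample rows from the question: (ID, Name, City, Birthyear).
ROWS = [
    (1, "Egon Spengler", "New York", 1957),
    (2, "Mac Taylor", "New York", 1955),
    (3, "Sarah Connor", "Los Angeles", 1959),
    (4, "Jean-Luc Picard", "La Barre", 2305),
    (5, "Ellen Ripley", "Nostromo", 2092),
    (6, "James T. Kirk", "Riverside", 2233),
    (7, "Henry Jones", "Chicago", 1899),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    "ID INTEGER PRIMARY KEY, Name TEXT, City TEXT, Birthyear INTEGER)"
)
conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", ROWS)

# Step 1: the nested query on its own, i.e. the minimum Birthyear of each city.
city_minimums = conn.execute(
    "SELECT City, MIN(Birthyear) FROM people GROUP BY City ORDER BY City"
).fetchall()

# Step 2: keep each city's oldest person, sort oldest first, take 3.
oldest_three = conn.execute("""
    SELECT Name, City, Birthyear
    FROM people
    WHERE (City, Birthyear) IN (SELECT City, MIN(Birthyear)
                                FROM people
                                GROUP BY City)
    ORDER BY Birthyear ASC
    LIMIT 3
""").fetchall()

print(city_minimums)
print(oldest_three)
```

New York's minimum is 1955, so Egon Spengler (1957) never passes the filter and Sarah Connor becomes the third row.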

Edit: added GROUP BY City to the outer query, since people with the same birth year would otherwise return multiple rows. Grouping in the outer query ensures that only one result is returned per city, even if more than one person has that minimum Birthyear. The ct column shows whether more than one person in the city has that Birthyear.
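To see the ct column at work, the sketch below adds the eighth row from kristof's test data further down ('Blah', New York, 1955), which ties with Mac Taylor for the oldest New Yorker. It uses Python's sqlite3 as a stand-in for MySQL. MIN(Name) is an assumption added here so that the name kept for a tied group is deterministic; a bare Name column in MySQL returns a nondeterministic one (and is rejected under ONLY_FULL_GROUP_BY):

```python
import sqlite3

# The question's sample rows plus kristof's extra tie ("Blah", id 8).
ROWS = [
    (1, "Egon Spengler", "New York", 1957),
    (2, "Mac Taylor", "New York", 1955),
    (3, "Sarah Connor", "Los Angeles", 1959),
    (4, "Jean-Luc Picard", "La Barre", 2305),
    (5, "Ellen Ripley", "Nostromo", 2092),
    (6, "James T. Kirk", "Riverside", 2233),
    (7, "Henry Jones", "Chicago", 1899),
    (8, "Blah", "New York", 1955),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    "ID INTEGER PRIMARY KEY, Name TEXT, City TEXT, Birthyear INTEGER)"
)
conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", ROWS)

rows = conn.execute("""
    SELECT MIN(Name) AS Name, City, Birthyear, COUNT(*) AS ct
    FROM people
    WHERE (City, Birthyear) IN (SELECT City, MIN(Birthyear)
                                FROM people
                                GROUP BY City)
    GROUP BY City, Birthyear  -- one row per city; ct counts the tied people
    ORDER BY Birthyear ASC
    LIMIT 3
""").fetchall()

for row in rows:
    print(row)
```

New York still produces a single row, and its ct of 2 signals that two people share the city's minimum Birthyear.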

Answer by Tamas Czinege

This is probably not the most elegant or quickest solution, but it should work. I am looking forward to seeing the solutions of real database gurus.

select p.* from people p,
-- each city's smallest birthyear belongs to its oldest resident
(select city, min(birthyear) as minyear from people group by city) t
where p.city = t.city and p.birthyear = t.minyear
order by p.birthyear asc
limit 3

Answer by Tomalak

Something like that?

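SELECT
  Id, Name, City, Birthyear
FROM
  TheTable
WHERE
  -- MySQL rejects LIMIT inside an IN subquery, so compare Id to a scalar subquery
  Id = (SELECT i.Id FROM TheTable i WHERE i.City = TheTable.City ORDER BY i.Birthyear, i.Id LIMIT 1)
ORDER BY
  Birthyear
LIMIT 3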

Answer by kristof

Not pretty, but it should also work when several people in the same city share the same dob (date of birth):

Test data:

-- MySQL cannot SELECT ... INTO a new table, so CREATE TABLE ... AS SELECT builds it
create table people as
select id, name, city, dob
from
(select 1 id,'Egon Spengler' name, 'New York' city , 1957 dob
union all select 2, 'Mac Taylor','New York', 1955
union all select 3, 'Sarah Connor','Los Angeles', 1959
union all select 4, 'Jean-Luc Picard','La Barre', 2305
union all select 5, 'Ellen Ripley','Nostromo', 2092
union all select 6, 'James T. Kirk','Riverside', 2233
union all select 7, 'Henry Jones','Chicago', 1899
union all select 8, 'Blah','New York', 1955) a;

Query:

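select 
    p.* 
from 
    people p
    left join people p1
    ON 
        p.city = p1.city
        -- p1 is someone in the same city who is older, or equally old with a lower id;
        -- the outer parentheses keep the city match applied to both cases
        and (p.dob > p1.dob
             or (p.dob = p1.dob and p.id > p1.id))
where
    p1.id is null   -- nobody in the city beats p, so p is that city's oldest
order by 
    p.dob
limit 3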
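Because AND binds tighter than OR (in MySQL and in SQLite), writing this join condition as `city match AND (older) OR (same dob AND lower id)` without outer parentheses applies the city match only to the first branch. A sketch using Python's sqlite3 as a stand-in for MySQL compares both forms on kristof's test data plus one hypothetical row (id 9, 'Test Person', Springfield, 1955) that shares Mac Taylor's dob but lives in another city:

```python
import sqlite3

ROWS = [
    (1, "Egon Spengler", "New York", 1957),
    (2, "Mac Taylor", "New York", 1955),
    (3, "Sarah Connor", "Los Angeles", 1959),
    (4, "Jean-Luc Picard", "La Barre", 2305),
    (5, "Ellen Ripley", "Nostromo", 2092),
    (6, "James T. Kirk", "Riverside", 2233),
    (7, "Henry Jones", "Chicago", 1899),
    (8, "Blah", "New York", 1955),
    (9, "Test Person", "Springfield", 1955),  # hypothetical extra row
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT, dob INTEGER)"
)
conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", ROWS)

# Parsed as (city AND older AND id<>) OR (same dob AND lower id):
# the second branch ignores the city entirely.
UNPARENTHESIZED = """
    SELECT p.id FROM people p
    LEFT JOIN people p1
      ON p.city = p1.city
         AND (p.dob > p1.dob AND p.id <> p1.id)
         OR (p.dob = p1.dob AND p.id > p1.id)
    WHERE p1.id IS NULL
    ORDER BY p.dob, p.id
"""

# The city match now guards both "older" and "same dob, lower id".
PARENTHESIZED = """
    SELECT p.id FROM people p
    LEFT JOIN people p1
      ON p.city = p1.city
         AND (p.dob > p1.dob OR (p.dob = p1.dob AND p.id > p1.id))
    WHERE p1.id IS NULL
    ORDER BY p.dob, p.id
"""

buggy_ids = [row[0] for row in conn.execute(UNPARENTHESIZED)]
fixed_ids = [row[0] for row in conn.execute(PARENTHESIZED)]
print(buggy_ids)
print(fixed_ids)
```

In the unparenthesized form, id 9 is dropped because Mac Taylor (id 2, New York) has the same dob and a lower id, even though nobody in Springfield is older. The parenthesized form keeps id 9 and still drops Blah (id 8) in favour of Mac Taylor.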

Answer by gondo

@BlaM

UPDATED: I just found that it's better to use USING instead of ON; it removes the duplicate join columns from the result.

SELECT P.*, COUNT(*) AS ct
   FROM people P
   JOIN (SELECT City, MIN(Birthyear) AS Birthyear
              FROM people 
              GROUP by City) P2 USING(Birthyear, City)
   GROUP BY P.City
   ORDER BY P.Birthyear ASC 
   LIMIT 10;
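The USING remark can be checked by comparing the columns that SELECT * returns with each form of the join. A minimal sketch with Python's sqlite3 as a stand-in for MySQL (an assumption; SQLite also omits the right-hand copy of each USING column from SELECT *, though column ordering rules may differ between engines). Since the query above selects P.*, which returns only the people columns in either form, the difference shows up with SELECT *:

```python
import sqlite3

ROWS = [
    (1, "Egon Spengler", "New York", 1957),
    (2, "Mac Taylor", "New York", 1955),
    (3, "Sarah Connor", "Los Angeles", 1959),
    (4, "Jean-Luc Picard", "La Barre", 2305),
    (5, "Ellen Ripley", "Nostromo", 2092),
    (6, "James T. Kirk", "Riverside", 2233),
    (7, "Henry Jones", "Chicago", 1899),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    "ID INTEGER PRIMARY KEY, Name TEXT, City TEXT, Birthyear INTEGER)"
)
conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", ROWS)

# Derived table with each city's minimum Birthyear, as in the answer.
P2 = "(SELECT City, MIN(Birthyear) AS Birthyear FROM people GROUP BY City) P2"


def column_names(sql):
    """Return the result column names of a query via cursor.description."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description]


with_on = column_names(
    f"SELECT * FROM people P JOIN {P2} "
    "ON P2.Birthyear = P.Birthyear AND P2.City = P.City"
)
with_using = column_names(
    f"SELECT * FROM people P JOIN {P2} USING (Birthyear, City)"
)
print(with_on)
print(with_using)
```

With ON, City and Birthyear appear twice (once from people, once from P2); with USING, each appears once.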

ORIGINAL POST

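Hi, I tried to use your updated query, but I was getting wrong results until I added an extra condition to the join (and an extra City column to the joined subquery's SELECT). Joining on Birthyear alone also matches people whose Birthyear happens to equal some other city's minimum. Applied to your query, I'm using this: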

SELECT P.*, COUNT(*) AS ct
   FROM people P
   JOIN (SELECT City, MIN(Birthyear) AS Birthyear
              FROM people 
              GROUP by City) P2 ON P2.Birthyear = P.Birthyear AND P2.City = P.City
   GROUP BY P.City
   ORDER BY P.Birthyear ASC 
   LIMIT 10;

In theory you should not need the last GROUP BY P.City, since it only matters when two people in the same city share that city's minimum Birthyear, but I've left it there for now, just in case. I will probably remove it later.
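The wrong results described in this answer come from joining on Birthyear alone: any person whose Birthyear equals some other city's minimum also matches. A sketch using Python's sqlite3 as a stand-in for MySQL, with one hypothetical extra row ('Test Person', Chicago, 1955) added to the question's sample data; the GROUP BY is left out so that the extra match stays visible:

```python
import sqlite3

ROWS = [
    (1, "Egon Spengler", "New York", 1957),
    (2, "Mac Taylor", "New York", 1955),
    (3, "Sarah Connor", "Los Angeles", 1959),
    (4, "Jean-Luc Picard", "La Barre", 2305),
    (5, "Ellen Ripley", "Nostromo", 2092),
    (6, "James T. Kirk", "Riverside", 2233),
    (7, "Henry Jones", "Chicago", 1899),
    (8, "Test Person", "Chicago", 1955),  # hypothetical: not Chicago's oldest
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    "ID INTEGER PRIMARY KEY, Name TEXT, City TEXT, Birthyear INTEGER)"
)
conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", ROWS)

TEMPLATE = """
    SELECT P.Name
    FROM people P
    JOIN (SELECT City, MIN(Birthyear) AS Birthyear
          FROM people
          GROUP BY City) P2 ON {condition}
    ORDER BY P.Birthyear, P.ID
    LIMIT 3
"""

# Birthyear only: Test Person matches New York's minimum of 1955.
birthyear_only = [r[0] for r in conn.execute(
    TEMPLATE.format(condition="P2.Birthyear = P.Birthyear"))]

# Birthyear and City: only each city's own minimum can match.
both_columns = [r[0] for r in conn.execute(
    TEMPLATE.format(condition="P2.Birthyear = P.Birthyear AND P2.City = P.City"))]

print(birthyear_only)
print(both_columns)
```

With the Birthyear-only join, Test Person pushes Sarah Connor out of the top three; adding the City condition restores the expected result.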
