Get most common value for each value of another column in SQL
Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/344665/
Asked by Martin C. Martin
I have a table like this:
Column | Type | Modifiers
---------+------+-----------
country | text |
food_id | int |
eaten | date |
And for each country, I want to get the food that is eaten most often. The best I can think of (I'm using postgres) is:
CREATE TEMP TABLE counts AS
SELECT country, food_id, count(*) as count FROM munch GROUP BY country, food_id;
CREATE TEMP TABLE max_counts AS
SELECT country, max(count) as max_count FROM counts GROUP BY country;
SELECT country, max(food_id) FROM counts
WHERE (country, count) IN (SELECT * from max_counts) GROUP BY country;
In that last statement, the GROUP BY and max() are needed to break ties, where two different foods have the same count.
This seems like a lot of work for something conceptually simple. Is there a more straightforward way to do it?
Answer by pilcrow
PostgreSQL introduced support for window functions in 8.4, the year after this question was asked. It's worth noting that it might be solved today as follows:
SELECT country, food_id
FROM (SELECT country, food_id, ROW_NUMBER() OVER (PARTITION BY country ORDER BY freq DESC) AS rn
FROM ( SELECT country, food_id, COUNT('x') AS freq
FROM country_foods
GROUP BY 1, 2) food_freq) ranked_food_req
WHERE rn = 1;
The above will break ties. If you don't want to break ties, you could use DENSE_RANK() instead.
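To make the tie behaviour concrete, here is a minimal sketch that keeps every tied food by swapping ROW_NUMBER() for RANK() (DENSE_RANK() behaves the same for rank 1). It assumes the question's table munch(country, food_id, eaten) rather than the country_foods name used above; the aliases freq, rnk, food_freq and ranked are illustrative only.

```sql
-- Assumes the question's table: munch(country text, food_id int, eaten date).
SELECT country, food_id, freq
FROM (
    SELECT country,
           food_id,
           freq,
           -- RANK() gives tied rows the same rank, so all of them pass rnk = 1.
           RANK() OVER (PARTITION BY country ORDER BY freq DESC) AS rnk
    FROM (
        SELECT country, food_id, COUNT(*) AS freq
        FROM munch
        GROUP BY country, food_id
    ) AS food_freq
) AS ranked
WHERE rnk = 1
ORDER BY country, food_id;
```

To get exactly one row per country with a predictable winner, one option is to keep ROW_NUMBER() and extend its ORDER BY, for example ORDER BY freq DESC, food_id DESC, which mirrors the max(food_id) tie-break in the question.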
Answer by jrouquie
It is now even simpler: PostgreSQL 9.4 introduced the mode() function:
select country, mode() within group (order by food_id)
from munch
group by country
returns (like user2247323's example):
country | mode
--------------
GB | 3
US | 1
See documentation here: https://wiki.postgresql.org/wiki/Aggregate_Mode
https://www.postgresql.org/docs/current/static/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE
Answer by jkramer
SELECT DISTINCT
"F1"."food",
"F1"."country"
FROM "foo" "F1"
WHERE
"F1"."food" =
(SELECT "food" FROM
(
SELECT "food", COUNT(*) AS "count"
FROM "foo" "F2"
WHERE "F2"."country" = "F1"."country"
GROUP BY "F2"."food"
ORDER BY "count" DESC
) AS "F5"
LIMIT 1
)
Well, I wrote this in a hurry and didn't check it really well. The sub-select might be pretty slow, but this is the shortest and simplest SQL statement that I could think of. I'll probably tell more when I'm less drunk.
PS: Oh well, "foo" is the name of my table, "food" contains the name of the food and "country" the name of the country. Sample output:
food | country
-----------+------------
Bratwurst | Germany
Fisch | Frankreich
Answer by Jamal Hansen
Try this:
Select Country, Food_id
From Munch T1
Where Food_id=
(Select Food_id
from Munch T2
where T1.Country= T2.Country
group by Food_id
order by count(Food_id) desc
limit 1)
group by Country, Food_id
Answer by JCF
Here is a statement which I believe gives you what you want and is simple and concise:
select distinct on (country) country, food_id
from munch
group by country, food_id
order by country, count(*) desc
Please let me know what you think.
BTW, the distinct on feature is only available in Postgres.
Example, source data:
country | food_id | eaten
--------+---------+----------
US      | 1       | 2017-1-1
US      | 1       | 2017-1-1
US      | 2       | 2017-1-1
US      | 3       | 2017-1-1
GB      | 3       | 2017-1-1
GB      | 3       | 2017-1-1
GB      | 2       | 2017-1-1
output:
country | food_id
--------+---------
US      | 1
GB      | 3
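One caveat worth sketching: when two foods are tied for the highest count in a country, DISTINCT ON keeps whichever row happens to sort first, which is not guaranteed to be stable. A minimal variant of the statement above, assuming the same munch table, adds food_id as an explicit tie-breaker; choosing food_id DESC is only an illustration that mirrors the max(food_id) rule from the question.

```sql
-- Same statement as above, with food_id added as a tie-breaker.
SELECT DISTINCT ON (country) country, food_id
FROM munch
GROUP BY country, food_id
-- count(*) DESC puts the most frequent food first within each country;
-- food_id DESC then decides between foods with equal counts.
ORDER BY country, count(*) DESC, food_id DESC;
```

With the sample data shown above there are no ties, so the result should be unchanged: food 1 for US and food 3 for GB.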
Answer by Matt Rogish
SELECT country, MAX( food_id )
FROM( SELECT m1.country, m1.food_id
FROM munch m1
INNER JOIN ( SELECT country
, food_id
, COUNT(*) as food_counts
FROM munch m2
GROUP BY country, food_id ) as m3
ON m1.country = m3.country
GROUP BY m1.country, m1.food_id
HAVING COUNT(*) / COUNT(DISTINCT m3.food_id) = MAX(food_counts) ) AS max_foods
GROUP BY country
I don't like the MAX(.) GROUP BY to break ties... There's gotta be a way to incorporate eaten date into the JOIN in some way to arbitrarily select the most recent one...
I'm interested in the query plan for this thing if you run it on your live data!
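For what it's worth, here is one possible sketch of breaking ties by the most recent eaten date. It does not use the JOIN approach described above; instead it assumes a PostgreSQL version with window functions (8.4 or later) and the question's munch table. The aliases rn and ranked are illustrative.

```sql
-- Assumes munch(country text, food_id int, eaten date) and PostgreSQL 8.4+.
SELECT country, food_id
FROM (
    SELECT country,
           food_id,
           -- Window functions are evaluated after GROUP BY, so aggregates
           -- may appear inside OVER (...): order by frequency first, then
           -- by the latest date on which the food was eaten.
           ROW_NUMBER() OVER (
               PARTITION BY country
               ORDER BY COUNT(*) DESC, MAX(eaten) DESC
           ) AS rn
    FROM munch
    GROUP BY country, food_id
) AS ranked
WHERE rn = 1;
```

If two foods share both the count and the latest date, the choice is still arbitrary; appending food_id to the ORDER BY would make it deterministic.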
Answer by Theo
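select country, food_id, count(*) ne
from food f1
group by country, food_id
-- PostgreSQL rejects nested aggregates such as max(count(*)),
-- so the per-food counts are computed in a derived table first.
having count(*) = (select max(cnt)
                   from (select count(*) as cnt
                         from food f2
                         where f2.country = f1.country
                         group by f2.food_id) per_food)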
Answer by John MacIntyre
Try something like this:
select country, food_id, count(*) cnt
into #tempTbl
from mytable
group by country, food_id
select country, food_id
from #tempTbl as x
where cnt =
-- max(cnt) is taken per country from #tempTbl, since mytable has no cnt column
(select max(cnt)
from #tempTbl
where country=x.country)
This could be put all into a single select, but I don't have time to muck around with it right now.
Good luck.
Answer by JosephStyons
Here's how to do it without any temp tables:
Edit: simplified
select nf.country, nf.food_id as most_frequent_food_id
from national_foods nf
group by country, food_id
having
(country,count(*)) in (
select country, max(cnt)
from
(
select country, food_id, count(*) as cnt
from national_foods nf1
group by country, food_id
)
group by country
having country = nf.country
)