postgresql 试图找到列中的第二大值(postgres sql)
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/4910930/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Trying to find the second largest value in a column (postgres sql)
提问by seb
I am trying to find the second largest value in a column and only the second largest value.
我试图找到列中的第二大值,并且只找到第二大值。
select a.name, max(a.word) as word
from apple a
where a.word < (select max(a.word) from apple a)
group by a.name;
For some reason, what I have now returns the second largest value AND all the lower values also but fortunately avoids the largest value.
出于某种原因,我现在所拥有的返回第二大值和所有较低的值,但幸运的是避免了最大值。
Is there a way to fix this?
有没有办法来解决这个问题?
回答by seb
Here's another conceptually simple solution that's been running for me in .1 milliseconds on a table of 21 million rows, according to EXPLAIN ANALYZE. It returns nothing in the case where there's only one value.
根据 EXPLAIN ANALYZE 的说法,这是另一个概念上简单的解决方案,它在 0.1 毫秒内在 2100 万行的表上为我运行。在只有一个值的情况下,它不返回任何内容。
SELECT a.name,
(SELECT word FROM apple ap WHERE ap.name=a.name ORDER BY word ASC OFFSET 1 LIMIT 1)
FROM apple a
Note that my table already had existing indices on name, word, and (name, word), which allows me to use ORDER BY like that.
请注意,我的表已经有关于 name、word 和 (name, word) 的现有索引,这允许我像这样使用 ORDER BY。
回答by Michael Buen
The simplest, albeit inefficient(array can exhaust memory):
最简单但效率低下的(数组会耗尽内存):
select student, (array_agg(grade order by grade desc))[2]
from
student_grades
group by student
The efficient one:
高效的一个:
create aggregate two_elements(anyelement)
(
sfunc = array_limit_two,
stype = anyarray,
initcond = '{}'
);
create or replace function array_limit_two(anyarray, anyelement) returns anyarray
as
$$
begin
if array_upper(,1) = 2 then
return ;
else
return array_append(, );
end if;
end;
$$ language 'plpgsql';
Test data:
测试数据:
create table student_grades
(
student text,
grade int
);
insert into student_grades values
('john',70),
('john',80),
('john',90),
('john',100);
insert into student_grades values
('paul',20),
('paul',10),
('paul',50),
('paul',30);
insert into student_grades values
('george',40);
Test code:
测试代码:
-- second largest
select student, coalesce( (two_elements(grade order by grade desc))[2], max(grade) /* min would do too, since it's one element only */ )
from
student_grades
group by student
-- second smallest
select student, coalesce( (two_elements(grade order by grade))[2], max(grade) /* min would do too, since it's one element only */ )
from
student_grades
group by student
Output:
输出:
q_and_a=# -- second largest
q_and_a=# select student, coalesce( (two_elements(grade order by grade desc))[2], max(grade) /* min would do too, since it's one element only */ )
q_and_a-# from
q_and_a-# student_grades
q_and_a-# group by student;
student | coalesce
---------+----------
george | 40
john | 90
paul | 30
(3 rows)
q_and_a=#
q_and_a=# -- second smallest
q_and_a=# select student, coalesce( (two_elements(grade order by grade))[2], max(grade) /* min would do too, since it's one element only */ )
q_and_a-# from
q_and_a-# student_grades
q_and_a-# group by student;
student | coalesce
---------+----------
george | 40
john | 80
paul | 20
(3 rows)
EDIT@diesel The simplest(and efficient also):
编辑@diesel 最简单(也有效):
-- second largest
select student, array_min(two_elements(grade order by grade desc))
from
student_grades
group by student;
-- second smallest
select student, array_max(two_elements(grade order by grade))
from
student_grades
group by student;
The array_max function:
array_max 函数:
create or replace function array_min(anyarray) returns anyelement
as
$$
select min(unnested) from( select unnest() unnested ) as x
$$ language sql;
create or replace function array_max(anyarray) returns anyelement
as
$$
select max(unnested) from( select unnest() unnested ) as x
$$ language sql;
EDIT
编辑
Could be the simplest and efficient of all, if only Postgresql would make array_max a built-in function and facilitates LIMIT clause on aggregations :-) LIMIT clause on aggregation is my dream feature on Postgresql
可能是最简单和有效的,如果只有 Postgresql 将 array_max 变成一个内置函数并促进聚合上的 LIMIT 子句:-) 聚合上的 LIMIT 子句是我在 Postgresql 上的梦想功能
select student, array_max( array_agg(grade order by grade limit 2) )
from
student_grades
group by student;
While that LIMIT on aggregation is not yet available, use this:
虽然聚合的 LIMIT 尚不可用,但请使用:
-- second largest
select student,
array_min
(
array (
select grade from student_grades
where student = x.student order by grade desc limit 2 )
)
from
student_grades x
group by student;
-- second smallest
select student,
array_max
(
array (
select grade from student_grades
where student = x.student order by grade limit 2 )
)
from
student_grades x
group by student;
回答by Ken Downs
This is also brute force, but is guaranteed to only pass the table exactly and only once:
这也是蛮力,但保证只准确地传递表一次:
select name,word
from (
select name,word
, row_number() over (partition by name
order by word desc)
as rowNum
from apple
) x
where rowNum = 2
This version below may perform better if you have a covering index on (name,word) and there is a high count of word values per name:
如果您在 (name,word) 上有覆盖索引并且每个名称的单词值计数很高,则此版本的性能可能会更好:
with recursive myCte as
(
select name,max(word) as word
, 1 as rowNum
from apple
group by name
union all
select par.name
, (select max(word) as word
from apple
where name = par.name
AND word < par.word
) as word
, 2 as rowNum
from myCte par
where par.rowNum = 1
)
select * from myCte where rownum = 2
回答by a_horse_with_no_name
SELECT * FROM ( SELEC name, dense_rank() over (partition by name order by word desc) as word_rank, count(*) over (partition by name) as name_count FROM apple ) t WHERE (word_rank = 2 OR name_count = 1)
Edit:
The name_count = 1
takes care of those cases where only a single row is present for a specific name.
编辑:
该name_count = 1
负责的那些情况下,只有一行存在特定名称。
Using dense_rank()
instead of rank()
makes sure there isa row with word_rank = 2 as dense_rank makes sure there are no gaps
使用dense_rank()
的,而不是rank()
确保有是有word_rank行= 2的DENSE_RANK确保不存在任何差距
回答by ijw
Umm, you don't just mean:
嗯,你不仅仅是指:
select a.name, max(a.word) as word
from apple a
where a.word < (select max(b.word) from apple b WHERE a.name = b.name)
group by a.name;
do you? One row per name returning the second highest value per name (or no row if there is no second highest value).
你?每个名称一行返回每个名称的第二大值(如果没有第二大值,则没有行)。
If that's what you want, your query was just missing a constraint, although I suspect the above is probably two table scans if PostgreSQL has the sense to convert it to a JOIN.
如果这就是您想要的,那么您的查询只是缺少一个约束,尽管我怀疑如果 PostgreSQL 有将其转换为 JOIN 的意义,上述内容可能是两个表扫描。
回答by Michael Buen
Another approach, use RANK:
另一种方法,使用 RANK:
with ranking as
(
select student, grade, rank() over(partition by student order by grade desc) as place
from
student_grades
)
select *
from
ranking
where
(student, place)
in
(
select student, max(place)
from ranking
where place <= 2
group by student
)
Second to the MIN:
MIN 次之:
with ranking as
(
select student, grade,
rank()
-- just change DESC to ASC
over(partition by student order by grade ASC ) as place
from
student_grades
)
select *
from
ranking
where
(student, place)
in
(
select student, max(place) -- still max
from ranking
where place <= 2
group by student
)
回答by RichardTheKiwi
A very brute force query, but it works
一个非常暴力的查询,但它有效
select a.name, a.word
from apple a
where (select count(distinct b.word) from apple b
where b.word > a.word) = 1