PostgreSQL: preventing rows from being double-counted in a GROUP BY

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/12715231/

Date: 2020-10-21 00:23:09  Source: igfitidea

Keeping rows from double-counting in a GROUP BY

Tags: sql, postgresql, join, aggregate-functions

Asked by ldrg

Here's the basic guts of my schema and problem: http://sqlfiddle.com/#!1/72ec9/4/0

Note that the periods table can refer to a variable range of time - it could be an entire season, it could be a few games or one game. For a given team and year all period rows represent exclusive ranges of time.

I've got a query written which joins up tables and uses a GROUP BY periods.year to aggregate scores for a season (see sqlfiddle). However, if a coach had two positions in the same year the GROUP BY will count the same period row twice. How can I ditch the duplicates when a coach held two positions but still sum up periods when a year is comprised of multiple periods? If there's a better way to do the schema I'd also appreciate it if you pointed it out to me.
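
The double count can be reproduced without the full schema. The query below is only a sketch: the table and column names follow the queries on this page, but the rows are made up for illustration (coach 1 holds two positions in period 1 and one position in period 2, both in 2010).

```sql
-- Hypothetical toy data, not the rows from the sqlfiddle.
WITH periods (id, year, wins) AS (
   VALUES (1, 2010, 5)
        , (2, 2010, 3)
)
, periods_positions_coaches_linking (coach, period, position) AS (
   VALUES (1, 1, 1)   -- period 1, first position
        , (1, 1, 2)   -- period 1 again, second position
        , (1, 2, 1)   -- period 2, single position
)
SELECT pe.year
     , count(*)     AS joined_rows   -- 3: period 1 appears twice
     , sum(pe.wins) AS wins          -- 5 + 5 + 3 = 13
FROM   periods_positions_coaches_linking pp
JOIN   periods pe ON pe.id = pp.period
WHERE  pp.coach = 1
GROUP  BY pe.year;
```

The real total for 2010 is 5 + 3 = 8 wins. The join returns period 1 once per position, so the sum comes out as 13.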

Answered by Erwin Brandstetter

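The underlying problem (joining to multiple tables with multiple matches each) is explained in a closely related answer. In short, every period row is repeated once per position the coach held during it, so sum() counts it more than once.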

To fix, I first simplified your query:

select pe.year
     , sum(pe.wins)       AS wins
     , sum(pe.losses)     AS losses
     , sum(pe.ties)       AS ties
     , array_agg(po.id)   AS position_id
     , array_agg(po.name) AS position_names
from   periods_positions_coaches_linking pp
join   positions po ON po.id = pp.position
join   periods   pe ON pe.id = pp.period
where  pp.coach = 1
group  by pe.year
order  by pe.year;

This yields the same incorrect result as your original, but is simpler, faster, and easier to read.

  • No point in joining the coaches table as long as you don't use any of its columns in the SELECT list. I removed it completely and replaced the WHERE condition with where pp.coach = 1.

  • You don't need COALESCE. NULL values are ignored by the aggregate function sum(), so there is no need to substitute 0.

  • Use table aliases to make it easier to read.
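
To check the COALESCE point in isolation, here is a minimal sketch over made-up values. The derived table t and its rows are hypothetical and not part of the schema.

```sql
-- sum() skips NULL inputs, so wrapping them in coalesce() changes nothing here.
SELECT sum(wins)              AS plain_sum       -- 5 + 3 = 8
     , sum(coalesce(wins, 0)) AS coalesced_sum   -- 5 + 0 + 3 = 8
FROM  (VALUES (5), (NULL), (3)) AS t (wins);

-- The one difference: with no non-NULL input at all, sum() returns NULL, not 0.
SELECT sum(wins) AS all_null_sum                 -- NULL
FROM  (VALUES (NULL::int)) AS t (wins);
```

If a 0 is wanted for a group without any data, applying coalesce once to the result, as in coalesce(sum(wins), 0), is enough.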

Next, I solved your problem like this:

SELECT *
FROM  (
  SELECT pe.year
       , array_agg(DISTINCT po.id)   AS position_id
       , array_agg(DISTINCT po.name) AS position_names
  FROM   periods_positions_coaches_linking pp
  JOIN   positions                         po ON po.id = pp.position
  JOIN   periods                           pe ON pe.id = pp.period
  WHERE  pp.coach = 1
  GROUP  BY pe.year
  ) po
LEFT JOIN (
  SELECT pe.year
       , sum(pe.wins)   AS wins
       , sum(pe.losses) AS losses
       , sum(pe.ties)   AS ties
  FROM  (
     SELECT period
     FROM   periods_positions_coaches_linking
     WHERE  coach = 1
     GROUP  BY period
     ) pp
  JOIN   periods pe ON pe.id = pp.period
  GROUP  BY pe.year
  ) pe USING (year)
ORDER  BY year;

  • Aggregate positions and periods separately before joining them.

  • In the first sub-query, list positions only once by simply using DISTINCT.

  • In the second sub-query:

    • GROUP BY period, because a coach can have multiple positions per period.
    • JOIN to the periods data after that, and then aggregate to get the sums.
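
The effect of the second sub-query can be seen on a small example. This is a sketch over made-up rows (coach 1 holds two positions in period 1 and one in period 2, both in 2010); only the table and column names come from the queries above.

```sql
-- Hypothetical toy data, not the rows from the sqlfiddle.
WITH periods (id, year, wins) AS (
   VALUES (1, 2010, 5)
        , (2, 2010, 3)
)
, periods_positions_coaches_linking (coach, period, position) AS (
   VALUES (1, 1, 1)
        , (1, 1, 2)   -- second position in the same period
        , (1, 2, 1)
)
SELECT pe.year
     , sum(pe.wins) AS wins   -- 5 + 3 = 8, each period counted once
FROM  (
   -- Step 1: collapse the linking rows to one row per period.
   SELECT period
   FROM   periods_positions_coaches_linking
   WHERE  coach = 1
   GROUP  BY period
   ) pp
-- Step 2: only now attach the period data and aggregate per year.
JOIN   periods pe ON pe.id = pp.period
GROUP  BY pe.year;
```

Because the positions are already collapsed before the join, each period reaches sum() exactly once, however many positions the coach held.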

SQL Fiddle.

Answered by Teena Thomas

Use distinct, as shown here:

code:

select periods.year as year,
sum(coalesce(periods.wins, 0)) as wins,
sum(coalesce(periods.losses, 0)) as losses,
sum(coalesce(periods.ties, 0)) as ties,
array_agg( distinct positions.id) as position_id,
array_agg( distinct positions.name) as position_names

from periods_positions_coaches_linking

join coaches on coaches.id = periods_positions_coaches_linking.coach
join positions on positions.id = periods_positions_coaches_linking.position
join periods on periods.id = periods_positions_coaches_linking.period

where coaches.id = 1

group by periods.year, positions.id
order by periods.year;
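
A note on the shape of the result: since positions.id is part of the GROUP BY, a year in which the coach held several positions comes back as several rows, one per position. The sketch below shows this with made-up rows (the data is hypothetical, and it groups on the linking column directly instead of joining positions, which gives the same grouping).

```sql
-- Hypothetical toy data: two positions in period 1, one in period 2.
WITH periods (id, year, wins) AS (
   VALUES (1, 2010, 5)
        , (2, 2010, 3)
)
, periods_positions_coaches_linking (coach, period, position) AS (
   VALUES (1, 1, 1)
        , (1, 1, 2)
        , (1, 2, 1)
)
SELECT pe.year
     , pp.position
     , sum(pe.wins) AS wins
FROM   periods_positions_coaches_linking pp
JOIN   periods pe ON pe.id = pp.period
WHERE  pp.coach = 1
GROUP  BY pe.year, pp.position
ORDER  BY pe.year, pp.position;
-- Returns two rows for 2010:
--   position 1: 5 + 3 = 8 wins
--   position 2: 5 wins
```

No period is counted twice within a row, but the season total is no longer on a single row per year.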

Answered by Gordon Linoff

In your case, the easiest way might be to divide out the positions:

select periods.year as year,
       sum(coalesce(periods.wins, 0))/COUNT(distinct positions.id) as wins,
       sum(coalesce(periods.losses, 0))/COUNT(distinct positions.id) as losses,
       sum(coalesce(periods.ties, 0))/COUNT(distinct positions.id) as ties,
       array_agg(distinct positions.id) as position_id,
       array_agg(distinct positions.name) as position_names
from periods_positions_coaches_linking join
     coaches
     on coaches.id = periods_positions_coaches_linking.coach join
     positions
     on positions.id = periods_positions_coaches_linking.position join
     periods
     on periods.id = periods_positions_coaches_linking.period
where coaches.id = 1
group by periods.year
order by periods.year;

The number of positions scales the wins, losses, and ties, so dividing it out adjusts the counts.
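
The division works when every period of the year is repeated the same number of times, that is, when the coach held the same set of positions in each period. The sketch below uses made-up rows where that does not hold (two positions in period 1, one in period 2) to show what happens then. The data is hypothetical; the names follow the queries on this page.

```sql
-- Hypothetical toy data with an uneven number of positions per period.
WITH periods (id, year, wins) AS (
   VALUES (1, 2010, 5)
        , (2, 2010, 3)
)
, periods_positions_coaches_linking (coach, period, position) AS (
   VALUES (1, 1, 1)
        , (1, 1, 2)
        , (1, 2, 1)
)
SELECT pe.year
     , sum(pe.wins)                               AS raw_sum       -- 5 + 5 + 3 = 13
     , count(DISTINCT pp.position)                AS positions     -- 2
     , sum(pe.wins) / count(DISTINCT pp.position) AS wins_divided  -- 13 / 2 = 6 (integer division)
FROM   periods_positions_coaches_linking pp
JOIN   periods pe ON pe.id = pp.period
WHERE  pp.coach = 1
GROUP  BY pe.year;
```

The true total here is 8 wins, not 6. Had the coach held both positions in both periods, the raw sum would be 16 and the division would give the correct 8.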
