PostgreSQL: prevent double-counting rows in GROUP BY

Question

Asked by ldrg

Here's the basic guts of my schema and problem: http://sqlfiddle.com/#!1/72ec9/4/0

Note that the periods table can refer to a variable range of time - it could be an entire season, it could be a few games or one game. For a given team and year all period rows represent exclusive ranges of time.

I've got a query written which joins up tables and uses a GROUP BY periods.year to aggregate scores for a season (see sqlfiddle). However, if a coach had two positions in the same year the GROUP BY will count the same period row twice. How can I ditch the duplicates when a coach held two positions but still sum up periods when a year is comprised of multiple periods? If there's a better way to do the schema I'd also appreciate it if you pointed it out to me.
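
The double counting described above can be reproduced with a small sketch. The table and column names follow the queries in this thread, but the sample rows are made up for illustration, and SQLite (via Python's standard sqlite3 module) stands in for PostgreSQL so the snippet runs without a server:

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the thread.
# SQLite stands in for PostgreSQL, which is enough to show the join fan-out.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE periods   (id INTEGER PRIMARY KEY, year INTEGER,
                        wins INTEGER, losses INTEGER, ties INTEGER);
CREATE TABLE positions (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE periods_positions_coaches_linking
                       (period INTEGER, position INTEGER, coach INTEGER);

INSERT INTO periods VALUES (1, 2010, 5, 3, 0), (2, 2010, 2, 1, 1);
INSERT INTO positions VALUES (1, 'Head Coach'), (2, 'Offensive Coordinator');
-- Coach 1 held two positions in period 1 and one position in period 2.
INSERT INTO periods_positions_coaches_linking VALUES (1, 1, 1), (1, 2, 1), (2, 1, 1);
""")

# Joining through the linking table repeats period 1 once per position held,
# so its scores are added twice.
naive = conn.execute("""
SELECT pe.year, sum(pe.wins), sum(pe.losses), sum(pe.ties)
FROM   periods_positions_coaches_linking pp
JOIN   periods pe ON pe.id = pp.period
WHERE  pp.coach = 1
GROUP  BY pe.year
""").fetchall()

print(naive)  # the true totals for 2010 are 7 wins, 4 losses, 1 tie
```

With this data the query reports 12 wins and 7 losses for 2010, because period 1 appears once for each of the two positions.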

Answer 1

Answered by Erwin Brandstetter

The underlying problem (joining to multiple tables with multiple matches) is explained in this closely related answer:

Two SQL LEFT JOINS produce incorrect result

To fix, I first simplified your query:

select pe.year
     , sum(pe.wins)       AS wins
     , sum(pe.losses)     AS losses
     , sum(pe.ties)       AS ties
     , array_agg(po.id)   AS position_id
     , array_agg(po.name) AS position_names
from   periods_positions_coaches_linking pp
join   positions po ON po.id = pp.position
join   periods   pe ON pe.id = pp.period
where  pp.coach = 1
group  by pe.year
order  by pe.year;

This yields the same incorrect result as your original, but it is simpler, faster, and easier to read.

- There is no point in joining the table coaches as long as you don't use its columns in the SELECT list. I removed it completely and replaced the WHERE condition with where pp.coach = 1.
- You don't need COALESCE. NULL values are ignored by the aggregate function sum(), so there is no need to substitute 0.
- Use table aliases to make the query easier to read.
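
The point about sum() and NULL can be checked with a minimal sketch. It runs against SQLite through Python's sqlite3 module as a stand-in for PostgreSQL, which treats NULL the same way in sum(); the inline values are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# sum() skips the NULL input and adds the remaining values.
total = conn.execute(
    "SELECT sum(x) FROM (SELECT 3 AS x UNION ALL SELECT NULL UNION ALL SELECT 4)"
).fetchone()[0]

# If every input is NULL, the result is NULL (None in Python), not 0.
all_null = conn.execute("SELECT sum(x) FROM (SELECT NULL AS x)").fetchone()[0]

print(total, all_null)
```

So COALESCE only changes the outcome when a group contains nothing but NULL values and a 0 is wanted instead of NULL.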

Next, I solved your problem like this:

SELECT *
FROM  (
  SELECT pe.year
       , array_agg(DISTINCT po.id)   AS position_id
       , array_agg(DISTINCT po.name) AS position_names
  FROM   periods_positions_coaches_linking pp
  JOIN   positions                         po ON po.id = pp.position
  JOIN   periods                           pe ON pe.id = pp.period
  WHERE  pp.coach = 1
  GROUP  BY pe.year
  ) po
LEFT JOIN (
  SELECT pe.year
       , sum(pe.wins)   AS wins
       , sum(pe.losses) AS losses
       , sum(pe.ties)   AS ties
  FROM  (
     SELECT period
     FROM   periods_positions_coaches_linking
     WHERE  coach = 1
     GROUP  BY period
     ) pp
  JOIN   periods pe ON pe.id = pp.period
  GROUP  BY pe.year
  ) pe USING (year)
ORDER  BY year;

Aggregate positions and periods separately before joining them.
In the first sub-query, list positions only once by simply using DISTINCT.
In the second sub-query:
- GROUP BY period, because a coach can have multiple positions per period.
- JOIN to the periods data after that, and then aggregate to get the sums.
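
The second sub-query can be tried on its own with a small sketch. It assumes made-up sample rows and uses SQLite through Python's sqlite3 module in place of PostgreSQL; the array_agg part is left out because SQLite has no such function:

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE periods (id INTEGER PRIMARY KEY, year INTEGER,
                      wins INTEGER, losses INTEGER, ties INTEGER);
CREATE TABLE periods_positions_coaches_linking
                     (period INTEGER, position INTEGER, coach INTEGER);

INSERT INTO periods VALUES (1, 2010, 5, 3, 0), (2, 2010, 2, 1, 1);
-- Coach 1 held two positions in period 1 and one position in period 2.
INSERT INTO periods_positions_coaches_linking VALUES (1, 1, 1), (1, 2, 1), (2, 1, 1);
""")

# Collapse the linking rows to one row per period first, then join and sum.
fixed = conn.execute("""
SELECT pe.year, sum(pe.wins), sum(pe.losses), sum(pe.ties)
FROM  (
   SELECT period
   FROM   periods_positions_coaches_linking
   WHERE  coach = 1
   GROUP  BY period
   ) pp
JOIN   periods pe ON pe.id = pp.period
GROUP  BY pe.year
""").fetchall()

print(fixed)
```

Because each period reaches the join exactly once, period 1 is no longer counted once per position.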

SQL Fiddle.

Answer 2

Answered by Teena Thomas

Use distinct as shown here:

Code:

select periods.year as year,
sum(coalesce(periods.wins, 0)) as wins,
sum(coalesce(periods.losses, 0)) as losses,
sum(coalesce(periods.ties, 0)) as ties,
array_agg( distinct positions.id) as position_id,
array_agg( distinct positions.name) as position_names

from periods_positions_coaches_linking

join coaches on coaches.id = periods_positions_coaches_linking.coach
join positions on positions.id = periods_positions_coaches_linking.position
join periods on periods.id = periods_positions_coaches_linking.period

where coaches.id = 1

group by periods.year, positions.id
order by periods.year;

Answer 3

Answered by Gordon Linoff

In your case, the easiest way might be to divide out the positions:

select periods.year as year,
       sum(coalesce(periods.wins, 0))/COUNT(distinct positions.id) as wins,
       sum(coalesce(periods.losses, 0))/COUNT(distinct positions.id) as losses,
       sum(coalesce(periods.ties, 0))/COUNT(distinct positions.id) as ties,
       array_agg(distinct positions.id) as position_id,
       array_agg(distinct positions.name) as position_names
from periods_positions_coaches_linking join
     coaches
     on coaches.id = periods_positions_coaches_linking.coach join
     positions
     on positions.id = periods_positions_coaches_linking.position join
     periods
     on periods.id = periods_positions_coaches_linking.period
where coaches.id = 1
group by periods.year
order by periods.year;

The number of positions scales the wins, losses, and ties, so dividing it out adjusts the counts.
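
A rough sketch of how the division behaves, under the assumption of made-up sample rows and with SQLite (Python's sqlite3 module) standing in for PostgreSQL. The helper wins_by_division is hypothetical and only wraps the wins column of the query above. It suggests the adjustment is exact when every period in the year carries the same number of positions, and can drift otherwise:

```python
import sqlite3

def wins_by_division(linking_rows):
    """Return wins for 2010 using sum(...) / count(DISTINCT position)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE periods (id INTEGER PRIMARY KEY, year INTEGER, wins INTEGER);
    CREATE TABLE periods_positions_coaches_linking
                         (period INTEGER, position INTEGER, coach INTEGER);
    -- Hypothetical data: two periods in 2010 with 5 and 2 wins (7 in total).
    INSERT INTO periods VALUES (1, 2010, 5), (2, 2010, 2);
    """)
    conn.executemany(
        "INSERT INTO periods_positions_coaches_linking VALUES (?, ?, ?)",
        linking_rows,
    )
    return conn.execute("""
    SELECT sum(pe.wins) / count(DISTINCT pp.position)
    FROM   periods_positions_coaches_linking pp
    JOIN   periods pe ON pe.id = pp.period
    WHERE  pp.coach = 1
    GROUP  BY pe.year
    """).fetchone()[0]

# Two positions in both periods: every period is doubled, so dividing by 2 is exact.
even = wins_by_division([(1, 1, 1), (1, 2, 1), (2, 1, 1), (2, 2, 1)])

# Two positions in period 1 but one in period 2: only period 1 is doubled.
uneven = wins_by_division([(1, 1, 1), (1, 2, 1), (2, 1, 1)])

print(even, uneven)
```

In the even case the result matches the true total of 7 wins; in the uneven case the query returns 6, so this shortcut seems safest when the coach's positions do not change within a year.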
