MYSQL sum() for distinct rows

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/2436284/

Time: 2020-08-31 15:30:33  Source: igfitidea

MYSQL sum() for distinct rows

mysql sum

Asked by makeee

I'm looking for help using sum() in my SQL query:

SELECT links.id, 
       count(DISTINCT stats.id) as clicks, 
       count(DISTINCT conversions.id) as conversions, 
       sum(conversions.value) as conversion_value 
FROM links 
LEFT OUTER JOIN stats ON links.id = stats.parent_id 
LEFT OUTER JOIN conversions ON links.id = conversions.link_id 
GROUP BY links.id 
ORDER BY links.created desc;

I use DISTINCT because I'm doing "group by" and this ensures the same row is not counted more than once.

The problem is that SUM(conversions.value) counts the "value" of each row more than once (due to the group by).

I basically want to do SUM(conversions.value) for each DISTINCT conversions.id.

Is that possible?

Answer by Jerome WAGNER

I may be wrong, but from what I understand:

  • conversions.id is the primary key of your table conversions
  • stats.id is the primary key of your table stats

Thus for each conversions.id you have at most one links.id impacted.

Your request is a bit like taking the Cartesian product of two sets:

[clicks]
SELECT *
FROM links 
LEFT OUTER JOIN stats ON links.id = stats.parent_id 

[conversions]
SELECT *
FROM links 
LEFT OUTER JOIN conversions ON links.id = conversions.link_id 

and for each link, you get sizeof([clicks]) x sizeof([conversions]) lines
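
To make the row multiplication concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table layouts and sample rows are assumptions inferred from the column names in the query (one link, 2 clicks, 3 conversions worth 5 + 2 + 1 = 8); they are not taken from the original poster's data.

```python
import sqlite3

# Hypothetical schema and sample data, guessed from the column names in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE links (id INTEGER PRIMARY KEY, created TEXT);
    CREATE TABLE stats (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE conversions (id INTEGER PRIMARY KEY, link_id INTEGER, value INTEGER);
    INSERT INTO links VALUES (1, '2010-03-12');
    INSERT INTO stats VALUES (1, 1), (2, 1);
    INSERT INTO conversions VALUES (1, 1, 5), (2, 1, 2), (3, 1, 1);
""")

# Same double LEFT OUTER JOIN as in the question, plus COUNT(*) to expose the row count.
joined_rows, clicks, conversions, inflated_sum = conn.execute("""
    SELECT COUNT(*),
           COUNT(DISTINCT stats.id),
           COUNT(DISTINCT conversions.id),
           SUM(conversions.value)
    FROM links
    LEFT OUTER JOIN stats ON links.id = stats.parent_id
    LEFT OUTER JOIN conversions ON links.id = conversions.link_id
    GROUP BY links.id
""").fetchone()

print(joined_rows, clicks, conversions, inflated_sum)
```

With this sample data the join yields 2 x 3 = 6 rows for the link, so the plain SUM comes out as 16 instead of 8, while both COUNT(DISTINCT ...) columns stay correct.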

As you noted, the number of unique conversions in your request can be obtained via:

count(distinct conversions.id) = sizeof([conversions])

This DISTINCT manages to cancel out all the [clicks] rows in the Cartesian product,

but clearly

sum(conversions.value) = sum([conversions].value) * sizeof([clicks])

In your case, since

count(*) = sizeof([clicks]) x sizeof([conversions])
count(*) = sizeof([clicks]) x count(distinct conversions.id)

you have

sizeof([clicks]) = count(*)/count(distinct conversions.id)

so I would test your request with

SELECT links.id, 
   count(DISTINCT stats.id) as clicks, 
   count(DISTINCT conversions.id) as conversions, 
   sum(conversions.value)*count(DISTINCT conversions.id)/count(*) as conversion_value 
FROM links 
LEFT OUTER JOIN stats ON links.id = stats.parent_id 
LEFT OUTER JOIN conversions ON links.id = conversions.link_id 
GROUP BY links.id 
ORDER BY links.created desc;
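
As a rough check of this query, the sketch below runs it through Python's sqlite3 module on made-up data. Everything about the schema and rows is an assumption: both id columns are primary keys (the premise stated at the top of this answer), link 1 has 2 clicks and 3 conversions, and link 2 has 1 click and no conversions. The value column is declared REAL because SQLite truncates integer division, whereas MySQL's / operator returns a decimal result.

```python
import sqlite3

# Hypothetical schema and rows; ids are primary keys, as this answer assumes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE links (id INTEGER PRIMARY KEY, created TEXT);
    CREATE TABLE stats (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE conversions (id INTEGER PRIMARY KEY, link_id INTEGER, value REAL);
    INSERT INTO links VALUES (1, '2010-03-12'), (2, '2010-03-13');
    INSERT INTO stats VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO conversions VALUES (1, 1, 5.0), (2, 1, 2.0), (3, 1, 1.0);
""")

# The query from this answer, unchanged apart from layout.
scaled_rows = conn.execute("""
    SELECT links.id,
           count(DISTINCT stats.id) AS clicks,
           count(DISTINCT conversions.id) AS conversions,
           sum(conversions.value) * count(DISTINCT conversions.id) / count(*) AS conversion_value
    FROM links
    LEFT OUTER JOIN stats ON links.id = stats.parent_id
    LEFT OUTER JOIN conversions ON links.id = conversions.link_id
    GROUP BY links.id
    ORDER BY links.created DESC
""").fetchall()

for row in scaled_rows:
    print(row)
```

On this data link 1 gets 16 * 3 / 6 = 8.0 and link 2 gets NULL (None in Python). The check only covers the case where every conversion row is duplicated the same number of times, which is what the primary-key assumption implies.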

Keep me posted! Jerome

Answer by Clemens Valiente

Jerome's solution is actually wrong and can produce incorrect results!

sum(conversions.value)*count(DISTINCT conversions.id)/count(*) as conversion_value

Let's assume the following table:

conversions
id value
1 5
1 5
1 5
2 2
3 1

The correct sum of value for distinct ids would be 8. Jerome's formula produces:

sum(conversions.value) = 18
count(distinct conversions.id) = 3
count(*) = 5
18*3/5 = 10.8 != 8
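
The arithmetic can be replayed in a few lines of Python. This is only a sketch over the five rows listed in the table; the variable names are illustrative, and keeping one value per id assumes that rows sharing an id always carry the same value.

```python
# Rows exactly as listed in the table: (id, value) pairs.
rows = [(1, 5), (1, 5), (1, 5), (2, 2), (3, 1)]

total = sum(value for _, value in rows)              # sum(conversions.value)
distinct_ids = len({row_id for row_id, _ in rows})   # count(distinct conversions.id)
row_count = len(rows)                                # count(*)

# The scaling formula from the earlier answer.
formula_result = total * distinct_ids / row_count

# One value per id; assumes every row with the same id has the same value.
correct_sum = sum(dict(rows).values())

print(total, distinct_ids, row_count, formula_result, correct_sum)
```

Here the formula evaluates to 18 * 3 / 5 = 10.8, while the sum over distinct ids is 8. The mismatch appears because id 1 is repeated three times and the other ids only once, so no single scaling factor fits all rows.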

Answer by TehShrike

For an explanation of why you were seeing incorrect numbers, read this.

I think that Jerome has a handle on what is causing your error. Bryson's query would work, though having that subquery in the SELECT could be inefficient.

Answer by Bryson

Use the following query:

SELECT links.id
  , (
    SELECT COUNT(*)
    FROM stats
    WHERE links.id = stats.parent_id
  ) AS clicks
  , conversions.conversions
  , conversions.conversion_value
FROM links
LEFT JOIN (
  SELECT link_id
    , COUNT(id) AS conversions
    , SUM(conversions.value) AS conversion_value
  FROM conversions
  GROUP BY link_id
) AS conversions ON links.id = conversions.link_id
ORDER BY links.created DESC
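
A small sketch of how this query behaves, run through Python's sqlite3 module instead of MySQL. The schema and sample rows are assumptions made up for illustration: link 1 has 2 clicks and 3 conversions worth 8 in total, and link 2 has 1 click and no conversions.

```python
import sqlite3

# Hypothetical schema and rows, guessed from the column names used in the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE links (id INTEGER PRIMARY KEY, created TEXT);
    CREATE TABLE stats (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE conversions (id INTEGER PRIMARY KEY, link_id INTEGER, value INTEGER);
    INSERT INTO links VALUES (1, '2010-03-12'), (2, '2010-03-13');
    INSERT INTO stats VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO conversions VALUES (1, 1, 5), (2, 1, 2), (3, 1, 1);
""")

# The query from this answer: conversions are aggregated per link before the join,
# so clicks can no longer multiply the conversion rows.
per_link = conn.execute("""
    SELECT links.id
      , (
        SELECT COUNT(*)
        FROM stats
        WHERE links.id = stats.parent_id
      ) AS clicks
      , conversions.conversions
      , conversions.conversion_value
    FROM links
    LEFT JOIN (
      SELECT link_id
        , COUNT(id) AS conversions
        , SUM(conversions.value) AS conversion_value
      FROM conversions
      GROUP BY link_id
    ) AS conversions ON links.id = conversions.link_id
    ORDER BY links.created DESC
""").fetchall()

for row in per_link:
    print(row)
```

Link 1 comes back with 2 clicks, 3 conversions and a conversion_value of 8. Link 2 has no matching row in the derived table, so its conversions and conversion_value are NULL rather than 0; wrapping them in COALESCE(..., 0) would be one way to change that if zeros are preferred.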

Answer by Dave

I use a subquery to do this. It eliminates the problems with grouping. So the query would be something like:

SELECT COUNT(DISTINCT conversions.id)
...
     (SELECT SUM(conversions.value) FROM ....) AS Vals

Answer by Quesi

How about something like this:

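SELECT t.id,
       MAX(t.clicks) AS clicks,                     -- same click count on every row of a link
       COUNT(t.conversion_id) AS conversions,
       SUM(t.conversion_value) AS conversion_value
FROM   (SELECT l.id AS id, l.created AS created,
               c.id AS conversion_id,
               COUNT(DISTINCT s.id) AS clicks,
               MAX(c.value) AS conversion_value     -- one value per conversion row
        FROM links l
        LEFT JOIN stats s ON l.id = s.parent_id
        LEFT JOIN conversions c ON l.id = c.link_id
        GROUP BY l.id, l.created, c.id) t           -- one row per (link, conversion)
GROUP BY t.id, t.created
ORDER BY t.created;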

Answer by Dipu Raj

This will do the trick: just divide the sum by the number of duplicated rows for each conversion id.

SELECT a.id,
       a.clicks,
       SUM(a.conversion_value/a.conversions) AS conversion_value,
       a.conversions
FROM (SELECT links.id, 
       COUNT(DISTINCT stats.id) AS clicks, 
       COUNT(conversions.id) AS conversions, 
       SUM(conversions.value) AS conversion_value 
      FROM links 
      LEFT OUTER JOIN stats ON links.id = stats.parent_id 
      LEFT OUTER JOIN conversions ON links.id = conversions.link_id 
      GROUP BY conversions.id,links.id
      ORDER BY links.created DESC) AS a
GROUP BY a.id