Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/2436284/
MYSQL sum() for distinct rows
Question by makeee
I'm looking for help using sum() in my SQL query:
SELECT links.id,
count(DISTINCT stats.id) as clicks,
count(DISTINCT conversions.id) as conversions,
sum(conversions.value) as conversion_value
FROM links
LEFT OUTER JOIN stats ON links.id = stats.parent_id
LEFT OUTER JOIN conversions ON links.id = conversions.link_id
GROUP BY links.id
ORDER BY links.created desc;
I use DISTINCT because I'm doing "group by", and this ensures the same row is not counted more than once.
The problem is that SUM(conversions.value) counts the "value" of each row more than once (due to the group by).
I basically want to do SUM(conversions.value) for each DISTINCT conversions.id.
Is that possible?
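To see the double counting concretely, here is a minimal sketch that runs the question's query, unchanged, against a tiny in-memory SQLite database. SQLite standing in for MySQL, the column types, and all sample rows are assumptions; only the table and column names come from the question. Link 1 has two clicks and two conversions worth 5 and 3, so its true conversion value is 8.

```python
import sqlite3

# Hypothetical sample rows; SQLite stands in for MySQL here.
SCHEMA = """
CREATE TABLE links (id INTEGER PRIMARY KEY, created TEXT);
CREATE TABLE stats (id INTEGER PRIMARY KEY, parent_id INTEGER);
CREATE TABLE conversions (id INTEGER PRIMARY KEY, link_id INTEGER, value INTEGER);
INSERT INTO links VALUES (1, '2010-03-01'), (2, '2010-03-02');
-- link 1: 2 clicks, 2 conversions worth 5 + 3 = 8
-- link 2: 3 clicks, 1 conversion worth 4
INSERT INTO stats VALUES (10, 1), (11, 1), (20, 2), (21, 2), (22, 2);
INSERT INTO conversions VALUES (100, 1, 5), (101, 1, 3), (200, 2, 4);
"""

# The query from the question, unchanged.
QUERY = """
SELECT links.id,
       count(DISTINCT stats.id) AS clicks,
       count(DISTINCT conversions.id) AS conversions,
       sum(conversions.value) AS conversion_value
FROM links
LEFT OUTER JOIN stats ON links.id = stats.parent_id
LEFT OUTER JOIN conversions ON links.id = conversions.link_id
GROUP BY links.id
ORDER BY links.created DESC
"""

con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)
rows = con.execute(QUERY).fetchall()
print(rows)  # [(2, 3, 1, 12), (1, 2, 2, 16)]
```

The counts come out right, but each conversion value is multiplied by the number of matching stats rows: 16 instead of 8, and 12 instead of 4. The multiplication comes from joining two independent one-to-many tables onto the same link; the GROUP BY then sums over all of those repeated rows.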
Answer by Jerome WAGNER
I may be wrong, but from what I understand:
- conversions.id is the primary key of your table conversions
- stats.id is the primary key of your table stats
Thus each conversions.id is tied to at most one links.id.
Your request is a bit like taking the Cartesian product of two sets:
[clicks]
SELECT *
FROM links
LEFT OUTER JOIN stats ON links.id = stats.parent_id
[conversions]
SELECT *
FROM links
LEFT OUTER JOIN conversions ON links.id = conversions.link_id
and for each link, you get sizeof([clicks]) x sizeof([conversions]) rows.
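The row multiplication can be checked by counting rows per link for each join on its own and for both joins together. This sketch uses made-up rows in an in-memory SQLite database as a stand-in for MySQL (an assumption). Note that through a LEFT JOIN a link with no matching rows still contributes one NULL-padded row, so its factor would be 1 rather than 0.

```python
import sqlite3

# Hypothetical sample rows: link 1 has 2 stats rows and 2 conversions,
# link 2 has 3 stats rows and 1 conversion. SQLite stands in for MySQL.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE links (id INTEGER PRIMARY KEY, created TEXT);
CREATE TABLE stats (id INTEGER PRIMARY KEY, parent_id INTEGER);
CREATE TABLE conversions (id INTEGER PRIMARY KEY, link_id INTEGER, value INTEGER);
INSERT INTO links VALUES (1, '2010-03-01'), (2, '2010-03-02');
INSERT INTO stats VALUES (10, 1), (11, 1), (20, 2), (21, 2), (22, 2);
INSERT INTO conversions VALUES (100, 1, 5), (101, 1, 3), (200, 2, 4);
""")

def per_link(sql):
    """Run a 'link id -> number' query and return the result as a dict."""
    return dict(con.execute(sql).fetchall())

# sizeof([clicks]) and sizeof([conversions]), each from its own join
clicks = per_link("SELECT links.id, count(*) FROM links "
                  "LEFT JOIN stats ON links.id = stats.parent_id GROUP BY links.id")
convs = per_link("SELECT links.id, count(*) FROM links "
                 "LEFT JOIN conversions ON links.id = conversions.link_id GROUP BY links.id")
# rows produced when both joins are applied at once
both = per_link("SELECT links.id, count(*) FROM links "
                "LEFT JOIN stats ON links.id = stats.parent_id "
                "LEFT JOIN conversions ON links.id = conversions.link_id GROUP BY links.id")

for link_id in both:
    print(link_id, clicks[link_id], convs[link_id], both[link_id])
```

For link 1 this gives 2 x 2 = 4 joined rows, and for link 2 it gives 3 x 1 = 3.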
As you noted, the number of unique conversions in your request can be obtained via
count(distinct conversions.id) = sizeof([conversions])
This DISTINCT discards the repetition that the [clicks] rows introduce into the Cartesian product,
but clearly
sum(conversions.value) = sum([conversions].value) * sizeof([clicks])
In your case, since
count(*) = sizeof([clicks]) x sizeof([conversions])
count(*) = sizeof([clicks]) x count(distinct conversions.id)
you have
sizeof([clicks]) = count(*)/count(distinct conversions.id)
so I would test your query with:
SELECT links.id,
count(DISTINCT stats.id) as clicks,
count(DISTINCT conversions.id) as conversions,
sum(conversions.value)*count(DISTINCT conversions.id)/count(*) as conversion_value
FROM links
LEFT OUTER JOIN stats ON links.id = stats.parent_id
LEFT OUTER JOIN conversions ON links.id = conversions.link_id
GROUP BY links.id
ORDER BY links.created desc;
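A quick check of this query under the answer's assumption that conversions.id is unique, again with made-up rows in an in-memory SQLite database standing in for MySQL. SQLite's `/` truncates when both operands are integers, whereas MySQL's `/` does not, so the sketch adds `* 1.0`; otherwise the expression is Jerome's.

```python
import sqlite3

# Hypothetical sample rows; SQLite stands in for MySQL.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE links (id INTEGER PRIMARY KEY, created TEXT);
CREATE TABLE stats (id INTEGER PRIMARY KEY, parent_id INTEGER);
-- conversions.id is a primary key, as the answer assumes
CREATE TABLE conversions (id INTEGER PRIMARY KEY, link_id INTEGER, value INTEGER);
INSERT INTO links VALUES (1, '2010-03-01'), (2, '2010-03-02');
INSERT INTO stats VALUES (10, 1), (11, 1), (20, 2), (21, 2), (22, 2);
INSERT INTO conversions VALUES (100, 1, 5), (101, 1, 3), (200, 2, 4);
""")

# Jerome's correction; the extra * 1.0 only forces non-integer division in SQLite
rows = con.execute("""
SELECT links.id,
       count(DISTINCT stats.id) AS clicks,
       count(DISTINCT conversions.id) AS conversions,
       sum(conversions.value) * 1.0 * count(DISTINCT conversions.id) / count(*)
           AS conversion_value
FROM links
LEFT OUTER JOIN stats ON links.id = stats.parent_id
LEFT OUTER JOIN conversions ON links.id = conversions.link_id
GROUP BY links.id
ORDER BY links.created DESC
""").fetchall()
print(rows)  # [(2, 3, 1, 4.0), (1, 2, 2, 8.0)]
```

On this data the formula recovers 8 and 4, because every conversions row is repeated exactly sizeof([clicks]) times.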
Keep me posted! Jerome
Answer by Clemens Valiente
Jerome's solution is actually wrong and can produce incorrect results!
sum(conversions.value)*count(DISTINCT conversions.id)/count(*) as conversion_value
Let's assume the following table:
conversions
id value
1 5
1 5
1 5
2 2
3 1
The correct sum of values for distinct ids would be 8 (5 + 2 + 1). Jerome's formula produces:
sum(conversions.value) = 18
count(distinct conversions.id) = 3
count(*) = 5
18*3/5 = 10.8 != 8
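The arithmetic can be checked directly. In this sketch (SQLite standing in for MySQL, with `* 1.0` added to avoid SQLite's integer division), the table has no primary key, which is exactly the case Jerome's derivation excludes: id 1 appears three times while ids 2 and 3 appear once, so no single factor undoes the duplication. Summing each distinct (id, value) pair once assumes every id carries a single value.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The counterexample table: id is NOT unique here (no primary key)
con.executescript("""
CREATE TABLE conversions (id INTEGER, value INTEGER);
INSERT INTO conversions VALUES (1, 5), (1, 5), (1, 5), (2, 2), (3, 1);
""")

total, distinct_ids, n_rows, jerome = con.execute("""
SELECT sum(value),
       count(DISTINCT id),
       count(*),
       sum(value) * 1.0 * count(DISTINCT id) / count(*)
FROM conversions
""").fetchone()

# Sum each distinct (id, value) pair once; assumes each id has a single value
expected = con.execute(
    "SELECT sum(value) FROM (SELECT DISTINCT id, value FROM conversions)"
).fetchone()[0]

print(total, distinct_ids, n_rows, jerome, expected)  # 18 3 5 10.8 8
```

18 x 3 / 5 works out to 10.8, not the expected 8.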
Answer by TehShrike
For an explanation of why you were seeing incorrect numbers, read this.
I think that Jerome has a handle on what is causing your error. Bryson's query would work, though having that subquery in the SELECT could be inefficient.
Answer by Bryson
Use the following query:
SELECT links.id
, (
SELECT COUNT(*)
FROM stats
WHERE links.id = stats.parent_id
) AS clicks
, conversions.conversions
, conversions.conversion_value
FROM links
LEFT JOIN (
SELECT link_id
, COUNT(id) AS conversions
, SUM(conversions.value) AS conversion_value
FROM conversions
GROUP BY link_id
) AS conversions ON links.id = conversions.link_id
ORDER BY links.created DESC
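Here is a sketch that runs Bryson's query on made-up rows in an in-memory SQLite database, a stand-in for MySQL (an assumption). Link 3 is given one click and no conversions to show what happens to links without a match in the derived table.

```python
import sqlite3

# Hypothetical sample rows; SQLite stands in for MySQL.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE links (id INTEGER PRIMARY KEY, created TEXT);
CREATE TABLE stats (id INTEGER PRIMARY KEY, parent_id INTEGER);
CREATE TABLE conversions (id INTEGER PRIMARY KEY, link_id INTEGER, value INTEGER);
INSERT INTO links VALUES (1, '2010-03-01'), (2, '2010-03-02'), (3, '2010-03-03');
INSERT INTO stats VALUES (10, 1), (11, 1), (20, 2), (21, 2), (22, 2), (30, 3);
-- link 3 has a click but no conversions
INSERT INTO conversions VALUES (100, 1, 5), (101, 1, 3), (200, 2, 4);
""")

# Bryson's query: conversions are aggregated per link before the join,
# and clicks come from a correlated subquery, so nothing gets multiplied.
rows = con.execute("""
SELECT links.id
     , (SELECT COUNT(*) FROM stats WHERE links.id = stats.parent_id) AS clicks
     , conversions.conversions
     , conversions.conversion_value
FROM links
LEFT JOIN (
    SELECT link_id
         , COUNT(id) AS conversions
         , SUM(conversions.value) AS conversion_value
    FROM conversions
    GROUP BY link_id
) AS conversions ON links.id = conversions.link_id
ORDER BY links.created DESC
""").fetchall()
print(rows)  # [(3, 1, None, None), (2, 3, 1, 4), (1, 2, 2, 8)]
```

A link with no conversions gets NULL rather than 0 in both conversion columns; wrapping them in COALESCE(..., 0) would return zeros instead.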
Answer by Dave
I use a subquery to do this. It eliminates the problems with grouping. So the query would be something like:
SELECT COUNT(DISTINCT conversions.id)
...
(SELECT SUM(conversions.value) FROM ....) AS Vals
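Dave's snippet leaves out the details, so the following is only one possible reading of it, not his exact query: keep the joins for the two COUNT(DISTINCT ...) columns and compute the sum in a correlated subquery that reads conversions on its own. The alias c2 and the sample rows are hypothetical, and SQLite stands in for MySQL.

```python
import sqlite3

# Hypothetical sample rows; SQLite stands in for MySQL.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE links (id INTEGER PRIMARY KEY, created TEXT);
CREATE TABLE stats (id INTEGER PRIMARY KEY, parent_id INTEGER);
CREATE TABLE conversions (id INTEGER PRIMARY KEY, link_id INTEGER, value INTEGER);
INSERT INTO links VALUES (1, '2010-03-01'), (2, '2010-03-02');
INSERT INTO stats VALUES (10, 1), (11, 1), (20, 2), (21, 2), (22, 2);
INSERT INTO conversions VALUES (100, 1, 5), (101, 1, 3), (200, 2, 4);
""")

rows = con.execute("""
SELECT links.id,
       COUNT(DISTINCT stats.id) AS clicks,
       COUNT(DISTINCT conversions.id) AS conversions,
       -- the sum reads conversions directly, so the stats join cannot inflate it
       (SELECT SUM(c2.value) FROM conversions AS c2 WHERE c2.link_id = links.id) AS Vals
FROM links
LEFT OUTER JOIN stats ON links.id = stats.parent_id
LEFT OUTER JOIN conversions ON links.id = conversions.link_id
GROUP BY links.id
ORDER BY links.created DESC
""").fetchall()
print(rows)  # [(2, 3, 1, 4), (1, 2, 2, 8)]
```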
Answer by Quesi
How about something like this:
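-- inner query: one row per (link, conversion), so each conversion value appears once
select t.id, max(t.clicks) clicks, count(t.conversions) conversions, sum(t.conversion_value) conversion_value
from (SELECT l.id id, l.created created,
count(DISTINCT s.id) clicks,
c.id conversions,
max(c.value) conversion_value
FROM links l
LEFT JOIN stats s ON l.id = s.parent_id
LEFT JOIN conversions c ON l.id = c.link_id
GROUP BY l.id, l.created, c.id) t
-- outer query: one row per link
group by t.id, t.created
order by t.created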
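Here is a sketch of the two-level grouping idea, written against made-up rows in an in-memory SQLite database standing in for MySQL (an assumption). The inner query collapses the join to one row per (link, conversion), counting that link's clicks with COUNT(DISTINCT ...). The outer query then adds each conversion value exactly once.

```python
import sqlite3

# Hypothetical sample rows; SQLite stands in for MySQL.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE links (id INTEGER PRIMARY KEY, created TEXT);
CREATE TABLE stats (id INTEGER PRIMARY KEY, parent_id INTEGER);
CREATE TABLE conversions (id INTEGER PRIMARY KEY, link_id INTEGER, value INTEGER);
INSERT INTO links VALUES (1, '2010-03-01'), (2, '2010-03-02');
INSERT INTO stats VALUES (10, 1), (11, 1), (20, 2), (21, 2), (22, 2);
INSERT INTO conversions VALUES (100, 1, 5), (101, 1, 3), (200, 2, 4);
""")

rows = con.execute("""
SELECT t.id,
       MAX(t.clicks) AS clicks,                 -- same value on every row of a link
       COUNT(t.conversion_id) AS conversions,
       SUM(t.conversion_value) AS conversion_value
FROM (SELECT l.id AS id, l.created AS created,
             c.id AS conversion_id,
             COUNT(DISTINCT s.id) AS clicks,
             MAX(c.value) AS conversion_value   -- one value per conversion
      FROM links l
      LEFT JOIN stats s ON l.id = s.parent_id
      LEFT JOIN conversions c ON l.id = c.link_id
      GROUP BY l.id, l.created, c.id) t
GROUP BY t.id, t.created
ORDER BY t.created DESC
""").fetchall()
print(rows)  # [(2, 3, 1, 4), (1, 2, 2, 8)]
```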
Answer by Dipu Raj
This will do the trick: just divide the sum by the number of times each conversion id is duplicated.
SELECT a.id,
       MAX(a.clicks) AS clicks,
       -- each conversion's value was summed once per duplicate row, so divide it back out
       SUM(a.conversion_value / a.copies) AS conversion_value,
       COUNT(a.conversion_id) AS conversions
FROM (SELECT links.id,
             links.created,
             conversions.id AS conversion_id,
             COUNT(DISTINCT stats.id) AS clicks,
             COUNT(conversions.id) AS copies,
             SUM(conversions.value) AS conversion_value
      FROM links
      LEFT OUTER JOIN stats ON links.id = stats.parent_id
      LEFT OUTER JOIN conversions ON links.id = conversions.link_id
      GROUP BY conversions.id, links.id, links.created) AS a
GROUP BY a.id, a.created
ORDER BY a.created DESC;
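The divide-by-duplicates trick is easiest to see in the intermediate rows. This sketch (made-up rows, SQLite standing in for MySQL, both assumptions) first lists one row per (link, conversion) with the number of copies the stats join produced and the inflated sum, then divides and adds per link. With integer columns SQLite's `/` truncates, but here every inflated sum is an exact multiple of its copy count. A link without conversions would have copies = 0, which makes the division NULL in SQLite.

```python
import sqlite3

# Hypothetical sample rows; SQLite stands in for MySQL.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE links (id INTEGER PRIMARY KEY, created TEXT);
CREATE TABLE stats (id INTEGER PRIMARY KEY, parent_id INTEGER);
CREATE TABLE conversions (id INTEGER PRIMARY KEY, link_id INTEGER, value INTEGER);
INSERT INTO links VALUES (1, '2010-03-01'), (2, '2010-03-02');
INSERT INTO stats VALUES (10, 1), (11, 1), (20, 2), (21, 2), (22, 2);
INSERT INTO conversions VALUES (100, 1, 5), (101, 1, 3), (200, 2, 4);
""")

# One row per (link, conversion): how many copies the join made, and the inflated sum
INNER = """
SELECT links.id AS link_id,
       conversions.id AS conversion_id,
       COUNT(conversions.id) AS copies,
       SUM(conversions.value) AS inflated
FROM links
LEFT OUTER JOIN stats ON links.id = stats.parent_id
LEFT OUTER JOIN conversions ON links.id = conversions.link_id
GROUP BY links.id, conversions.id
"""
groups = con.execute(INNER + " ORDER BY link_id, conversion_id").fetchall()
print(groups)  # [(1, 100, 2, 10), (1, 101, 2, 6), (2, 200, 3, 12)]

# Dividing by the copy count restores each value before the per-link sum
totals = con.execute(
    "SELECT link_id, SUM(inflated / copies) FROM (" + INNER + ") "
    "GROUP BY link_id ORDER BY link_id"
).fetchall()
print(totals)  # [(1, 8), (2, 4)]
```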