Original question: http://stackoverflow.com/questions/6234775/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Explanation needed for missing rows with left join and count()
Asked by DeadMonkey
Can someone please help me understand the following behavior that occurs when I add a WHERE clause to a query that has a LEFT JOIN with COUNT(*)?
I have two tables:
TABLE 1: customers
customer_id | name
------------------
1 | Bob
2 | James
3 | Fred
TABLE 2: orders
order_id | customer_id | order_timestamp
----------------------------------------
1000 | 1 | 2011-01-01 00:00
1001 | 1 | 2011-01-05 00:00
1002 | 2 | 2011-01-10 00:00
Now the following query tells me how many orders each customer placed:
select c.customer_id, count(o.order_id)
from customers c
left join orders o using (customer_id)
group by 1
customer_id | count
-------------------
1 | 2
2 | 1
3 | 0
This works great BUT if I add a WHERE clause to the query, the query no longer outputs counts of zeroes for customers who did not place any orders even though I'm doing a LEFT JOIN:
select c.customer_id, count(o.order_id)
from customers c
left join orders o using (customer_id)
where o.order_timestamp >= '2011-01-05'
group by 1
customer_id | count
-------------------
1 | 1
2 | 1
Now if I move the WHERE condition as part of the LEFT JOIN like the following, I get back my zero counts for customers who did not place orders:
select c.customer_id, count(o.order_id)
from customers c
left join orders o on (c.customer_id = o.customer_id) and (o.order_timestamp >= '2011-01-05')
group by 1
I'm confused about why the second query does not work but the third one does. Can someone please provide me with an explanation? Also, not sure if this matters, but I'm using postgres. Thanks!
Accepted answer by Chris Shaffer
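This is because NULL is not greater than or equal to anything: the comparison evaluates to NULL, and WHERE keeps only the rows for which its condition is true. If you change your WHERE clause to
where o.order_timestamp is null or o.order_timestamp >= '2011-01-05'
then, for this sample data, you will get the same result as with the condition in the join clause.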
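To see the NULL comparison at work, the query below is a minimal sketch that lists the joined rows before any filtering or counting. It assumes PostgreSQL and inlines the question's sample data as CTEs so it can be run on its own; the passes_where column is a name made up for this illustration.

```sql
-- Sample data from the question, inlined so the query is self-contained.
WITH customers (customer_id, name) AS (
    VALUES (1, 'Bob'), (2, 'James'), (3, 'Fred')
),
orders (order_id, customer_id, order_timestamp) AS (
    VALUES (1000, 1, TIMESTAMP '2011-01-01 00:00'),
           (1001, 1, TIMESTAMP '2011-01-05 00:00'),
           (1002, 2, TIMESTAMP '2011-01-10 00:00')
)
SELECT c.customer_id,
       o.order_id,
       o.order_timestamp,
       -- passes_where is a hypothetical label: the value the WHERE test would see
       o.order_timestamp >= '2011-01-05' AS passes_where
FROM customers c
LEFT JOIN orders o USING (customer_id)
ORDER BY c.customer_id, o.order_id;
```

Fred's row comes back with order_id, order_timestamp and passes_where all NULL. Since NULL is not true, a WHERE clause using that test discards the row, and customer 3 never reaches the GROUP BY.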
Note though - I would recommend the join clause approach, as it matches more closely what you are trying to do. Also, the change to the WHERE clause I mentioned above will only work if the order_timestamp column is not nullable -- if it is nullable, you should use a different column for the null check (e.g., where o.primarykey is null or o.order_timestamp >= '2011-01-05').
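One reason the join clause approach is the safer choice can be sketched as follows. The two forms stop agreeing as soon as a customer has orders but none of them passes the filter. The example assumes PostgreSQL, treats order_id as the primary key, and uses a later cutoff ('2011-01-06', chosen only for this illustration) so that both of Bob's orders fall before it.

```sql
WITH customers (customer_id, name) AS (
    VALUES (1, 'Bob'), (2, 'James'), (3, 'Fred')
),
orders (order_id, customer_id, order_timestamp) AS (
    VALUES (1000, 1, TIMESTAMP '2011-01-01 00:00'),
           (1001, 1, TIMESTAMP '2011-01-05 00:00'),
           (1002, 2, TIMESTAMP '2011-01-10 00:00')
)
-- Variant 1: null check plus filter in WHERE.
SELECT 'filter in WHERE' AS variant, c.customer_id, count(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
WHERE o.order_id IS NULL OR o.order_timestamp >= '2011-01-06'
GROUP BY c.customer_id
UNION ALL
-- Variant 2: filter as part of the join condition.
SELECT 'filter in ON', c.customer_id, count(o.order_id)
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
                  AND o.order_timestamp >= '2011-01-06'
GROUP BY c.customer_id
ORDER BY variant, customer_id;
```

With the filter in ON, all three customers are returned, with counts 0, 1 and 0. With the filter in WHERE, customer 1 is missing altogether: his joined rows have a non-NULL order_id and a timestamp before the cutoff, so neither branch of the OR is true and both rows are dropped.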
Answer by OMG Ponies
Placement of filter criteria matters when dealing with OUTER joins (RIGHT, LEFT). Criteria in the ON clause of an OUTER JOIN are applied before the JOIN; criteria in the WHERE clause are applied after the JOIN, that is, against the result set that the JOIN produces.
SELECT c.customer_id,
COUNT(o.order_id)
FROM CUSTOMERS c
LEFT JOIN ORDERS o ON o.customer_id = c.customer_id
AND o.order_timestamp >= '2011-01-05'
GROUP BY c.customer_id
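One way to picture "applied before the JOIN" is to filter the orders in a derived table first and then left join the customers to what remains. This is only a conceptual sketch, not a description of how the PostgreSQL planner executes the query. It assumes the question's sample data (inlined as CTEs) and a filter that touches only columns of the orders table.

```sql
WITH customers (customer_id, name) AS (
    VALUES (1, 'Bob'), (2, 'James'), (3, 'Fred')
),
orders (order_id, customer_id, order_timestamp) AS (
    VALUES (1000, 1, TIMESTAMP '2011-01-01 00:00'),
           (1001, 1, TIMESTAMP '2011-01-05 00:00'),
           (1002, 2, TIMESTAMP '2011-01-10 00:00')
)
SELECT c.customer_id,
       count(o.order_id) AS order_count
FROM customers c
LEFT JOIN (
    -- Pre-filter the right-hand table; every customer is still kept by the LEFT JOIN.
    SELECT order_id, customer_id
    FROM orders
    WHERE order_timestamp >= '2011-01-05'
) o ON o.customer_id = c.customer_id
GROUP BY c.customer_id
ORDER BY c.customer_id;
```

This returns counts of 1, 1 and 0 for customers 1, 2 and 3, the same as putting the timestamp condition in the ON clause.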
Ordinals
Using ordinals, that is, a number that refers to the position of a column in the SELECT clause (as in GROUP BY 1), is not a recommended practice. If anyone changes the query -- say to add a column -- the ordinal may end up pointing at a different column and change what your query does.
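As a small illustration, suppose someone later adds the customer's name as the first output column. The sketch below (PostgreSQL assumed, sample data inlined as CTEs) names the grouping columns explicitly, so the query keeps its meaning however the SELECT list is reordered.

```sql
WITH customers (customer_id, name) AS (
    VALUES (1, 'Bob'), (2, 'James'), (3, 'Fred')
),
orders (order_id, customer_id, order_timestamp) AS (
    VALUES (1000, 1, TIMESTAMP '2011-01-01 00:00'),
           (1001, 1, TIMESTAMP '2011-01-05 00:00'),
           (1002, 2, TIMESTAMP '2011-01-10 00:00')
)
SELECT c.name,                      -- newly added first column
       c.customer_id,
       count(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
                  AND o.order_timestamp >= '2011-01-05'
-- With "GROUP BY 1", position 1 would now refer to c.name instead of c.customer_id.
GROUP BY c.customer_id, c.name
ORDER BY c.customer_id;
```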
Answer by tejash
Chris is right: NULL is not greater than or equal to anything. When you put your condition in the WHERE clause, it is applied to the final result set generated by the left join, and there it removes the rows whose timestamp is NULL.
However, when you apply the same condition as part of the join, it applies only to the orders table, and then the left join is performed. So it does not remove the rows whose timestamp is NULL.
So, in the third query the condition is applied before the final table is generated, and in the second query it is applied after the final table is generated.
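To make that "final table" visible for the third query, the sketch below drops the aggregation and lists the rows that the join hands to GROUP BY. It assumes PostgreSQL and inlines the question's sample data as CTEs.

```sql
WITH customers (customer_id, name) AS (
    VALUES (1, 'Bob'), (2, 'James'), (3, 'Fred')
),
orders (order_id, customer_id, order_timestamp) AS (
    VALUES (1000, 1, TIMESTAMP '2011-01-01 00:00'),
           (1001, 1, TIMESTAMP '2011-01-05 00:00'),
           (1002, 2, TIMESTAMP '2011-01-10 00:00')
)
SELECT c.customer_id,
       o.order_id,
       o.order_timestamp
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
                  AND o.order_timestamp >= '2011-01-05'  -- decides which orders match
ORDER BY c.customer_id, o.order_id;
```

Three rows come back: customer 1 with order 1001, customer 2 with order 1002, and customer 3 with NULLs in the order columns. Order 1000 no longer matches, but customer 3 is preserved, and count(o.order_id) skips the NULL, which is how the zero count arises.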