Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/9230878/
How can I perform a SQL 'NOT IN' query faster?
Asked by Howiecamp
I have a table (EMAIL) of email addresses:
EmailAddress
------------
[email protected]
[email protected]
[email protected]
[email protected]
and a table (BLACKLIST) of blacklisted email addresses:
EmailAddress
------------
[email protected]
[email protected]
and I want to select those email addresses that are in the EMAIL table but NOT in the BLACKLIST table. I'm doing:
SELECT EmailAddress
FROM EMAIL
WHERE EmailAddress NOT IN
(
SELECT EmailAddress
FROM BLACKLIST
)
but when the row counts get very high the performance is terrible.
How can I better do this? (Assume generic SQL if possible. If not, assume T-SQL.)
Answered by Pablo Santa Cruz
You can use a left outer join or a not exists clause.
Left outer join:
select E.EmailAddress
from EMAIL E left outer join BLACKLIST B on (E.EmailAddress = B.EmailAddress)
where B.EmailAddress is null;
Not Exists:
select E.EmailAddress
from EMAIL E where not exists
(select EmailAddress from BLACKLIST B where B.EmailAddress = E.EmailAddress)
Both are quite generic SQL solutions (they don't depend on a specific DB engine). I would say that the latter is a little more performant (though not by much), but definitely more performant than the not in version.
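To check that the three forms really select the same rows, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in engine. The sample addresses are hypothetical placeholders, and SQLite is an assumption made only so the example runs anywhere; relative timings on T-SQL or another engine may differ and should be measured there.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMAIL (EmailAddress TEXT NOT NULL);
    CREATE TABLE BLACKLIST (EmailAddress TEXT NOT NULL);
    INSERT INTO EMAIL VALUES ('a@example.com'), ('b@example.com'),
                             ('c@example.com'), ('d@example.com');
    INSERT INTO BLACKLIST VALUES ('b@example.com'), ('d@example.com');
""")

QUERIES = {
    "not_in": """
        select EmailAddress
        from EMAIL
        where EmailAddress not in (select EmailAddress from BLACKLIST)
    """,
    "left_join": """
        select E.EmailAddress
        from EMAIL E left outer join BLACKLIST B on (E.EmailAddress = B.EmailAddress)
        where B.EmailAddress is null
    """,
    "not_exists": """
        select E.EmailAddress
        from EMAIL E where not exists
        (select EmailAddress from BLACKLIST B where B.EmailAddress = E.EmailAddress)
    """,
}


def run(sql):
    """Run a query and return the first column of every row, sorted."""
    return sorted(row[0] for row in conn.execute(sql))


results = {name: run(sql) for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```

With no NULL values in either table, all three queries return the same two addresses, so the choice between them is about performance rather than results.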
As commenters stated, you can also try creating an index on BLACKLIST(EmailAddress), which should help speed up the execution of your query.
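The index suggestion can be sketched as follows, again with sqlite3 as an assumed stand-in engine. The index name IX_BLACKLIST_EmailAddress and the generated addresses are made up for illustration. The printed plan is SQLite's own and its wording varies by version; on T-SQL you would inspect the execution plan in your own tooling instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMAIL (EmailAddress TEXT NOT NULL);
    CREATE TABLE BLACKLIST (EmailAddress TEXT NOT NULL);
    -- IX_BLACKLIST_EmailAddress is a hypothetical index name.
    CREATE INDEX IX_BLACKLIST_EmailAddress ON BLACKLIST (EmailAddress);
""")

# Hypothetical data: 1000 addresses, every tenth one blacklisted.
conn.executemany(
    "INSERT INTO EMAIL VALUES (?)",
    [(f"user{i}@example.com",) for i in range(1000)],
)
conn.executemany(
    "INSERT INTO BLACKLIST VALUES (?)",
    [(f"user{i}@example.com",) for i in range(0, 1000, 10)],
)

sql = """
    select E.EmailAddress
    from EMAIL E where not exists
    (select EmailAddress from BLACKLIST B where B.EmailAddress = E.EmailAddress)
"""

# Number of addresses that survive the blacklist filter.
kept = conn.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()[0]

# Confirm the index exists, then show how SQLite plans the query.
index_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
]
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

print("rows kept:", kept)
print("indexes:", index_names)
for step in plan:
    print("plan:", step)
```

The index changes only how the lookup into BLACKLIST is performed, not which rows come back.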
Answered by Daniel Gustafsson
NOT IN differs from NOT EXISTS if the blacklist allows NULL as an EmailAddress. If there is a single NULL value, the query will always return zero rows, because NOT IN against a list containing NULL evaluates to unknown or false for every value. The query plans therefore differ slightly, but I don't think there would be any serious performance impact.
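The NULL behaviour is easy to reproduce. The sketch below assumes SQLite (via Python's sqlite3 module) and hypothetical addresses; the three-valued logic it demonstrates is standard SQL, so T-SQL should behave the same way for this case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMAIL (EmailAddress TEXT);
    CREATE TABLE BLACKLIST (EmailAddress TEXT);  -- nullable on purpose
    INSERT INTO EMAIL VALUES ('a@example.com'), ('b@example.com');
    INSERT INTO BLACKLIST VALUES ('b@example.com'), (NULL);
""")

# 'a@example.com' NOT IN ('b@example.com', NULL) is unknown, so the row is dropped.
not_in_rows = conn.execute("""
    select EmailAddress
    from EMAIL
    where EmailAddress not in (select EmailAddress from BLACKLIST)
""").fetchall()

# NOT EXISTS only asks whether a matching row exists; the NULL row never matches.
not_exists_rows = conn.execute("""
    select E.EmailAddress
    from EMAIL E where not exists
    (select EmailAddress from BLACKLIST B where B.EmailAddress = E.EmailAddress)
""").fetchall()

print("NOT IN:", not_in_rows)
print("NOT EXISTS:", not_exists_rows)
```

Declaring BLACKLIST.EmailAddress as NOT NULL removes the difference between the two forms.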
One suggestion is to create a new table called VALIDEMAIL and add triggers to BLACKLIST that remove an address from VALIDEMAIL when it is inserted into BLACKLIST and add it back to VALIDEMAIL when it is removed from BLACKLIST. Then replace EMAIL with a view that is a union of VALIDEMAIL and BLACKLIST.
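A rough sketch of that trigger design, written against SQLite through Python's sqlite3 module. The trigger names and sample addresses are hypothetical, and the NEW/OLD row references are SQLite syntax; T-SQL triggers read from the inserted and deleted pseudo-tables instead, so treat this as an illustration of the idea rather than portable DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE VALIDEMAIL (EmailAddress TEXT PRIMARY KEY);
    CREATE TABLE BLACKLIST (EmailAddress TEXT PRIMARY KEY);

    -- Blacklisting an address removes it from VALIDEMAIL.
    CREATE TRIGGER trg_blacklist_insert AFTER INSERT ON BLACKLIST
    BEGIN
        DELETE FROM VALIDEMAIL WHERE EmailAddress = NEW.EmailAddress;
    END;

    -- Removing an address from BLACKLIST puts it back into VALIDEMAIL.
    CREATE TRIGGER trg_blacklist_delete AFTER DELETE ON BLACKLIST
    BEGIN
        INSERT OR IGNORE INTO VALIDEMAIL (EmailAddress) VALUES (OLD.EmailAddress);
    END;

    -- EMAIL is now a view over both tables instead of a base table.
    CREATE VIEW EMAIL AS
        SELECT EmailAddress FROM VALIDEMAIL
        UNION
        SELECT EmailAddress FROM BLACKLIST;
""")


def column(sql):
    """Return the first column of a query result, sorted."""
    return sorted(row[0] for row in conn.execute(sql))


conn.executemany(
    "INSERT INTO VALIDEMAIL VALUES (?)",
    [("a@example.com",), ("b@example.com",), ("c@example.com",)],
)

conn.execute("INSERT INTO BLACKLIST VALUES ('b@example.com')")
valid_after_insert = column("SELECT EmailAddress FROM VALIDEMAIL")
email_view = column("SELECT EmailAddress FROM EMAIL")

conn.execute("DELETE FROM BLACKLIST WHERE EmailAddress = 'b@example.com'")
valid_after_delete = column("SELECT EmailAddress FROM VALIDEMAIL")

print(valid_after_insert, email_view, valid_after_delete)
```

The trade-off is that the filtering cost moves from read time to write time: selecting non-blacklisted addresses becomes a plain read of VALIDEMAIL, while every change to BLACKLIST does extra work.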
Answered by Burton Leed
select E.EmailAddress
from EMAIL E where not exists
(select EmailAddress from BLACKLIST B where B.EmailAddress = E.EmailAddress)
is equivalent to the following (by the way, the tables probably have an owner, shown here as the mail prefix):
select EmailAddress from mail.EMAIL
EXCEPT
select EmailAddress from mail.BLACKLIST
which will give you the rows that differ, even if an EmailAddress is NULL.
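A small sketch of EXCEPT with a NULL in the blacklist, assuming SQLite via Python's sqlite3 module and hypothetical addresses. The mail owner prefix is dropped here because the in-memory database has no such schema. One thing the sketch makes visible: EXCEPT returns distinct rows, so it matches NOT EXISTS only when EMAIL has no duplicate addresses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMAIL (EmailAddress TEXT);
    CREATE TABLE BLACKLIST (EmailAddress TEXT);
    -- 'a@example.com' appears twice to show the de-duplication.
    INSERT INTO EMAIL VALUES ('a@example.com'), ('a@example.com'), ('b@example.com');
    INSERT INTO BLACKLIST VALUES ('b@example.com'), (NULL);
""")

# EXCEPT is not derailed by the NULL in BLACKLIST, and it removes duplicates.
except_rows = conn.execute("""
    select EmailAddress from EMAIL
    EXCEPT
    select EmailAddress from BLACKLIST
""").fetchall()

# NOT EXISTS keeps one output row per qualifying EMAIL row.
not_exists_rows = conn.execute("""
    select E.EmailAddress
    from EMAIL E where not exists
    (select EmailAddress from BLACKLIST B where B.EmailAddress = E.EmailAddress)
""").fetchall()

print("EXCEPT:", except_rows)
print("NOT EXISTS:", not_exists_rows)
```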