MySQL SQL:“NOT IN”子查询优化或替代方案
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/3645794/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
SQL: "NOT IN" subquery optimization or alternatives
提问by krn
I have two database tables: "places" and "translations". The translations of places names are made by selecting records from "places", which don't have the translations to the specified language yet:
我有两个数据库表:“地点”和“翻译”。地名的翻译是通过从“地方”中选择记录来进行的,这些记录还没有指定语言的翻译:
SELECT `id`, `name`
FROM `places`
WHERE `id` NOT IN (SELECT `place_id` FROM `translations` WHERE `lang` = 'en')
This worked fine with 7 000 records of places, but crashed when the number of translations reached 5 000. Since then, the query takes about 10 seconds and returns the error:
这对 7 000 条地点记录运行良好,但在翻译数量达到 5 000 时崩溃。从那时起,查询大约需要 10 秒并返回错误:
2006 - MySQL server has gone away
2006 - MySQL 服务器消失了
As I understand, the main problem here is the subquery returning to many results, bu how could I solve it, if I need to select all the places which are not translated yet?
据我了解,这里的主要问题是子查询返回许多结果,但是如果我需要选择所有尚未翻译的地方,我该如何解决?
My plan B is to create a new boolean field in "places" table, called "translated", and reset it to "false", each time I change language - that would prevent for having subquery. However, maybe I could just modify my current SQL statement and prevent from adding additional field?
我的计划 B 是在“places”表中创建一个新的布尔字段,称为“translated”,并在每次更改语言时将其重置为“false”——这将阻止子查询。但是,也许我可以修改我当前的 SQL 语句并防止添加其他字段?
回答by Tomalak
The obvious alternative:
明显的替代方案:
SELECT
`id`, `name`
FROM
`places`
WHERE
NOT EXISTS (
SELECT 1 FROM `translations` WHERE `id` = `places`.`id` AND `lang` = 'en'
)
There should be a clustered composite index over (translations.id, translations.lang)
(composite means: a single index over multiple fields, clustered means: the index governs how the table is sorted).
应该有一个聚集的复合索引(translations.id, translations.lang)
(复合意味着:多个字段上的单个索引,聚集意味着:索引控制表的排序方式)。
回答by Diogo Alves
In this case, I believe that the best alternative is to do 2 separate queries. Storing the result of the first in a variable and using the second.
在这种情况下,我认为最好的选择是进行 2 个单独的查询。将第一个的结果存储在变量中并使用第二个。