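Performance of MySQL "IN"
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, published under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/1537675/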
Question by Roy
I'm running a MySQL query in two steps. First, I get a list of ids with one query, and then I retrieve the data for those ids using a second query along the lines of SELECT * FROM data WHERE id IN (id1, id2 ...). I know it sounds hacky, but I've done it this way because the queries are very complicated: the first involves lots of geometry and trigonometry, the second lots of different joins. I'm sure they could be written as a single query, but my MySQL isn't good enough to pull it off.
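A minimal sketch of the two-step pattern described in the question, using Python's built-in sqlite3 module as a stand-in for MySQL. The table name data comes from the question; its columns, the make_demo_db and two_step_fetch helpers, and the x < 3 filter standing in for the geometry query are all hypothetical. It shows the second query's IN list being built from one placeholder per id, so the ids are bound as parameters instead of being pasted into the SQL string.

```python
import sqlite3

def make_demo_db() -> sqlite3.Connection:
    # Hypothetical schema: x stands in for whatever the geometry query filters on.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, x REAL, payload TEXT)")
    conn.executemany(
        "INSERT INTO data (id, x, payload) VALUES (?, ?, ?)",
        [(i, float(i), f"row {i}") for i in range(1, 6)],
    )
    return conn

def two_step_fetch(conn: sqlite3.Connection) -> list[tuple]:
    # Step 1: the "complicated" query that only returns ids (hypothetical filter).
    ids = [row[0] for row in conn.execute("SELECT id FROM data WHERE x < 3")]
    if not ids:
        # An empty "IN ()" is a syntax error in MySQL, so return early.
        return []
    # Step 2: one "?" placeholder per id; the ids are passed as bound parameters.
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT id, payload FROM data WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, ids).fetchall()

print(two_step_fetch(make_demo_db()))  # [(1, 'row 1'), (2, 'row 2')]
```

Note that this still makes two round trips and ships every id back to the server, which is the part the answers below suggest avoiding.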
This approach works, but it doesn't feel right; plus, I'm concerned it won't scale. At the moment I am testing on a database of 10,000 records, with 400 ids in the "IN" clause (i.e. IN (id1, id2 ... id400)), and performance is fine. But what if there are, say, 1,000,000 records?
Where are the performance bottlenecks (speed, memory, etc.) for this kind of query? Any ideas on how to refactor this kind of query would be awesome too (for example, whether it is worth swotting up on stored procedures).
Accepted answer by Quassnoi
Starting from a certain number of records, an IN predicate over a SELECT becomes faster than an IN predicate over a list of constants.
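To make the two forms concrete, here is a sketch (again SQLite via Python's sqlite3 standing in for MySQL; the picked table and the helper names are hypothetical) of the same filter written as an IN over a list of constants and as an IN over a SELECT. Both return the same rows; which one is faster depends on the engine and the data size, and this sketch does not measure that.

```python
import sqlite3

def build_db() -> sqlite3.Connection:
    # Hypothetical schema: "data" holds the rows, "picked" holds the ids
    # that the first (complicated) query would have produced.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute("CREATE TABLE picked (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO data VALUES (?, ?)",
                     [(i, f"row {i}") for i in range(1, 11)])
    conn.executemany("INSERT INTO picked VALUES (?)", [(2,), (5,), (7,)])
    return conn

def in_constant_list(conn: sqlite3.Connection, ids: list[int]) -> list[tuple]:
    # IN over a list of constants: the client sends every id with the statement.
    placeholders = ", ".join("?" for _ in ids)
    return conn.execute(
        f"SELECT id, payload FROM data WHERE id IN ({placeholders}) ORDER BY id", ids
    ).fetchall()

def in_subquery(conn: sqlite3.Connection) -> list[tuple]:
    # IN over a SELECT: the server reads the ids itself.
    return conn.execute(
        "SELECT id, payload FROM data WHERE id IN (SELECT id FROM picked) ORDER BY id"
    ).fetchall()

conn = build_db()
ids = [row[0] for row in conn.execute("SELECT id FROM picked")]
print(in_constant_list(conn, ids) == in_subquery(conn))  # True
```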
See this article on my blog for a performance comparison:
If the column selected by the subquery inside the IN clause is indexed, like this:
SELECT *
FROM table1
WHERE unindexed_column IN
(
SELECT indexed_column
FROM table2
)
, then this query is simply optimized into an EXISTS (which reads just one index entry for each record from table1).
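A pure-Python model of what that EXISTS does, assuming the index on table2's column behaves like a sorted list (as a B-tree index does). For each row of table1, one binary search finds the first candidate index entry, and the check stops there, so each table1 row is returned at most once even when table2 contains duplicates. The function name and the integer lists standing in for rows are illustrative, not MySQL internals.

```python
from bisect import bisect_left

def exists_semi_join(table1: list[int], table2_index: list[int]) -> list[int]:
    """Return the table1 values that have at least one match in table2.

    table2_index must be sorted, like the entries of a B-tree index.
    """
    result = []
    for value in table1:
        # One index lookup per outer row: position of the first entry >= value.
        pos = bisect_left(table2_index, value)
        # EXISTS only needs to know whether that single entry matches; it never
        # scans further, so duplicates in table2 do not duplicate the outer row.
        if pos < len(table2_index) and table2_index[pos] == value:
            result.append(value)
    return result

print(exists_semi_join([3, 1, 4, 1, 5], sorted([1, 1, 1, 4, 9])))  # [1, 4, 1]
```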
Unfortunately, MySQL is not capable of doing a HASH SEMI JOIN or a MERGE SEMI JOIN, which are more efficient still (especially if both columns are indexed).
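For comparison, here are sketches of the two strategies the answer names, again in pure Python over lists of join-key values. These are illustrative models, not how any particular engine implements them. A hash semi join builds a hash set from table2 once and probes it for each table1 row; a merge semi join walks two sorted inputs in lockstep, which is where having both columns indexed (and therefore readable in sorted order) helps. The answer dates from 2009, and MySQL 5.6 and later added semi-join optimizations for IN subqueries, so check EXPLAIN on the version you run.

```python
def hash_semi_join(table1: list[int], table2: list[int]) -> list[int]:
    # Build phase: one pass over table2 into a hash set.
    keys = set(table2)
    # Probe phase: each table1 row is kept at most once, if its key is present.
    return [v for v in table1 if v in keys]

def merge_semi_join(table1_sorted: list[int], table2_sorted: list[int]) -> list[int]:
    # Both inputs must be sorted on the join key (e.g. read in index order).
    result = []
    j = 0
    for v in table1_sorted:
        # Advance table2 past keys smaller than v; the cursor never moves back.
        while j < len(table2_sorted) and table2_sorted[j] < v:
            j += 1
        if j < len(table2_sorted) and table2_sorted[j] == v:
            # Emit the outer row once; j stays put so an equal next v also matches.
            result.append(v)
    return result

t1 = [1, 1, 3, 4, 5]
t2 = [1, 1, 1, 4, 9]
print(hash_semi_join(t1, t2))   # [1, 1, 4]
print(merge_semi_join(t1, t2))  # [1, 1, 4]
```

The hash variant needs memory for the whole set but no sorting; the merge variant needs no extra memory but relies on both inputs already being sorted.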
Answer by Eric Hogue
Why do you extract the ids first? You should probably just join the tables. If you need the ids for something else, you can first insert them into a temporary table and then use that table in the join.
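A minimal sketch of this suggestion, using SQLite through Python's sqlite3 as a stand-in for MySQL (both accept CREATE TEMPORARY TABLE). The picked_ids table, the helper names, and the x < 3 filter standing in for the first query are hypothetical. The ids are stored once with INSERT ... SELECT, stay available for other uses, and the second query joins against them instead of receiving a long IN list.

```python
import sqlite3

def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, x REAL, payload TEXT)")
    conn.executemany(
        "INSERT INTO data (id, x, payload) VALUES (?, ?, ?)",
        [(i, float(i), f"row {i}") for i in range(1, 6)],
    )
    return conn

def fetch_via_temp_table(conn: sqlite3.Connection) -> list[tuple]:
    # Step 1: store the ids from the "complicated" query (hypothetical x < 3
    # filter) in a temporary table that exists only for this connection.
    conn.execute("DROP TABLE IF EXISTS picked_ids")
    conn.execute("CREATE TEMPORARY TABLE picked_ids (id INTEGER PRIMARY KEY)")
    conn.execute("INSERT INTO picked_ids (id) SELECT id FROM data WHERE x < 3")
    # Step 2: join against the temporary table instead of building IN (...).
    return conn.execute(
        "SELECT d.id, d.payload FROM data AS d "
        "JOIN picked_ids AS p ON p.id = d.id ORDER BY d.id"
    ).fetchall()

print(fetch_via_temp_table(make_demo_db()))  # [(1, 'row 1'), (2, 'row 2')]
```

Because the ids never leave the server, the join can use the primary-key index on both sides, and picked_ids can be queried again for any other purpose before the connection closes.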