MySQL Explain: what's causing 'Using temporary; Using filesort'
Note: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverFlow
Original: http://stackoverflow.com/questions/5859039/
Asked by bradvido
I'm planning on creating a view using this SQL SELECT, but EXPLAIN shows it's using temporary and using filesort. I can't figure out what indices I need in order to fix this problem. Mostly, I'm wondering why it's using filesort instead of using an index to sort.
Here are my tables:
CREATE TABLE `learning_signatures` (
  `signature_id` int(11) NOT NULL AUTO_INCREMENT,
  `signature_file` varchar(100) NOT NULL,
  `signature_md5` varchar(32) NOT NULL,
  `image_file` varchar(100) NOT NULL,
  PRIMARY KEY (`signature_id`),
  UNIQUE KEY `unique_signature_md5` (`signature_md5`)
) ENGINE=InnoDB AUTO_INCREMENT=640 DEFAULT CHARSET=latin1;
CREATE TABLE `learning_user_suggestions` (
  `user_suggestion_id` int(11) NOT NULL AUTO_INCREMENT,
  `signature_id` int(11) NOT NULL,
  `ch` char(1) NOT NULL,
  `time_suggested` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `user_id` int(11) NOT NULL,
  PRIMARY KEY (`user_suggestion_id`),
  KEY `char_index` (`ch`),
  KEY `ls_sig_id_indx` (`signature_id`),
  KEY `user_id_indx` (`user_id`),
  KEY `sig_char_indx` (`signature_id`,`ch`)
) ENGINE=InnoDB AUTO_INCREMENT=1173 DEFAULT CHARSET=latin1;
And here is the problematic SQL statement I'm planning on using in my view:
select ls.signature_id, ls.signature_file, ls.signature_md5, ls.image_file, sug.ch , count(sug.ch) AS suggestion_count
from (`learning_signatures` `ls` left join `learning_user_suggestions` `sug` on(ls.signature_id = sug.signature_id))
group by ls.signature_id, sug.ch;
Output from explain:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | ls | ALL | NULL | NULL | NULL | NULL | 514 | Using temporary; Using filesort
1 | SIMPLE | sug | ref | ls_sig_id_indx,sig_char_indx | ls_sig_id_indx | 4 | wwf.ls.signature_id | 1 |
Another example, this time using a where clause:
explain select ls.signature_id, ls.signature_file, ls.signature_md5, ls.image_file, sug.ch , count(sug.ch) AS suggestion_count
from (`learning_signatures` `ls` left join `learning_user_suggestions` `sug` on(ls.signature_id = sug.signature_id))
WHERE signature_md5 = '75f8a5b1176ecc2487b90bacad9bc4c'
group by ls.signature_id, sug.ch;
Explain output:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | ls | const | unique_signature_md5 | unique_signature_md5 | 34 | const | 1 | Using temporary; Using filesort
1 | SIMPLE | sug | ref | ls_sig_id_indx,sig_char_indx | ls_sig_id_indx | 4 | const | 1 |
Answered by piotrm
In your first query, you join your signatures table with user suggestions, getting lots of rows, and then group the results using some columns from user suggestions. But there is no index on the joined result to help with grouping, as it would have to be defined on the previously joined table. What you should do instead is create a derived table from user suggestions that is already grouped by ch and signature_id, and then join it:
SELECT ls.signature_id, ls.signature_file, ls.signature_md5, ls.image_file,
sug.ch, sug.suggestion_count
FROM learning_signatures ls
LEFT JOIN
(SELECT s.signature_id, s.ch, count(s.ch) as suggestion_count
FROM learning_user_suggestions s
GROUP BY s.signature_id, s.ch ) as sug
ON ls.signature_id = sug.signature_id
The optimizer should now be able to use your sig_char_indx index for grouping. The derived table has one row per (signature_id, ch) pair, so it is no larger than the suggestions table, and the join is on signature_id, which is unique in the signatures table. You will still have to do a full scan of the signatures table, but that cannot be avoided because you are selecting all of it anyway.
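One way to check this, offered as a sketch rather than as part of the original answer, is to run EXPLAIN on the grouped subquery by itself. It uses only the table and column names from the question.

```sql
-- Inspect how the grouped subquery alone is executed.
-- The GROUP BY columns match sig_char_indx (signature_id, ch) in index order.
EXPLAIN
SELECT s.signature_id, s.ch, COUNT(s.ch) AS suggestion_count
FROM learning_user_suggestions s
GROUP BY s.signature_id, s.ch;
```

If sig_char_indx is chosen, the key column should list it and the Extra column should no longer show 'Using temporary; Using filesort' for this step. The exact output depends on the MySQL version and on table statistics, so treat this as something to verify on your own data.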
As for the second query, if you want to restrict signatures to a single one just append
WHERE ls.signature_md5='75f8a5b1176ecc2487b90bacad9bc4c'
to the end of the previous query and group by only s.ch, because only one signature_id will match your md5 anyway. The optimizer should now use the md5 index for the WHERE clause and char_index for grouping.
Answered by Damp
Maybe it'll help if you create an index on learning_signatures that contains both signature_md5 and signature_id (in that order):
KEY `md5_id` (`signature_md5`,`signature_id`)
I'm no MySQL expert, but I have found that creating keys that cover both the WHERE clause and the join clause usually helps get rid of temporary and filesort.
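A minimal sketch of trying this suggestion, assuming the tables from the question: add the composite index, then re-run EXPLAIN on the filtered query and compare the plan with the one posted above.

```sql
-- Add the composite index suggested in this answer (name md5_id as given there).
ALTER TABLE learning_signatures
  ADD KEY md5_id (signature_md5, signature_id);

-- Re-run EXPLAIN on the filtered query to see whether the plan changed.
EXPLAIN
SELECT ls.signature_id, ls.signature_file, ls.signature_md5, ls.image_file,
       sug.ch, COUNT(sug.ch) AS suggestion_count
FROM learning_signatures ls
LEFT JOIN learning_user_suggestions sug
       ON ls.signature_id = sug.signature_id
WHERE ls.signature_md5 = '75f8a5b1176ecc2487b90bacad9bc4c'
GROUP BY ls.signature_id, sug.ch;
```

One caveat worth keeping in mind: in InnoDB, every secondary index already stores the primary key columns, so unique_signature_md5 implicitly carries signature_id. The new index may therefore leave the plan unchanged, which is why comparing the EXPLAIN output before and after is the useful part of this experiment.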
Answered by sjas
Use indexes. Find the fields where you need them by running EXPLAIN on your queries.
However, if you have a mostly write-only DB (few reads), you may want to refrain from adding indexes, as each one can have a negative impact on your write performance.
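As a rough illustration of weighing read benefit against write cost on the tables from the question, the sketch below lists the existing indexes and removes one that looks redundant. Dropping it is an assumption to evaluate against your own workload, not a recommendation from the original answers.

```sql
-- List the existing indexes on the suggestions table.
SHOW INDEX FROM learning_user_suggestions;

-- ls_sig_id_indx (signature_id) is a leftmost prefix of
-- sig_char_indx (signature_id, ch), so it is a candidate for removal.
-- Assumption: no query uses an index hint that names ls_sig_id_indx.
ALTER TABLE learning_user_suggestions DROP INDEX ls_sig_id_indx;

-- Re-check the plan of the grouped query afterwards.
EXPLAIN
SELECT s.signature_id, s.ch, COUNT(s.ch) AS suggestion_count
FROM learning_user_suggestions s
GROUP BY s.signature_id, s.ch;
```

Every index that remains has to be maintained on each INSERT, UPDATE, and DELETE, so removing one that another index already covers is usually the cheapest way to reduce write overhead without losing read performance.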