Original source: http://stackoverflow.com/questions/31500/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Do indexes work with "IN" clause
Asked by lomaxx
If I have a query like:
Select EmployeeId
From Employee
Where EmployeeTypeId IN (1,2,3)
and I have an index on the EmployeeTypeId field, does SQL Server still use that index?
Accepted answer by Kibbee
Yeah, that's right. If your Employee table has 10,000 records, and only 5 records have EmployeeTypeId in (1,2,3), then it will most likely use the index to fetch the records. However, if it finds that 9,000 records have EmployeeTypeId in (1,2,3), then it would most likely just do a table scan to get the corresponding EmployeeId values, as it's faster just to run through the whole table than to go to each branch of the index tree and look at the records individually.
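To see which of these two cases a given IN list falls into, you can measure what fraction of the table it matches. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for SQL Server; the helper name in_list_selectivity and the sample data (5 matching rows out of 10,000, as in the example above) are made up for illustration.

```python
import sqlite3


def in_list_selectivity(conn, type_ids):
    """Return the fraction of Employee rows whose EmployeeTypeId is in type_ids."""
    placeholders = ",".join("?" for _ in type_ids)
    total = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
    matching = conn.execute(
        f"SELECT COUNT(*) FROM Employee WHERE EmployeeTypeId IN ({placeholders})",
        list(type_ids),
    ).fetchone()[0]
    return matching / total


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, EmployeeTypeId INTEGER)"
)
# Hypothetical data: 10,000 rows, of which only the first 5 have a type in (1, 2, 3);
# every other row gets type 4.
rows = [(i, ((i - 1) % 3 + 1) if i <= 5 else 4) for i in range(1, 10001)]
conn.executemany("INSERT INTO Employee VALUES (?, ?)", rows)

selectivity = in_list_selectivity(conn, [1, 2, 3])
print(f"{selectivity:.4%} of the rows match the IN list")
```

A very small fraction like this is the case where an index lookup is likely to win; a fraction close to 1 is the case where a scan is likely to be cheaper. The exact crossover point is up to the optimizer and is not something this sketch tries to predict.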
SQL Server does a lot of work to optimize how queries run, but it sometimes picks a suboptimal plan. If the execution plan in Query Analyzer shows that SQL Server isn't using the index, you can tell the query engine to use a specific index with the following change to your query.
SELECT EmployeeId
FROM Employee WITH (INDEX(Index_EmployeeTypeId))
WHERE EmployeeTypeId IN (1,2,3)
This assumes the index you have on the EmployeeTypeId field is named Index_EmployeeTypeId.
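If you want to experiment with forcing an index without a SQL Server instance at hand, SQLite has a rough analog of this hint, the INDEXED BY clause. The sketch below uses Python's built-in sqlite3 module, so it is a swapped-in engine and not the T-SQL hint itself; the sample data is made up. It only checks that the hinted query returns the same rows as the unhinted one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, EmployeeTypeId INTEGER)"
)
conn.execute("CREATE INDEX Index_EmployeeTypeId ON Employee (EmployeeTypeId)")
# Hypothetical data: 1,000 employees spread evenly over type ids 0..9.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)", [(i, i % 10) for i in range(1, 1001)]
)

# SQLite's INDEXED BY plays a role similar to WITH (INDEX(...)) in SQL Server:
# the statement fails to prepare if the named index cannot be used.
hinted_sql = (
    "SELECT EmployeeId FROM Employee INDEXED BY Index_EmployeeTypeId "
    "WHERE EmployeeTypeId IN (1, 2, 3)"
)
plain_sql = "SELECT EmployeeId FROM Employee WHERE EmployeeTypeId IN (1, 2, 3)"

hinted_ids = sorted(row[0] for row in conn.execute(hinted_sql))
plain_ids = sorted(row[0] for row in conn.execute(plain_sql))
print(len(hinted_ids), "rows returned; same as unhinted:", hinted_ids == plain_ids)
```

A hint changes how the rows are fetched, never which rows come back, so it is best treated as a last resort once the plan has been inspected.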
Answer by Kibbee
Usually it would, unless the IN clause covers too much of the table, in which case it will do a table scan. The best way to find out in your specific case is to run the query in Query Analyzer and check the execution plan.
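The same "look at the plan" habit can be tried out in any engine. As a small sketch, assuming SQLite through Python's built-in sqlite3 module in place of SQL Server's Query Analyzer, EXPLAIN QUERY PLAN reports whether the IN query searches the index or scans the table. The index name Index_EmployeeTypeId and the sample rows are assumptions for illustration, and the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, EmployeeTypeId INTEGER)"
)
conn.execute("CREATE INDEX Index_EmployeeTypeId ON Employee (EmployeeTypeId)")
# Hypothetical data: 1,000 employees spread over type ids 0..99.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)", [(i, i % 100) for i in range(1, 1001)]
)

plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT EmployeeId FROM Employee WHERE EmployeeTypeId IN (1, 2, 3)"
).fetchall()
# The last column of each plan row holds the human-readable description.
plan_text = "\n".join(str(row[-1]) for row in plan_rows)
print(plan_text)
```

If the plan mentions the index by name, the engine chose an index search; a plan that only says it scans the Employee table means the index was not used.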
Answer by Mike Woodhouse
Unless technology has improved in ways I can't imagine of late, the "IN" query shown will produce a result that's effectively the OR-ing of three result sets, one for each of the values in the "IN" list. The IN clause becomes an equality condition for each item in the list and will use an index if appropriate. In the case of unique IDs and a large enough table, I'd expect the optimiser to use an index.
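The equivalence between an IN list and OR-ed equality conditions is easy to check for yourself. This is a minimal sketch using Python's built-in sqlite3 module with made-up sample data; it shows only that the two forms return the same rows, not how any particular optimiser plans them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, EmployeeTypeId INTEGER)"
)
# Hypothetical data: 500 employees spread over type ids 0..6.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)", [(i, i % 7) for i in range(1, 501)]
)

in_sql = "SELECT EmployeeId FROM Employee WHERE EmployeeTypeId IN (1, 2, 3)"
or_sql = (
    "SELECT EmployeeId FROM Employee "
    "WHERE EmployeeTypeId = 1 OR EmployeeTypeId = 2 OR EmployeeTypeId = 3"
)

in_ids = sorted(row[0] for row in conn.execute(in_sql))
or_ids = sorted(row[0] for row in conn.execute(or_sql))
print("IN and OR return the same rows:", in_ids == or_ids)
```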
If the items in the list were to be non-unique however, and I guess in the example that a "TypeId" is a foreign key, then I'm more interested in the distribution. I'm wondering if the optimiser will check the stats for each value in the list? Say it checks the first value and finds it's in 20% of the rows (of a large enough table to matter). It'll probably table scan. But will the same query plan be used for the other two, even if they're unique?
It's probably moot - something like an Employee table is likely to be small enough that it will stay cached in memory and you probably wouldn't notice a difference between that and indexed retrieval anyway.
And lastly, while I'm preaching, beware the query in the IN clause: it's often a quick way to get something working and (for me at least) can be a good way to express the requirement, but it's almost always better restated as a join. Your optimiser may be smart enough to spot this, but then again it may not. If you don't currently performance-check against production data volumes, do so - in these days of cost-based optimisation you can't be certain of the query plan until you have a full load and representative statistics. If you can't, then be prepared for surprises in production...
Answer by Dana the Sane
So there's the potential for an "IN" clause to run a table scan, but the optimizer will try and work out the best way to deal with it?
Whether an index is used depends less on the type of query than on the type and distribution of the data in the table(s), how up to date your table statistics are, and the actual datatype of the column.
The other posters are correct that an index will be used over a table scan if:
- The query won't access more than a certain percentage of the indexed rows (say ~10%, though this varies between DBMSs).
- Alternatively, if there are a lot of rows but relatively few unique values in the column, it may also be faster to do a table scan.
The other variable that might not be that obvious is making sure that the datatypes of the values being compared are the same. In PostgreSQL, I don't think that indexes will be used if you're filtering on a float but your column is made up of ints. There are also some operators that don't support index use (again, in PostgreSQL, the ILIKE operator is like this).
As noted, though, always check the query analyser when in doubt; your DBMS's documentation is your friend.
Answer by lomaxx
@Mike: Thanks for the detailed analysis. You definitely make some interesting points there. The example I posted is somewhat trivial, but the basis of the question came from using NHibernate.
With NHibernate, you can write a clause like this:
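int[] employeeIds = new int[] { 1, 5, 23463, 32523 };
// NHibernateSession is assumed to be an open ISession.
IList<Employee> employees = NHibernateSession.CreateCriteria(typeof(Employee))
    .Add(Restrictions.InG("EmployeeId", employeeIds))
    .List<Employee>();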
NHibernate then generates a query which looks like
select * from employee where employeeid in (1, 5, 23463, 32523)
So as you and others have pointed out, it looks like there are going to be times where an index will be used or a table scan will happen, but you can't really determine that until runtime.
Answer by Tushar Rmesh Saindane
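SELECT EmployeeId FROM Employee USE INDEX (EmployeeTypeId)
This query will search using the index you have created (here the index is assumed to be named EmployeeTypeId). USE INDEX is MySQL syntax; on SQL Server the equivalent is the WITH (INDEX(...)) table hint. It works for me. Please give it a try.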