SQL 为什么我不能简单地添加一个包含所有列的索引?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/5447987/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Why can't I simply add an index that includes all columns?
提问by Niels Brinch
I have a table in SQL Server database which I want to be able to search and retrieve data from as fast as possible. I don't care about how long time it takes to insert into the table, I am only interested in the speed at which I can get data.
我在 SQL Server 数据库中有一个表,我希望能够尽快搜索和检索数据。我不在乎插入到表中需要多长时间,我只关心获取数据的速度。
The problem is the table is accessed with 20 or more different types of queries. This makes it a tedious task to add an index specially designed for each query. I'm considering instead simply adding an index that includes ALL columns of the table. It's not something you would normally do in "good" database design, so I'm assuming there is some good reason why I shouldn't do it.
问题是该表被 20 个或更多不同类型的查询访问。这使得为每个查询添加专门设计的索引成为一项繁琐的任务。我正在考虑简单地添加一个包含表中所有列的索引。这不是您在“好的”数据库设计中通常会做的事情,所以我假设有一些很好的理由为什么我不应该这样做。
Can anyone tell me why I shouldn't do this?
谁能告诉我为什么我不应该这样做?
UPDATE: I forgot to mention, I also don't care about the size of my database. It's OK that it means my database size will grow larger than it needed to
更新:我忘了提,我也不关心我的数据库的大小。没关系,这意味着我的数据库大小将增长到超过所需的大小
回答by marc_s
First of all, an index in SQL Server can only have at most 900 bytes in its index entry. That alone makes it impossible to have an index with all columns.
首先,SQL Server 中的索引在其索引条目中最多只能有 900 个字节。仅凭这一点就不可能拥有所有列的索引。
Most of all: such an index makes no sense at all. What are you trying to achieve??
最重要的是:这样的索引根本没有意义。你想达到什么目的??
Consider this: if you have an index on (LastName, FirstName, Street, City)
, that index will notbe able to be used to speed up queries on
考虑一下:如果您在 上有一个索引(LastName, FirstName, Street, City)
,则该索引将无法用于加速对上的查询
FirstName
aloneCity
Street
FirstName
独自的City
Street
That index would be useful for searches on
该索引对搜索有用
(LastName)
, or(LastName, FirstName)
, or(LastName, FirstName, Street)
, or(LastName, FirstName, Street, City)
(LastName)
, 或者(LastName, FirstName)
, 或者(LastName, FirstName, Street)
, 或者(LastName, FirstName, Street, City)
but really nothing else - certainly not if you search for just Street
or just City
!
但真的没有别的 - 当然不是,如果你只是Street
或只是搜索City
!
The order of the columns in your index makes quite a difference, and the query optimizer can't just use any column somewhere in the middle of an index for lookups.
索引中列的顺序有很大的不同,查询优化器不能只使用索引中间某处的任何列进行查找。
Consider your phone book: it's order probably by LastName, FirstName, maybe Street. So does that indexing help you find all "Joe's" in your city? All people living on "Main Street" ?? No - you can lookup by LastName first - then you get more specific inside that set of data. Just having an index over everything doesn't help speed up searching for all columns at all.
考虑一下您的电话簿:它的顺序可能是姓氏、名字,也许是街道。那么该索引是否可以帮助您找到您所在城市中的所有“Joe's”?所有住在“大街”上的人??不 - 您可以先通过姓氏查找 - 然后您可以在该组数据中获得更具体的信息。只要有高于一切的指标并不利于加快搜索所有列在所有。
If you want to be able to search by Street
- you need to add a separate index on (Street)
(and possibly another column or two that make sense).
如果您希望能够搜索Street
- 您需要添加一个单独的索引(Street)
(可能还有另一列或两列有意义)。
If you want to be able to search by Occupation
or whatever else - you need another specific index for that.
如果您希望能够通过Occupation
或其他方式进行搜索- 您需要另一个特定的索引。
Just because your column exists in an index doesn't mean that'll speed up all searches for that column!
仅仅因为您的列存在于索引中并不意味着这将加快对该列的所有搜索!
The main rule is: use as few indices as possible - too many indices can be even worse for a system than having no indices at all.... build your system, monitor its performance, and find those queries that cost the most - then optimize these, e.g. by adding indices.
主要规则是:使用尽可能少的索引 - 对于系统而言,索引过多可能比根本没有索引更糟糕......构建您的系统,监控其性能,并找到那些成本最高的查询 - 然后优化这些,例如通过添加索引。
Don't just blindly index every column just because you can - this is a guarantee for lousy system performance - any index also requires maintenance and upkeep, so the more indices you have, the more your INSERT, UPDATE and DELETE operations will suffer (get slower) since all those indices need to be updated.
不要仅仅因为你可以就盲目地为每一列建立索引——这是糟糕系统性能的保证——任何索引也需要维护和保养,所以你拥有的索引越多,你的 INSERT、UPDATE 和 DELETE 操作就会受到更多的影响(得到较慢),因为所有这些索引都需要更新。
回答by Markus Winand
You are having a fundamental misunderstanding how indexes work.
您对索引的工作方式存在根本性的误解。
Read this explanation "how multi-column indexes work".
阅读这个解释“多列索引是如何工作的”。
The next question you might have is why not creating one index per column--but that's also a dead-end if you try to reach top select performance.
您可能会遇到的下一个问题是为什么不为每列创建一个索引——但如果您试图达到最佳选择性能,这也是一个死胡同。
You might feel that it is a tedioustask, but I would say it's a requiredtask to index carefully. Sloppy indexing strikes back, as in this example.
您可能会觉得这是一项乏味的任务,但我会说这是一项需要仔细索引的任务。草率索引反击,如本例所示。
Note: I am strongly convinced that proper indexing pays off and I know that many people are having the very same questions you have. That's why I'm writing a the a free book about it. The links above refer the pages that might help you to answer your question. However, you might also want to read it from the beginning.
注意:我坚信正确的索引会带来回报,而且我知道很多人都遇到了与您相同的问题。这就是为什么我要写一本关于它的免费书。上面的链接是指可能帮助您回答问题的页面。但是,您可能还想从头开始阅读。
回答by RichardTheKiwi
If this is a data warehouse type operation where queries are highly optimized for READ queries, and if you have 20 ways of dissecting the data, e.g.
如果这是一个数据仓库类型的操作,其中查询针对 READ 查询进行了高度优化,并且您有 20 种分析数据的方法,例如
WHERE clause involves..
WHERE 子句涉及..
Q1: status, type, customer
Q2: price, customer, band
Q3: sale_month, band, type, status
Q4: customer
etc
And you absolutely have plenty of fast storage space to burn, then by all meanscreate an index for EVERY single column, separately. So a 20-column table will have 20 indexes, one for each individual column. I could probably say to ignore bit columns or low cardinality columns, but since we're going so far, why bother (with that admonition). They will just sit there and churn the WRITE time, but if you don't care about that part of the picture, then we're all good.
而且你绝对有足够的快速存储空间来刻录,然后一定要为每一列分别创建一个索引。所以一个 20 列的表将有 20 个索引,每个单独的列一个。我可能会说忽略位列或低基数列,但既然我们已经走了这么远,为什么还要麻烦(有这样的警告)。他们只会坐在那里搅动 WRITE 时间,但如果您不关心图片的那部分,那么我们都很好。
Analyze your 20 queries, and if you have hot queries (the hottest ones) that still won't go any faster, plan it using SSMS (press Ctrl-L) with one query in the query window. It will tell you what index can help that queries - just create it; create them all, fully remembering that this adds again to the write cost, backup file size, db maintenance time etc.
分析您的 20 个查询,如果您有热门查询(最热门的查询)但仍然不会更快,请使用 SSMS(按 Ctrl-L)和查询窗口中的一个查询来规划它。它会告诉您哪些索引可以帮助查询 - 只需创建它;创建所有这些,完全记住这会再次增加写入成本、备份文件大小、数据库维护时间等。
回答by zerkms
I'm considering instead simply adding an index that includes ALL columns of the table.
我正在考虑简单地添加一个包含表中所有列的索引。
This is always a bad idea. Indexes in database is not some sort of pixie dust that works magically. You have toanalyze your queries and according to what and how is being queried - append indexes.
这总是一个坏主意。数据库中的索引不是某种神奇的小精灵。您必须分析您的查询并根据查询的内容和方式 - 附加索引。
It is not as simple as "add everything to index and have a nap"
不是“将所有内容添加到索引并小睡”那么简单
回答by Josh Smeaton
...if you add an index that contains all columns, and a query was actually able to use that index, it would scan it in the order of the primary key. Which means hitting nearly every record. Average search time would be O(n/2).. the same as hitting the actual database.
...如果您添加一个包含所有列的索引,并且查询实际上能够使用该索引,它将按主键的顺序对其进行扫描。这意味着几乎达到所有记录。平均搜索时间将是 O(n/2).. 与点击实际数据库相同。
You need to read a bitlot about indexes.
你需要阅读有点很多关于索引。
It might help if you consider an index on a table to be a bit like a Dictionary in C#.
如果您认为表上的索引有点像 C# 中的字典,这可能会有所帮助。
var nameIndex = new Dictionary<String, List<int>>();
That means that the name column is indexed, and will return a list of primary keys.
这意味着 name 列被索引,并将返回一个主键列表。
var nameOccupationIndex = new Dictionary<String, List<Dictionary<String, List<int>>>>();
That means that the name column + occupation columns are indexed. Now imagine the index contained 10 different columns, nested so far deep it contains every single row in your table.
这意味着名称列 + 职业列被索引。现在想象一下索引包含 10 个不同的列,嵌套很深,它包含表中的每一行。
This isn't exactly how it works mind you. But it should give you an idea of how indexes could work if implemented in C#. What you need to do is create indexes based on one or two keys that are queried on extensively, so that the index is more useful than scanning the entire table.
这并不完全是它的工作原理。但它应该让您了解如果在 C# 中实现索引如何工作。您需要做的是基于一个或两个广泛查询的键创建索引,这样索引比扫描整个表更有用。
回答by ewanm89
1) size, an index essentially builds a copy of the data in that column some easily searchable structure, like a binary tree (I don't know SQL Server specifcs). 2) You mentioned speed, index structures are slower to add to.
1) 大小,索引本质上是在该列中构建数据的副本,一些易于搜索的结构,如二叉树(我不知道 SQL Server 规范)。2)您提到了速度,索引结构的添加速度较慢。
回答by compound eye
I think the questioner is asking
我认为提问者在问
'why can't I make an index like':
'为什么我不能做一个索引':
create index index_name
on table_name
(
*
)
The problems with that have been addressed.
与此有关的问题已得到解决。
But given it sounds like they are using MS sql server. It's useful to understand that you can include nonkey columns in an index so they the values of those columns are available for retrieval from the index, but not to be used as selection criteria :
但鉴于听起来他们正在使用 MS sql server。了解您可以在索引中包含非键列很有用,因此这些列的值可用于从索引中检索,但不能用作选择标准:
create index index_name
on table_name
(
foreign_key
)
include (a,b,c,d) -- every column except foreign key
I created two tables with a million identical rows
我创建了两个包含一百万行相同行的表
I indexed table A like this
我像这样索引了表 A
create nonclustered index index_name_A
on A
(
foreign_key -- this is a guid
)
and table B like this
和表 B 像这样
create nonclustered index index_name_B
on B
(
foreign_key -- this is a guid
)
include (id,a,b,c,d) -- ( every key except foreign key)
no surprise, table A was slightly faster to insert to.
毫不奇怪,表 A 的插入速度略快。
but when I and ran these this queries
但是当我运行这些查询时
select * from A where foreign_key = @guid
select * from B where foreign_key = @guid
On table A, sql server didn't even use the index, it did a table scan, and complained about a missing index including id,a,b,c,d
在表 A 上,sql server 甚至没有使用索引,它进行了表扫描,并抱怨缺少包括 id、a、b、c、d 在内的索引
On table B, the query was over 50 times faster with much less io
在表 B 上,查询速度提高了 50 多倍,但 io 少得多
forcing the query on A to use the index didn't make it any faster
强制 A 上的查询使用索引并没有使它更快
select * from A where foreign_key = @guid
select * from A with (index(index_name_A)) where foreign_key = @guid
回答by Albin Sunnanbo
That index would just be identical to your table (possibly sorted in another order).
It won't speed up your queries.
该索引将与您的表完全相同(可能按其他顺序排序)。
它不会加快您的查询速度。