database 多少数据库索引太多了?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/141232/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-08 06:58:39  来源:igfitidea点击:

How many database indexes is too many?

databaseoracledatabase-design

提问by Eli Courtwright

I'm working on a project with a rather large Oracle database (although my question applies equally well to other databases). We have a web interface which allows users to search on almost any possible combination of fields.

我正在处理一个具有相当大 Oracle 数据库的项目(尽管我的问题同样适用于其他数据库)。我们有一个网络界面,允许用户搜索几乎任何可能的字段组合。

To make these searches go fast, we're adding indexes to the fields and combinations of fields on which we believe users will commonly search. However, since we don't really know how our customers will use this software, it's hard to tell which indexes to create.

为了加快这些搜索的速度,我们为我们认为用户通常会搜索的字段和字段组合添加了索引。但是,由于我们并不真正了解客户将如何使用该软件,因此很难确定要创建哪些索引。

Space isn't a concern; we have a 4 terabyte RAID drive of which we are using only a small fraction. However, I'm worried about the possible performance penalties of having too many indexes. Because those indexes need to be updated every time a row is added, deleted, or modified, I imagine it'd be a bad idea to have dozens of indexes on a single table.

空间不是问题;我们有一个 4 TB 的 RAID 驱动器,我们只使用了其中的一小部分。但是,我担心索引过多可能会导致性能下降。因为每次添加、删除或修改行时都需要更新这些索引,所以我想在单个表上拥有数十个索引是个坏主意。

So how many indexes is considered too many? 10? 25? 50? Or should I just cover the really, really common and obvious cases and ignore everything else?

那么多少索引被认为太多了?10?25?50?或者我应该只涵盖真正、真正常见和明显的情况而忽略其他一切?

采纳答案by cagcowboy

It depends on the operations that occur on the table.

这取决于表上发生的操作。

If there's lots of SELECTs and very few changes, index all you like.... these will (potentially) speed the SELECT statements up.

如果有很多 SELECT 和很少的变化,索引所有你喜欢的......这些将(可能)加速 SELECT 语句。

If the table is heavily hit by UPDATEs, INSERTs + DELETEs ... these will be very slow with lots of indexes since they all need to be modified each time one of these operations takes place

如果表受到 UPDATE、INSERTs + DELETEs 的严重影响......这些索引会非常慢,因为每次发生这些操作之一时它们都需要修改

Having said that, you can clearly add a lot of pointless indexes to a table that won't do anything. Adding B-Tree indexes to a column with 2 distinct values will be pointless since it doesn't add anything in terms of looking the data up. The more unique the values in a column, the more it will benefit from an index.

话虽如此,您显然可以将许多毫无意义的索引添加到一个不会做任何事情的表中。将 B 树索引添加到具有 2 个不同值的列将毫无意义,因为它在查找数据方面没有添加任何内容。列中的值越独特,它就越能从索引中受益。

回答by Sklivvz

I usually proceed like this.

我通常是这样进行的。

  1. Get a log of the realqueries run on the data on a typical day.
  2. Add indexes so the most important queries hit the indexes in their execution plan.
  3. Try to avoid indexing fields that have a lot of updates or inserts
  4. After a few indexes, get a new log and repeat.
  1. 获取在典型日期对数据运行的真实查询的日志。
  2. 添加索引,以便最重要的查询命中其执行计划中的索引。
  3. 尽量避免索引有大量更新或插入的字段
  4. 几个索引后,得到一个新的日志并重复。

As with all any optimization, I stop when the requested performance is reached (this obviously implies that point 0. would be getting specific performance requirements).

与所有优化一样,当达到要求的性能时我会停止(这显然意味着点 0. 将获得特定的性能要求)。

回答by Mike McAllister

Everyone else has been giving you great advice. I have an added suggestion for you as you move forward. At some point you have to make a decision as to your best indexing strategy. In the end though, the best PLANNED indexing strategy can still end up creating indexes that don't end up getting used. One strategy that lets you find indexes that aren't used is to monitor index usage. You do this as follows:-

其他人都给了你很好的建议。当你继续前进时,我有一个额外的建议给你。在某些时候,您必须决定最佳索引策略。但最终,最好的 PLANNED 索引策略仍然可以创建最终不会被使用的索引。让您找到未使用的索引的一种策略是监视索引使用情况。您可以按以下方式执行此操作:-

alter index my_index_name monitoring usage;

You can then monitor whether the index is used or not from that point forward by querying v$object_usage. Information on this can be found in the Oracle? Database Administrator's Guide.

然后,您可以通过查询 v$object_usage 来监视索引是否从那时起使用。这个资料在oracle上能查到吗?数据库管理员指南

Just remember that if you have a warehousing strategy of dropping indexes before updating a table, then recreating them, you will have to set the index up for monitoring again, and you'll lose any monitoring history for that index.

请记住,如果您有在更新表之前删除索引的仓储策略,然后重新创建它们,您将不得不再次设置索引以进行监控,并且您将丢失该索引的任何监控历史记录。

回答by David Aldridge

In data warehousing it is very common to have a high number of indexes. I have worked with fact tables having two hundred columns and 190 of them indexed.

在数据仓库中,拥有大量索引是很常见的。我曾使用具有 200 列和 190 列索引的事实表。

Although there is an overhead to this it must be understood in the context that in a data warehouse we generally only insert a row once, we never update it, but it can then participate in thousands of SELECT queries which might benefit from indexing on any of the columns.

虽然这会产生开销,但必须在上下文中理解,在数据仓库中,我们通常只插入一次行,我们从不更新它,但是它可以参与数千个 SELECT 查询,这些查询可能会受益于对任何一个进行索引列。

For maximum flexibility a data warehouse generally uses single column bitmap indexes except on high cardinality columns, where (compressed) btree indexes can be used.

为了获得最大的灵活性,数据仓库通常使用单列位图索引,高基数列除外,其中可以使用(压缩)btree 索引。

The overhead on index maintenance is mostly associated with the expense of writing to a great many blocks and the block splits as new rows are added with values that are "in the middle" of existing value ranges for that column. This can be mitigated by partitioning and having the new data loads aligned with the partitioning scheme, and by using direct path inserts.

索引维护的开销主要与写入大量块的费用有关,并且随着新行添加的值位于该列现有值范围的“中间”,块会拆分。这可以通过分区并使新数据加载与分区方案保持一致,以及使用直接路径插入来缓解。

To address your question more directly, I think it is probably fine to index the obvious at first, but do not be afraid of adding more indexes on if the queries against the table would benefit.

为了更直接地解决您的问题,我认为首先对显而易见的内容进行索引可能很好,但是如果对表的查询会受益,请不要害怕添加更多索引。

回答by Josef

In a paraphrase of Einsteinabout simplicity, add as many indexes as you need and no more.

爱因斯坦关于简单性的解释,根据需要添加尽可能多的索引,而不是更多。

Seriously, however, every index you add requires maintenance whenever data is added to the table. On tables that are primarily read only, lots of indexes are a good thing. On tables that are highly dynamic, fewer is better.

然而,严肃地说,每当向表中添加数据时,您添加的每个索引都需要维护。在主要是只读的表上,大量的索引是一件好事。在高度动态的表上,越少越好。

My advice is to cover the common and obvious cases and then, as you encounter issues where you need more speed in getting data from specific tables, evaluate and add indices at that point.

我的建议是涵盖常见和明显的情况,然后,当您遇到需要更快地从特定表中获取数据的问题时,请在此时评估和添加索引。

Also, it's a good idea to re-evaluate your indexing schemes every few months, just to see if there is anything new that needs indexing or any indices that you've created that aren't being used for anything and should be gotten rid of.

此外,每隔几个月重新评估您的索引方案是一个好主意,看看是否有任何新的需要索引或您创建的任何索引没有被用于任何东西并且应该被删除.

回答by WW.

In addition to the points everyone else has raised, the Cost Based Optimizer incurs a cost when creating a plan for an SQL statement if there are more indexes because there are more combinations for it to consider. You can reduce this by correctly using bind variables so that SQL statements stay in the SQL cache. Oracle can then do a soft parse and re-use the plan it found last time.

除了其他人提出的要点之外,如果有更多索引,基于成本的优化器在为 SQL 语句创建计划时会产生成本,因为它需要考虑更多的组合。您可以通过正确使用绑定变量来减少这种情况,以便 SQL 语句保留在 SQL 缓存中。然后 Oracle 可以进行软解析并重新使用它上次找到的计划。

As always, nothing is simple. If there are skewed columns and histograms involved then this can be a bad idea.

一如既往,没有什么是简单的。如果涉及倾斜的列和直方图,那么这可能是一个坏主意。

In our web applications we tend to limit the combinations of searches that we allow. Otherwise you would have to test literally every combination for performance to ensure you did not have a lurking problem that someone will find one day. We have also implemented resource limits to stop this causing issues elsewhere in the application should something go wrong.

在我们的 Web 应用程序中,我们倾向于限制我们允许的搜索组合。否则,您将不得不从字面上测试每种组合的性能,以确保您没有有一天会发现的潜在问题。我们还实施了资源限制,以防止在出现问题时在应用程序的其他地方造成问题。

回答by nightcoder

I made some simple tests on my real project and real MySql database. I already answered in this topic: What is the cost of indexing multiple db columns?

我对我的真实项目和真实的 MySql 数据库做了一些简单的测试。我已经在这个主题中回答了:索引多个 db 列的成本是多少?

But I think it will be better if I quote it here:

但我认为如果我在这里引用它会更好:

I made some simple tests using my real project and real MySql database.

My results are: adding average index (1-3 columns in an index) to a table - makes inserts slower by 2.1%. So, if you add 20 indexes, your inserts will be slower by 40-50%. But your selects will be 10-100 times faster.

So is it ok to add many indexes? - It depends :) I gave you my results - You decide!

我使用我的真实项目和真实的 MySql 数据库做了一些简单的测试。

我的结果是:将平均索引(索引中的 1-3 列)添加到表 - 使插入速度降低 2.1%。因此,如果添加 20 个索引,插入速度将降低 40-50%。但是您的选择会快 10-100 倍。

那么可以添加多个索引吗?- 这取决于:) 我给了你我的结果 - 你决定!

回答by scotta

There's no static answer in my opinion, this sort of thing falls under 'performance tuning'.

我认为没有固定的答案,这类事情属于“性能调整”。

It could be that everything your app does is looked up by a primary key, or it could be the oposite in that queries are done over unristricted combinations of fields and any one in particular could be used at any given time.

可能是您的应用程序所做的一切都通过主键进行查找,或者相反,查询是在不受限制的字段组合上完成的,并且可以在任何给定时间使用任何特定的组合。

Beyond just indexing, there's reogranizing your DB to include calculated search fields, splitting tables, etc - it's really dependant on your load shapes and query parameters, how much/what data 'really' needs to be retruend by a query.

除了索引之外,还有重新组织您的数据库以包括计算的搜索字段、拆分表等 - 这实际上取决于您的负载形状和查询参数,查询“真正”需要返回多少/什么数据。

If your entire DB is fronted by stored-procedure facades turning becomes a bit easier, as you don't have to wory about every ad-hoc query. Or you may have a deep understanding of the kind of queries that will hit your DB, and can limit the tuning to those.

如果您的整个数据库都以存储过程外观为前端,那么转向会变得更容易一些,因为您不必担心每个临时查询。或者,您可能对会影响您的数据库的查询类型有深入的了解,并且可以将调整限制在这些查询上。

For SQL Server I've found the Database Engine Tuning advisor usefull - you set up 'typical' workloads and it can make recommendations about adding/removing indexes and statistics. I'm sure other DBs have similar tools, either 'offical' or third party.

对于 SQL Server,我发现数据库引擎优化顾问很有用 - 您设置“典型”工作负载,它可以提出有关添加/删除索引和统计信息的建议。我确信其他数据库有类似的工具,无论是“官方”还是第三方。

回答by Moshe

This really is a more theoretical questions than practical. Indexes impact on your performance depends on the hardware you have, the version of Oracle, index types, etc. Yesterday I heard Oracle announced a dedicated storage, made by HP, which is supposed to perform 10 times faster with 11g database. As for your case, there can be several solutions: 1. Have a large amount of indexes (>20) and rebuild them daily (nightly). This would be especially useful if the table gets thousands of updates/deletes daily. 2. Partition your table (if that applies your data model). 3. Use a separate table for new/updated data, and run a nightly process which combines the data together. This would require a change in your application logic. 4. Switch to IOT (index organized table), if your data support this.

这确实是一个理论性的问题而不是实践性的问题。索引对您的性能的影响取决于您拥有的硬件、Oracle 的版本、索引类型等。昨天我听说 Oracle 宣布了一种由 HP 制造的专用存储,它的性能应该是 11g 数据库的 10 倍。对于您的情况,可以有几种解决方案: 1. 拥有大量索引(> 20)并每天(每晚)重建它们。如果表每天有数千次更新/删除,这将特别有用。2. 对表进行分区(如果这适用于您的数据模型)。3. 为新的/更新的数据使用单独的表,并运行将数据组合在一起的夜间进程。这将需要更改您的应用程序逻辑。4. 切换到 IOT(索引组织表),如果您的数据支持这一点。

Of course there might be many more solutions for such case. My first suggestion to you, would be to clone the DB to a development environment, and run some stress testing against it.

当然,这种情况可能有更多的解决方案。我对您的第一个建议是将数据库克隆到开发环境,并对其进行一些压力测试。

回答by Orion Adrian

Ultimately how many indexes you need depend on the behavior of your applications that ride on top of your database server.

最终,您需要多少索引取决于运行在数据库服务器之上的应用程序的行为。

In general the more inserting you do the more painful your indexes become. Each time you do an insert, all the indexes that include that table have to be updated.

一般来说,插入的次数越多,索引就越痛苦。每次执行插入操作时,都必须更新包含该表的所有索引。

Now if your application has a decent amount of reading, or even more so if it's almost all reading, then indexes are the way to go as there will be major performance improvements for very little cost.

现在,如果您的应用程序有相当数量的读取,或者甚至更多,如果它几乎全部读取,那么索引是要走的路,因为将以很少的成本获得重大的性能改进。