Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1716798/

Time: 2020-09-01 04:23:55  Source: igfitidea

SQL Server 2008: Ordering by datetime is too slow

sql  sql-server  sql-server-2008  sql-order-by

Asked by silent

My table (SQL Server 2008) has 1 million+ records. When I order the records by datetime, it takes 1 second, but when I order by ID (int), it only takes about 0.1 second.

Is there any way to improve the efficiency? (I already added the datetime column to the index)

Answered by Quassnoi

Ordering by id probably uses a clustered index scan, while ordering by datetime uses either sorting or index lookup.

Both of these methods are slower than a clustered index scan.

If your table is clustered by id, it is basically already sorted. The records are contained in a B+Tree which has a linked list linking the pages in id order. The engine just has to traverse the linked list to get the records ordered by id.

If the ids were inserted in sequential order, the physical order of the rows will match the logical order and the clustered index scan will be even faster.

If you want your records to be ordered by datetime, there are two options:

  • Take all records from the table and sort them. This is obviously slow.
  • Use the index on datetime. The index is stored in a separate area of the disk, so the engine needs to shuttle between the index pages and the table pages in a nested loop. This is slower too.

To improve the ordering, you can create a separate covering index on datetime:

CREATE INDEX ix_mytable_datetime ON mytable (datetime) INCLUDE (field1, field2, …)

Include in that index all the columns you use in your query.

This index is like a shadow copy of your table but with data sorted in different order.

This lets the engine avoid the key lookups (since the index contains all the data), which will make ordering by datetime as fast as ordering by id.
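
As a rough illustration of the covering index described above, here is a minimal T-SQL sketch. The table dbo.Orders and its columns are hypothetical (they are not from the question), so adapt the names to your own schema.

```sql
-- Hypothetical table for illustration only; none of these names come from the question.
CREATE TABLE dbo.Orders (
    id            INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    created_at    DATETIME          NOT NULL,
    customer_name NVARCHAR(100)     NOT NULL,
    amount        DECIMAL(10, 2)    NOT NULL
);

-- Covering index: keyed (and therefore sorted) on created_at, with the other
-- selected columns carried in the leaf pages via INCLUDE.
-- The clustering key (id) is stored in every nonclustered index automatically.
CREATE NONCLUSTERED INDEX ix_orders_created_at
    ON dbo.Orders (created_at)
    INCLUDE (customer_name, amount);

-- Every column this query touches is available in the index, so the engine
-- can read the index in key order without key lookups or a sort step.
SELECT id, created_at, customer_name, amount
FROM dbo.Orders
ORDER BY created_at;
```

The trade-off is extra storage and slower writes, since every insert or update also has to maintain this second copy of the included columns.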

Update:

A fresh blog post on this problem:

Answered by Remus Rusanu

To honor the ORDER BY the engine has two alternatives:

  • scan the rows using an index that offers the order requested
  • sort the rows

The first option is fast, the second is slow. The problem is that, in order to be used, the index has to be a covering index, meaning it contains (at a minimum) all the columns in the SELECT projection list and all the columns used in WHERE clauses. If the index is not covering, the engine has to look up the clustered index (i.e. the 'table') for each row to retrieve the values of the needed columns. This constant lookup of values is expensive, and there is a tipping point at which the engine will (rightfully) decide it is more efficient to just scan the clustered index and sort the result, in effect ignoring your non-clustered index. For details, see The Tipping Point Query Answers.

Consider the following three queries:

SELECT dateColumn FROM table ORDER BY dateColumn
SELECT * FROM table ORDER BY dateColumn
SELECT someColumn FROM table ORDER BY dateColumn

The first one will use a non-clustered index on dateColumn. The second one will not use an index on dateColumn; for 1M rows it will likely choose a scan and sort instead. The third query, on the other hand, can benefit from an index on Table(dateColumn) INCLUDE (someColumn).

This topic is covered at length on MSDN; see Index Design Basics, General Index Design Guidelines, Nonclustered Index Design Guidelines, or How To: Optimize SQL Indexes.

Ultimately, the most important choice of your table design is the clustered index you use. Almost always the primary key (usually an auto incremented ID) is left as the clustered index, a decision that benefits only certain OLTP loads.

And finally, a rather obvious question: Why in the world would you order 1 million rows?? You can't possibly display them, can you? Explaining a little bit more about your use case might help us find a better answer for you.
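
If the real need is only to display the most recent rows, a query along the following lines may be enough. This is a sketch under assumptions: the table name dbo.Events is hypothetical, and the column names are borrowed from the example queries above.

```sql
-- Hypothetical table name; dateColumn and someColumn follow the examples above.
-- With an index on dateColumn, the engine can typically read just the last
-- 50 index entries in order instead of sorting the whole table.
SELECT TOP (50) dateColumn, someColumn
FROM dbo.Events
ORDER BY dateColumn DESC;
```

With so few rows requested, even a non-covering index on dateColumn is likely to stay below the tipping point, because only a handful of key lookups are needed.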

Answered by Mark Dickinson

Add the datetime to a new index; adding it to the id one will still not help much.

Answered by Nestor

Could it be that there is an index for your int column but not for your datetime column? Look at the execution plan.
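
One way to check both points is sketched below. It assumes the table is called dbo.mytable with columns id and [datetime]; those names are placeholders, not details confirmed by the question.

```sql
-- List every index on the table with its key and included columns.
-- 'dbo.mytable' is an assumed name; replace it with your own table.
SELECT i.name  AS index_name,
       i.type_desc,
       c.name  AS column_name,
       ic.key_ordinal,
       ic.is_included_column
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.mytable')
ORDER BY i.index_id, ic.key_ordinal;

-- Compare the two orderings; the Messages tab then reports logical reads
-- and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT * FROM dbo.mytable ORDER BY id;
SELECT * FROM dbo.mytable ORDER BY [datetime];

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

In SQL Server Management Studio, enabling "Include Actual Execution Plan" before running the two SELECT statements shows whether each one uses an index scan, key lookups, or a Sort operator.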

Answered by j.a.estevan

Have you added the DateTime field to "the" existing index or to an index of its own? Are you filtering your selection by another field as well as the DateTime, or only by this one?

You must have an index containing all the fields you are filtering on, preferably in the same order, to optimize performance.

Answered by badbod99

If your datetime field contains a lot of distinct values and those values rarely change, define a clustered index on the datetime field; this will sort the actual data by the datetime value. See http://msdn.microsoft.com/en-us/library/aa933131(SQL.80).aspx for using clustered indexes.

This will make your int searches slower though, as they will be relegated to using a non-clustered index.
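
A minimal sketch of that layout follows, using a hypothetical dbo.EventLog table (the names are illustrative assumptions): the primary key stays on id but is declared nonclustered, and the clustered index goes on the datetime column.

```sql
-- Hypothetical table; names are for illustration only.
CREATE TABLE dbo.EventLog (
    id        INT IDENTITY(1,1) NOT NULL,
    logged_at DATETIME          NOT NULL,
    message   NVARCHAR(400)     NOT NULL,
    -- Keep id unique and indexed, but do not let it define the row order.
    CONSTRAINT pk_eventlog PRIMARY KEY NONCLUSTERED (id)
);

-- The clustered index decides how the rows themselves are ordered.
-- It is not declared UNIQUE, so duplicate timestamps are allowed.
CREATE CLUSTERED INDEX cix_eventlog_logged_at
    ON dbo.EventLog (logged_at);

-- Ordering by logged_at can now be served by a clustered index scan.
SELECT id, logged_at, message
FROM dbo.EventLog
ORDER BY logged_at;
```

On an existing table the change is more invasive, because the current clustered primary key has to be dropped and recreated as nonclustered first, which rebuilds the table.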

Answered by Dani

Maybe store the datetime as an int, although it would take time to convert each time you store or get data. (This is a common technique for storing stuff like IP addresses to get faster seek times.)

You should check how your server stores datetime, because if your server already stores it as int or bigint, this will not change anything.
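
For SQL Server specifically, the datetime type is a fixed 8-byte value, stored internally as two 4-byte integers (a day count and a time-of-day count). A quick way to compare storage sizes, as a sketch with arbitrary sample values:

```sql
-- DATALENGTH returns the number of bytes used to store each value.
SELECT DATALENGTH(CAST('2009-11-11' AS DATETIME)) AS datetime_bytes,  -- 8
       DATALENGTH(CAST(1 AS INT))                 AS int_bytes,       -- 4
       DATALENGTH(CAST(1 AS BIGINT))              AS bigint_bytes;    -- 8
```

Since a datetime is already compared as a small fixed-width value, converting it to an integer is unlikely to change the sort cost much on its own; the index layout discussed in the other answers matters more.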
