Why is my SQL Server ORDER BY slow despite the ordered column being indexed?

Question

Asked by George

I have an SQL query (generated by LINQ to Entities) which is roughly like the following:

SELECT * FROM [mydb].[dbo].[jobs]
JOIN [mydb].[dbo].[industry]
  ON jobs.industryId = industry.id
JOIN [mydb].[dbo].[state]
  ON jobs.stateId = state.id
JOIN [mydb].[dbo].[positionType]
  ON jobs.positionTypeId = positionType.id
JOIN [mydb].[dbo].[payPer]
  ON jobs.salaryPerId = payPer.id
JOIN [mydb].[dbo].[country]
  ON jobs.countryId = country.id
WHERE countryName = 'US'
ORDER BY startDatetime

The query returns about 1,200 rows, which I don't think is a huge amount. Unfortunately, it takes about 16 seconds. Without the ORDER BY, the query takes under 1 second.

I've used SQL Server Management Studio to add an index on the startDatetime column, and also a clustered index on "cityId, industryId, startDatetime, positionTypeId, payPerId, stateId" (i.e. all of the columns in "jobs" that we use in JOINs, plus the column we use in the ORDER BY). I already have individual indexes on each of the columns we use in JOINs. Unfortunately, this hasn't made the query any faster.

I ran a showplan and got:

   |--Nested Loops(Inner Join, OUTER REFERENCES:([mydb].[dbo].[jobs].[cityId]))
       |--Nested Loops(Inner Join, OUTER REFERENCES:([mydb].[dbo].[jobs].[stateId]))
       |    |--Nested Loops(Inner Join, OUTER REFERENCES:([mydb].[dbo].[jobs].[industryId]))
       |    |    |--Nested Loops(Inner Join, OUTER REFERENCES:([mydb].[dbo].[jobs].[positionTypeId]))
       |    |    |    |--Nested Loops(Inner Join, OUTER REFERENCES:([mydb].[dbo].[jobs].[salaryPerId]))
       |    |    |    |    |--Sort(ORDER BY:([mydb].[dbo].[jobs].[issueDatetime] ASC))
       |    |    |    |    |    |--Hash Match(Inner Join, HASH:([mydb].[dbo].[currency].[id])=([mydb].[dbo].[jobs].[salaryCurrencyId]))
       |    |    |    |    |         |--Index Scan(OBJECT:([mydb].[dbo].[currency].[IX_currency]))
       |    |    |    |    |         |--Nested Loops(Inner Join, WHERE:([mydb].[dbo].[jobs].[countryId]=[mydb].[dbo].[country].[id]))
       |    |    |    |    |              |--Index Seek(OBJECT:([mydb].[dbo].[country].[IX_country]), SEEK:([mydb].[dbo].[country].[countryName]='US') ORDERED FORWARD)
       |    |    |    |    |              |--Clustered Index Scan(OBJECT:([mydb].[dbo].[jobs].[PK_jobs]))
       |    |    |    |    |--Clustered Index Seek(OBJECT:([mydb].[dbo].[payPer].[PK_payPer]), SEEK:([mydb].[dbo].[payPer].[id]=[mydb].[dbo].[jobs].[salaryPerId]) ORDERED FORWARD)
       |    |    |    |--Clustered Index Seek(OBJECT:([mydb].[dbo].[positionType].[PK_positionType]), SEEK:([mydb].[dbo].[positionType].[id]=[mydb].[dbo].[jobs].[positionTypeId]) ORDERED FORWARD)
       |    |    |--Clustered Index Seek(OBJECT:([mydb].[dbo].[industry].[PK_industry]), SEEK:([mydb].[dbo].[industry].[id]=[mydb].[dbo].[jobs].[industryId]) ORDERED FORWARD)
       |    |--Clustered Index Seek(OBJECT:([mydb].[dbo].[state].[PK_state]), SEEK:([mydb].[dbo].[state].[id]=[mydb].[dbo].[jobs].[stateId]) ORDERED FORWARD)
       |--Clustered Index Seek(OBJECT:([mydb].[dbo].[city].[PK_city]), SEEK:([mydb].[dbo].[city].[id]=[mydb].[dbo].[jobs].[cityId]) ORDERED FORWARD)

The important line seems to be "|--Sort(ORDER BY:([mydb].[dbo].[jobs].[issueDatetime] ASC))", which doesn't mention any index on that column.

Why is my ORDER BY making my query so much slower, and how can I speed up my query?

Answer 1

Accepted answer by Scott Bruns

If your query does not contain an ORDER BY, it will return the data in whatever order it was found. There is no guarantee that the data will even be returned in the same order when you run the query again.

When you include an ORDER BY clause, the database has to build a list of the rows in the correct order and then return the data in that order. This can take a lot of extra processing, which translates into extra time.
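
A self-contained snippet can't run SQL Server, so this rough illustration uses Python's built-in sqlite3 module as a stand-in. It shows the extra step this answer describes. The same simplified query gains an explicit sort step once ORDER BY is added and no index supplies the order. SQLite reports that step as "USE TEMP B-TREE FOR ORDER BY", which loosely corresponds to the Sort operator in the showplan above. The jobs table here is a hypothetical, heavily simplified version of the one in the question, and SQLite's planner is not SQL Server's.

```python
import sqlite3


def plan(conn, sql):
    """Return the detail text of each EXPLAIN QUERY PLAN step."""
    # The human-readable detail is the last column of SQLite's output.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
# Hypothetical, heavily simplified stand-in for the question's jobs table.
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, countryId INTEGER, startDatetime TEXT)"
)

unsorted_plan = plan(conn, "SELECT * FROM jobs WHERE countryId = 1")
sorted_plan = plan(conn, "SELECT * FROM jobs WHERE countryId = 1 ORDER BY startDatetime")

print("without ORDER BY:", unsorted_plan)
print("with ORDER BY:   ", sorted_plan)
```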

It probably takes longer to sort a large number of columns, which your query might be returning. At some point you will run out of buffer space, the database will have to start swapping, and performance will go downhill.

Try returning fewer columns (specify the columns you need instead of SELECT *) and see if the query runs faster.

Answer 2

Answer by Remus Rusanu

Your query projects all the columns (*), needs 5 columns for the join conditions, and has an unselective WHERE clause on what is likely a joined table column. Together these make it hit the Index Tipping Point. The optimizer decides that scanning the entire table, filtering it, and sorting it costs less than range scanning the index and then looking up each key in the table to retrieve the extra columns it needs (the 5 for the joins and the rest for the *).
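
The trade-off behind the tipping point can be sketched with a toy cost model. This is an assumption-laden illustration, not SQL Server's actual costing. It charges one unit per page for a full scan and one unit per matching row for a key lookup, and it ignores both the sort and the index range scan itself. The function and parameter names are hypothetical.

```python
def cheaper_plan(total_rows, matching_rows, rows_per_page, page_cost=1.0, lookup_cost=1.0):
    """Pick the cheaper of two access strategies under a toy cost model.

    "scan + sort":    read every page of the table once, then filter and sort.
    "seek + lookups": seek a narrow index, then fetch the remaining columns
                      with one random lookup per matching row.
    """
    pages = total_rows / rows_per_page
    scan_cost = pages * page_cost
    seek_cost = matching_rows * lookup_cost
    return "seek + lookups" if seek_cost < scan_cost else "scan + sort"


def tipping_point(total_rows, rows_per_page, page_cost=1.0, lookup_cost=1.0):
    """Number of matching rows at which both strategies cost the same."""
    return (total_rows / rows_per_page) * page_cost / lookup_cost


print(cheaper_plan(100_000, 500, 50))    # few matches: lookups are cheaper
print(cheaper_plan(100_000, 5_000, 50))  # many matches: the full scan is cheaper
print(tipping_point(100_000, 50))
```

With 100,000 rows stored 50 per page, the toy model switches to a full scan once about 2,000 rows (2% of the table) match. This illustrates how a filter that looks selective in row terms can still tip the choice toward scan-and-sort. Real thresholds depend on SQL Server's own cost model and statistics.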

A better index to partially cover this query could be:

CREATE INDEX ... ON .. (countryId, startDatetime);

Jeffrey's suggestion to put (countryId, startDatetime) at the front of the clustered index would cover the query 100% and would definitely improve performance, but changing the clustered index has many side effects. I would start with a non-clustered index like the one above. Unless other queries need them, you can drop all the other non-clustered indexes you created; they won't help this query.
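
This sketch uses SQLite again as a stand-in, via Python's sqlite3 module, to check the idea behind the (countryId, startDatetime) index. With the equality column first and the sort column second, rows for one country come out of the index already ordered by startDatetime, so the planner's explicit sort step disappears. When the query selects only indexed columns, SQLite also reports a covering index, which is the case where no per-row lookups are needed. The index name ix_jobs_country_start and the title column are hypothetical. In SQL Server, the INCLUDE clause of CREATE INDEX is the usual way to add non-key columns to a non-clustered index so that it covers more of a query.

```python
import sqlite3


def plan(conn, sql):
    """Return the detail text of each EXPLAIN QUERY PLAN step."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, countryId INTEGER,"
    " startDatetime TEXT, title TEXT)"
)
query = "SELECT * FROM jobs WHERE countryId = 1 ORDER BY startDatetime"

before = plan(conn, query)  # no useful index yet: scan, then sort

# Equality column first, sort column second (hypothetical index name).
conn.execute("CREATE INDEX ix_jobs_country_start ON jobs (countryId, startDatetime)")
after = plan(conn, query)  # seek on countryId; the index supplies the order

# Selecting only indexed columns lets the index answer the query by itself.
covered = plan(
    conn,
    "SELECT countryId, startDatetime FROM jobs WHERE countryId = 1 ORDER BY startDatetime",
)

for label, steps in (("before", before), ("after", after), ("covered", covered)):
    print(label, steps)
```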

Answer 3

Answer by Pankaj

You should also try the code below.

First, insert the records into a temporary table without using the ORDER BY clause:

SELECT * INTO #temp FROM [mydb].[dbo].[jobs]
JOIN [mydb].[dbo].[industry]
  ON jobs.industryId = industry.id
JOIN [mydb].[dbo].[state]
  ON jobs.stateId = state.id
JOIN [mydb].[dbo].[positionType]
  ON jobs.positionTypeId = positionType.id
JOIN [mydb].[dbo].[payPer]
  ON jobs.salaryPerId = payPer.id
JOIN [mydb].[dbo].[country]
  ON jobs.countryId = country.id
WHERE countryName = 'US'

Then run the statement with the ORDER BY clause against the temporary table:

Select * from #temp ORDER BY startDatetime
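
Here is a minimal sketch of this two-step approach. It again uses Python's sqlite3 module as a stand-in for SQL Server, with a hypothetical two-table schema and made-up sample rows. It lists columns explicitly instead of using *. Both tables have an id column, and SQL Server's SELECT ... INTO rejects duplicate column names in the new table, so the SELECT * INTO form above will likely need an explicit column list in practice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE country (id INTEGER PRIMARY KEY, countryName TEXT);
    CREATE TABLE jobs (id INTEGER PRIMARY KEY, countryId INTEGER, startDatetime TEXT);
    INSERT INTO country VALUES (1, 'US'), (2, 'CA');
    INSERT INTO jobs VALUES
        (1, 1, '2011-03-01'),
        (2, 2, '2011-01-15'),
        (3, 1, '2011-01-10'),
        (4, 1, '2011-02-20');
    """
)

# Step 1: materialize the filtered join into a temporary table, with no ORDER BY.
# Columns are aliased so the new table has no duplicate names (both tables have "id").
conn.execute(
    """
    CREATE TEMP TABLE temp_jobs AS
    SELECT jobs.id AS jobId, jobs.startDatetime AS startDatetime, country.countryName AS countryName
    FROM jobs
    JOIN country ON jobs.countryId = country.id
    WHERE country.countryName = 'US'
    """
)

# Step 2: sort only the small, already-filtered result.
rows = conn.execute(
    "SELECT jobId, startDatetime FROM temp_jobs ORDER BY startDatetime"
).fetchall()
print(rows)
```

The SQLite temporary table exists only for the lifetime of the connection, much as a local #temp table in SQL Server is scoped to the session.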

Answer 4

Answer by Bohemian

News flash: Indexing a column doesn't help make the sort faster.

If you want to make your query A LOT faster, reverse the order of your tables. Specifically, list table country first in your joined tables. Reason? The WHERE clause can then filter rows from the first table instead of having to make all those joins and only then filter the rows.
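
The row-count reasoning behind this suggestion can be illustrated with a toy in-memory join in plain Python. This is not how SQL Server executes joins, and the data, names, and the two dimension tables are made up. Filtering on country first means each dimension lookup is done only for US rows, whereas joining first does it for every row, yet both orders return the same result. SQL Server's cost-based optimizer is generally free to choose the order of inner joins itself, so check the actual plan to see whether reordering the tables in the query text changes anything.

```python
def join_dimensions(jobs, dimensions, country_names, wanted="US", filter_first=True):
    """Join each job to its dimension rows; return (result_rows, lookup_count).

    dimensions maps an output name to (foreign-key column, {id: value}).
    """
    def is_wanted(job):
        return country_names[job["countryId"]] == wanted

    lookups = 0
    candidates = [j for j in jobs if is_wanted(j)] if filter_first else jobs
    result = []
    for job in candidates:
        joined = dict(job)
        for name, (fk, table) in dimensions.items():
            lookups += 1                      # one probe per dimension per row
            joined[name] = table[job[fk]]
        if is_wanted(job):                    # late filter (always true if filter_first)
            result.append(joined)
    return result, lookups


country_names = {1: "US", 2: "CA"}
dimensions = {
    "industry": ("industryId", {10: "IT", 11: "Retail"}),
    "state": ("stateId", {20: "NY", 21: "ON"}),
}
jobs = [
    {"id": 1, "countryId": 1, "industryId": 10, "stateId": 20},
    {"id": 2, "countryId": 2, "industryId": 11, "stateId": 21},
    {"id": 3, "countryId": 2, "industryId": 10, "stateId": 21},
    {"id": 4, "countryId": 1, "industryId": 11, "stateId": 20},
    {"id": 5, "countryId": 2, "industryId": 11, "stateId": 21},
    {"id": 6, "countryId": 2, "industryId": 10, "stateId": 21},
]

early, early_lookups = join_dimensions(jobs, dimensions, country_names, filter_first=True)
late, late_lookups = join_dimensions(jobs, dimensions, country_names, filter_first=False)
print(early_lookups, late_lookups)  # prints: 4 12
```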

Answer 5

Answer by Jeffrey Hantin

In what order are the fields in the clustered index? You'll want to put the startDateTime field first so that the ORDER BY can match it. In this case, though, put (countryId, startDateTime) up front, in that order, since you want to select a single countryId (indirectly, via countryName) and then order by startDateTime.
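
SQLite has no clustered indexes as such. A WITHOUT ROWID table does, however, store its rows in primary-key order, which makes it a reasonable stand-in for a SQL Server clustered index (an assumption for this sketch). Using Python's sqlite3 module, the sketch puts (countryId, startDatetime) at the front of the key and appends id to keep the key unique. It then checks that filtering on one countryId and ordering by startDatetime needs no separate sort step. The table layout is hypothetical.

```python
import sqlite3


def plan(conn, sql):
    """Return the detail text of each EXPLAIN QUERY PLAN step."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
# Rows are stored in (countryId, startDatetime, id) order, like a clustered key.
conn.execute(
    """
    CREATE TABLE jobs (
        id INTEGER NOT NULL,
        countryId INTEGER NOT NULL,
        startDatetime TEXT NOT NULL,
        PRIMARY KEY (countryId, startDatetime, id)
    ) WITHOUT ROWID
    """
)

steps = plan(conn, "SELECT * FROM jobs WHERE countryId = 1 ORDER BY startDatetime")
print(steps)  # a SEARCH on the primary key, with no temp B-tree for the ORDER BY
```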
