Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/707117/
SQL 'ORDER BY' slowness
Asked by Kapsh
Is it true that ORDER BY is generally pretty slow? I am trying to run some SQL statements where the WHERE clause is pretty simple, but then I am trying an ORDER BY on an indexed VARCHAR(50) column.
I need to sort alphabetically for display reasons. I figured that getting the database to do it for me would be the most efficient.
At this point, I am looking to either:
- optimize the SQL query
- sort the result set in code
Here is the actual query I am trying to run:
// B.SYNTAX is a TEXT/CLOB field
// Indexes on NAME, MODULENAME. PREVIOUS is a CHAR(1) with no index
"SELECT A.NAME, B.SYNTAX, B.DESCRIPTION, A.RATE1, A.RATE2,
A.RATE3, A.STARTDATE, A.ENDDATE, A.HIDE, A.CATEGORYNAME
FROM A, B WHERE A.MODULENAME='"+loadedModuleName+"'
AND A.NAME = B.NAME AND (A.PREVIOUS<>'N' OR A.PREVIOUS IS NULL)
ORDER BY A.NAME"
Table A has ~2000 rows and B has ~500.
I should probably also mention that I cannot do much database-specific optimization since we support multiple databases. Also, the application is deployed at a customer site.
I am expecting hundreds of records to be returned (fewer than 1000).
What would you do? Any tips are appreciated. Thanks.
Accepted answer by Kapsh
I did some performance testing last night on a more production-like database (not the development one) and here is what I found:
Total rows in table A: 13000
Total rows in table B: 5000
Rows returned by the join query: 5000
Time taken with the ORDER BY clause: ~5.422 seconds
Time taken without the ORDER BY clause: ~5.345 seconds
So it looked like the ORDER BY wasn't making much of a difference. (I am okay with the few milliseconds added.)
I also tested by setting all B.SYNTAX values to NULL to make sure that it wasn't just network latency from transferring so much data.
Now I removed B.SYNTAX from the SELECT clause and the query took only 0.8 seconds!
So it seems that the CLOB column as a whole is the bottleneck. That doesn't mean I have found a way to make this query faster, but at least I won't spend time writing a sorting algorithm.
Thanks to all who replied. I learned quite a bit and it led me to try a few different things out.
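The comparison described in this answer (the same query with and without ORDER BY, then without the large text column) can be reproduced with a small timing harness. The following is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database, and the simplified table layout, the row counts, and the time_query helper are assumptions made for illustration, not the original schema. Absolute timings will differ from the numbers reported in the answer.

```python
import sqlite3
import time

def time_query(conn, sql, params=()):
    """Run a query, fetch every row, and return (elapsed_seconds, row_count)."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()  # fetch all rows, not just a first page
    return time.perf_counter() - start, len(rows)

# Hypothetical, simplified stand-ins for tables A and B.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (NAME TEXT, MODULENAME TEXT)")
conn.execute("CREATE TABLE B (NAME TEXT, SYNTAX TEXT)")  # SYNTAX stands in for the CLOB
conn.execute("CREATE INDEX idx_a_name ON A (NAME)")
conn.execute("CREATE INDEX idx_b_name ON B (NAME)")
conn.executemany("INSERT INTO A VALUES (?, ?)",
                 [(f"item{i:05d}", "mod1") for i in range(1000)])
conn.executemany("INSERT INTO B VALUES (?, ?)",
                 [(f"item{i:05d}", "x" * 10_000) for i in range(1000)])

base = "FROM A JOIN B ON A.NAME = B.NAME WHERE A.MODULENAME = ?"
variants = {
    "with ORDER BY": f"SELECT A.NAME, B.SYNTAX {base} ORDER BY A.NAME",
    "without ORDER BY": f"SELECT A.NAME, B.SYNTAX {base}",
    "without SYNTAX": f"SELECT A.NAME {base} ORDER BY A.NAME",
}
results = {label: time_query(conn, sql, ("mod1",)) for label, sql in variants.items()}
for label, (seconds, count) in results.items():
    print(f"{label}: {count} rows in {seconds:.4f} s")
```

Fetching every row inside the timed region matters here, because the cost being measured includes transferring the large column.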
Answer by Mark Brittingham
ORDER BY on an indexed field should not be slow, as the database can pull the data in index order. You might want to post information about the database structure (the DDL) and the actual query so people can take a look.
You absolutely should use a SQL sort rather than a code sort where possible, so you are on the right track there.
UPDATE: OK, a few things. First, you should not use the "+loadedModuleName+" construct, as it makes each query unique and screws up the optimizer. Use a parameter. Second, your ORDER BY clause is ambiguous as to whether it refers to table A or B; make this explicit and choose the table with the index (even if both have indexes, make it explicit). Finally, your "Previous" field can still be indexed even as a CHAR(1). I would do everything but the last suggested index, test the speed and, if it is still slow, add the index and check again.
UPDATE: So you'll be returning <1000 records, but what is the size of the table in total?
UPDATE: Oh, man, I'm sorry I didn't catch this before. If you want to deploy it correctly on SQL Server, your query should be:
SELECT A.NAME, B.SYNTAX, B.DESCRIPTION, A.RATE1, A.RATE2, A.RATE3, A.STARTDATE, A.ENDDATE, A.HIDE, A.CATEGORYNAME
FROM Table1 A join Table2 B on (A.Name=B.Name)
WHERE (A.MODULENAME=@ModuleName) AND (A.PREVIOUS<>'N' OR A.PREVIOUS IS NULL)
ORDER BY A.NAME
Try this and I will almost guarantee you'll see a huge speed-up.
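The advice to use a parameter rather than concatenating the module name into the SQL text can be sketched as follows. This is a minimal illustration, not the asker's code: it uses Python's built-in sqlite3 module, whose placeholder is ? rather than SQL Server's @ModuleName, and the sample rows and the names_for_module helper are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (NAME TEXT, MODULENAME TEXT, PREVIOUS TEXT)")
conn.executemany("INSERT INTO A VALUES (?, ?, ?)", [
    ("beta", "billing", None),
    ("alpha", "billing", "Y"),
    ("gamma", "billing", "N"),
    ("delta", "other", "Y"),
])

# The SQL text is constant; only the bound value changes between calls.
QUERY = """
    SELECT NAME FROM A
    WHERE MODULENAME = ?
      AND (PREVIOUS <> 'N' OR PREVIOUS IS NULL)
    ORDER BY NAME
"""

def names_for_module(module_name):
    """Return the matching names, passing the module name as a bound parameter."""
    return [row[0] for row in conn.execute(QUERY, (module_name,))]

print(names_for_module("billing"))  # ['alpha', 'beta']
```

Because the value is bound separately from the statement text, a module name containing quote characters is treated as plain data rather than as SQL.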
Answer by Kluge
ORDER BY is not generally slow, provided that the database can find an index that corresponds to the ORDER BY expression.
However, your SQL statement might include other things that force the database to scan the entire table before returning the results, like SELECT TOP n.
Answer by Quassnoi
If your filter looks like this:
WHERE col1 = @value1
AND col2 = @value2
AND col3 = @value3
ORDER BY
col4
then you'll need to create an index on (col1, col2, col3, col4).
The optimizer will use the index both to filter on the first three values and to order by the fourth one.
If you don't have such an index, then one of the following will happen:
1. The optimizer will use an index to filter on the WHERE condition, but it will still have to ORDER the remaining rows.
2. The optimizer will use an index to ORDER the values, but ALL values will need to be examined in order to filter them.
3. The optimizer will not use an index at all, so both drawbacks apply: from 2, "ALL values will need to be examined in order to filter them", and from 1, "all remaining rows have to be ordered".
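The effect of the (col1, col2, col3, col4) index can be observed in a query plan. The sketch below uses SQLite through Python's sqlite3 module purely as a convenient stand-in; the table t, the payload column, and the index name are assumptions, and other database engines report their plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INT, col2 INT, col3 INT, col4 TEXT, payload TEXT)")

QUERY = """
    SELECT payload FROM t
    WHERE col1 = ? AND col2 = ? AND col3 = ?
    ORDER BY col4
"""

def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as a list of strings."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (1, 2, 3))]

# No index yet: SQLite scans the table and sorts with a temporary B-tree.
plan_without_index = plan(QUERY)

# Equality columns first, the ORDER BY column last.
conn.execute("CREATE INDEX idx_t_filter_sort ON t (col1, col2, col3, col4)")
plan_with_index = plan(QUERY)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

In SQLite, a plan line mentioning a temporary B-tree for ORDER BY indicates a separate sort step; with the composite index in place that line is absent, because rows come out of the index already ordered by col4.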
Answer by FerranB
UPDATE: Given the query you posted, I think the best option is to consider the query good as it is, because:
- For few rows, it doesn't matter who does the work, so the easiest option for you is to use ORDER BY.
- For a lot of rows, don't let the client do the work: the RDBMS is more specialized, and the server surely has more memory and CPU.
The tips on ordering that you have to consider are:
- ORDER BY is the ONLY way to guarantee sort order in a SQL query.
- The best worker for sorting is the database in any case: BE SURE OF THIS!
- Try to minimize the cardinality of the returned rows.
- Create indexes according to the query. This means putting the ordered columns last in the index.
- Avoid indexing if the query is already fast.
- You can assume that indexes are sorted, so if you sort on only one table and have good indexes, the sort can cost close to zero.
For more rules of thumb about indexes, see this other SO question.
Answer by dkretz
If you are selecting few enough rows to display, it's not conceivable that the ORDER BY clause would take any perceptible amount of time unless you are limiting the number of rows returned with LIMIT or TOP.
We need more info. What DBMS? What does the query plan look like? Have you looked at query plans with and without ORDER BY? What differences do you see?
EDIT:
SELECT A.NAME, B.SYNTAX, B.DESCRIPTION, A.RATE1, A.RATE2,
A.RATE3, A.STARTDATE, A.ENDDATE, A.HIDE, A.CATEGORYNAME
FROM A, B
WHERE A.MODULENAME='"+loadedModuleName+"'
AND A.NAME = B.NAME
AND (A.PREVIOUS<>'N' OR A.PREVIOUS IS NULL)
ORDER BY NAME
Is NAME the primary key?
Is there an index on NAME? By itself, or with other fields? In what sequence?
How many rows are returned for one loadedModuleName?
I suspect the slowness comes from "A.PREVIOUS <> 'N' OR A.PREVIOUS IS NULL".
Try using (NOT A.PREVIOUS = 'N'), which I think is equivalent and may help a bit. (Be careful, though: under SQL's three-valued logic this predicate is not true for rows where PREVIOUS is NULL, so it would drop the rows that the IS NULL branch keeps.)
Time the query with and without the ORDER BY and see if the timing is at all different. It shouldn't be.
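Whether the rewritten predicate really is equivalent to the original one can be checked directly on a tiny table. The sketch below uses SQLite through Python's sqlite3 module; the three sample rows and the names helper are assumptions for illustration. It relies on standard SQL NULL semantics, where comparing NULL to a value yields NULL rather than true or false.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (NAME TEXT, PREVIOUS TEXT)")
conn.executemany("INSERT INTO A VALUES (?, ?)",
                 [("a", "Y"), ("b", "N"), ("c", None)])

def names(where_clause):
    """Return the names selected by a fixed, trusted WHERE clause."""
    sql = f"SELECT NAME FROM A WHERE {where_clause} ORDER BY NAME"
    return [row[0] for row in conn.execute(sql)]

original = names("PREVIOUS <> 'N' OR PREVIOUS IS NULL")
rewritten = names("NOT PREVIOUS = 'N'")

print(original)   # ['a', 'c']
print(rewritten)  # ['a']
```

The row whose PREVIOUS is NULL is kept by the original predicate and dropped by the rewritten one, so the two forms only agree when the column contains no NULLs.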
EDIT:
If NAME is not unique in either A or B, your join is going to go partially ballistic when every A.NAME instance becomes cross-joined on B.NAME. If 50 A rows match and 50 B rows match, you'll end up with 2500 result rows, which may not be what you intend.
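The 50 x 50 = 2500 arithmetic can be reproduced with two small tables that share one duplicated key. This is a sketch with assumed single-column tables and an assumed key value, run in SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (NAME TEXT)")
conn.execute("CREATE TABLE B (NAME TEXT)")
conn.executemany("INSERT INTO A VALUES (?)", [("dup",)] * 50)
conn.executemany("INSERT INTO B VALUES (?)", [("dup",)] * 50)

# Every A row pairs with every B row that has the same NAME.
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM A JOIN B ON A.NAME = B.NAME"
).fetchone()[0]
print(joined_rows)  # 2500

# A quick check for duplicated join keys in one of the tables.
duplicate_names = conn.execute(
    "SELECT NAME, COUNT(*) FROM A GROUP BY NAME HAVING COUNT(*) > 1"
).fetchall()
print(duplicate_names)  # [('dup', 50)]
```

Running the GROUP BY ... HAVING check against each table is one simple way to find out whether NAME is actually unique before blaming the sort.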
Answer by JosephStyons
Keep in mind that many query editors will show results after just the first 50 or so rows have come back from the database.
Adding an ORDER BY will force the editor to wait on the database for all results, which will reveal the real speed of the query.
In those cases, the original query and the ordered one run at the same speed; you were just fooled into thinking the first one was fast, because your editor was quick to get the top 50 or so rows.
Answer by Marc Gravell
ORDER BY is not particularly slow, especially if there is an index on that column. In particular, if you have a clustered index on that column, the data is already sorted.
You can also use paging (TOP or ROW_NUMBER) etc. to help.
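As a rough illustration of paging an ordered result, the sketch below fetches one page at a time. It is written against SQLite through Python's sqlite3 module, so it uses LIMIT and OFFSET where SQL Server would use TOP or ROW_NUMBER; the table contents, the page size, and the fetch_page helper are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (NAME TEXT)")
conn.execute("CREATE INDEX idx_a_name ON A (NAME)")
conn.executemany("INSERT INTO A VALUES (?)",
                 [(f"item{i:03d}",) for i in range(120)])

def fetch_page(page_number, page_size=50):
    """Return one page of names in alphabetical order (pages are numbered from 0)."""
    sql = "SELECT NAME FROM A ORDER BY NAME LIMIT ? OFFSET ?"
    return [row[0] for row in conn.execute(sql, (page_size, page_number * page_size))]

first_page = fetch_page(0)
last_page = fetch_page(2)
print(first_page[0], first_page[-1], len(last_page))  # item000 item049 20
```

Only the requested page crosses the wire, which helps most when the rows carry large columns.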
Answer by Konstantin Tarkus
It shouldn't be slow. Optimize your query and database structure (at least indexes and statistics if it's SQL Server). Maybe there is something in your query other than ORDER BY that causes this slowness?
SELECT A.NAME, B.SYNTAX, B.DESCRIPTION, A.RATE1, A.RATE2, A.RATE3,
A.STARTDATE, A.ENDDATE, A.HIDE, A.CATEGORYNAME
FROM Table1 A JOIN Table2 B on A.Name = B.Name
WHERE A.MODULENAME = @ModuleName AND (A.PREVIOUS <> 'N' OR A.PREVIOUS IS NULL)
ORDER BY A.NAME
Option 1
If you're querying just a few simple columns (2-4), you can include them in the index as well. This way your query will run faster. Also make sure that the sort order of that index column matches the sort order in your query.
-- if your query looks like this:
SELECT [Name], [Title], [Count] ORDER BY [Count]
-- you can create an index on [Name], [Title], [Count]
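An index that contains every column the query reads is often called a covering index. The sketch below shows one in SQLite through Python's sqlite3 module. The table and column names are assumptions, and placing the sort column first in the index is this sketch's own choice so that the index order matches the ORDER BY; the answer itself does not specify a column order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, title TEXT, cnt INT, body TEXT)"
)
# Sort column first, followed by the other selected columns (hypothetical layout).
conn.execute("CREATE INDEX idx_items_cnt_name_title ON items (cnt, name, title)")

plan = [row[3] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT name, title, cnt FROM items ORDER BY cnt"
)]
print(plan)
```

When SQLite reports a covering index in the plan, it is answering the query from the index alone, without visiting the table rows or running a separate sort step.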
Option 2
Create a view and bind it to the schema. Then query data from that view.
Option 3
If you use SQL Server 2005 or above, you can also try to run your query in SQL Server Profiler, and it will recommend the best indexes and statistics that you can apply to your table in order to optimize this particular query's performance.
Option 4
Try to rebuild your indexes and statistics.
Option 5
You can try putting your index/table into a separate filegroup on a different hard drive.
Answer by Adam Robinson
It's not fair to say that "order by" is slow in and of itself. You have many RDBMSs to consider, each with its own implementation, as well as the data type and indexing scheme. I would, however, doubt that you can sort it faster client-side than you can on the server, but that isn't to say that sorting it on the server is the right thing to do.