performance 可以应用哪些通用技术来优化 SQL 查询?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/39331/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-13 16:37:32  来源:igfitidea点击:

What generic techniques can be applied to optimize SQL queries?

sqlperformance

提问by Niyaz

What techniques can be applied effectively to improve the performance of SQL queries? Are there any general rules that apply?

可以有效地应用哪些技术来提高 SQL 查询的性能?是否有任何适用的一般规则?

回答by Unsliced

  • Use primary keys
  • Avoid select *
  • Be as specific as you can when building your conditional statements
  • De-normalisation can often be more efficient
  • Table variables and temporary tables (where available) will often be better than using a large source table
  • Partitioned views
  • Employ indices and constraints
  • 使用主键
  • 避免选择 *
  • 在构建条件语句时尽可能具体
  • 去规范化通常更有效
  • 表变量和临时表(如果可用)通常比使用大型源表更好
  • 分区视图
  • 使用指标和约束

回答by Keith

Learn what's really going on under the hood - you should be able to understand the following concepts in detail:

了解幕后真正发生的事情 - 您应该能够详细了解以下概念:

  • Indexes (not just what they are but actually how they work).
  • Clustered indexes vs heap allocated tables.
  • Text and binary lookups and when they can be in-lined.
  • Fill factor.
  • How records are ghosted for update/delete.
  • When page splits happen and why.
  • Statistics, and how they effect various query speeds.
  • The query planner, and how it works for your specific database (for instance on some systems "select *" is slow, on modern MS-Sql DBs the planner can handle it).
  • 索引(不仅仅是它们是什么,而是它们实际上是如何工作的)。
  • 聚集索引与堆分配表。
  • 文本和二进制查找以及何时可以内联。
  • 填充因子
  • 如何为更新/删除对记录进行幻影。
  • 何时发生页面拆分以及原因。
  • 统计信息,以及它们如何影响各种查询速度。
  • 查询计划器,以及它如何为您的特定数据库工作(例如,在某些系统上,“select *”很慢,在现代 MS-Sql DB 上,计划器可以处理它)。

回答by Danimal

The biggest thing you can do is to look for table scans in sql server query analyzer (make sure you turn on "show execution plan"). Otherwise there are a myriad of articles at MSDN and elsewhere that will give good advice.

您可以做的最重要的事情是在 sql server 查询分析器中查找表扫描(确保打开“显示执行计划”)。否则,MSDN 和其他地方有大量文章可以提供很好的建议。

As an aside, when I started learning to optimize queries I ran sql server query profiler against a trace, looked at the generated SQL, and tried to figure out why that was an improvement. Query profiler is far from optimal, but it's a decent start.

顺便说一句,当我开始学习优化查询时,我针对跟踪运行了 sql server 查询分析器,查看了生成的 SQL,并试图找出为什么这是一个改进。查询分析器远非最佳,但它是一个不错的开始。

回答by Leo Moore

There are a couple of things you can look at to optimize your query performance.

您可以查看几件事来优化查询性能。

  1. Ensure that you just have the minimum of data. Make sure you select only the columns you need. Reduce field sizes to a minimum.

  2. Consider de-normalising your database to reduce joins

  3. Avoid loops (i.e. fetch cursors), stick to set operations.

  4. Implement the query as a stored procedure as this is pre-compiled and will execute faster.

  5. Make sure that you have the correct indexes set up. If your database is used mostly for searching then consider more indexes.

  6. Use the execution plan to see how the processing is done. What you want to avoid is a table scan as this is costly.

  7. Make sure that the Auto Statistics is set to on. SQL needs this to help decide the optimal execution. See Mike Gunderloy's great post for more info. Basics of Statistics in SQL Server 2005

  8. Make sure your indexes are not fragmented. Reducing SQL Server Index Fragmentation

  9. Make sure your tables are not fragmented. How to Detect Table Fragmentation in SQL Server 2000 and 2005
  1. 确保您只有最少的数据。确保只选择您需要的列。将字段大小降至最低。

  2. 考虑对数据库进行反规范化以减少连接

  3. 避免循环(即获取游标),坚持设置操作。

  4. 将查询实现为存储过程,因为这是预编译的并且执行速度更快。

  5. 确保您设置了正确的索引。如果您的数据库主要用于搜索,请考虑使用更多索引。

  6. 使用执行计划来查看处理是如何完成的。您想要避免的是表扫描,因为这很昂贵。

  7. 确保自动统计设置为开。SQL 需要这个来帮助决定最佳执行。有关更多信息,请参阅 Mike Gunderloy 的精彩帖子。SQL Server 2005 中的统计基础

  8. 确保您的索引没有碎片化。减少 SQL Server 索引碎片

  9. 确保您的表没有碎片。如何检测 SQL Server 2000 和 2005 中的表碎片

回答by EvilTeach

Use a withstatment to handle query filtering. Limit each subquery to the minimum number of rows possible. then join the subqueries.

使用with语句来处理查询过滤。将每个子查询限制为尽可能少的行数。然后加入子查询。

WITH
master AS
(
    SELECT SSN, FIRST_NAME, LAST_NAME
    FROM MASTER_SSN
    WHERE STATE = 'PA' AND
          GENDER = 'M'
),
taxReturns AS
(
    SELECT SSN, RETURN_ID, GROSS_PAY
    FROM MASTER_RETURNS
    WHERE YEAR < 2003 AND
          YEAR > 2000
)
SELECT *
FROM master,
     taxReturns
WHERE master.ssn = taxReturns.ssn

A subqueries within a with statement may end up as being the same as inline views, or automatically generated temp tables. I find in the work I do, retail data, that about 70-80% of the time, there is a performance benefit.

with 语句中的子查询可能最终与内联视图或自动生成的临时表相同。我发现在我所做的工作中,零售数据,大约 70-80% 的时间有性能优势。

100% of the time, there is a maintenance benefit.

100% 的情况下,有维护优势。

回答by HLGEM

Some other points (Mine are based on SQL server, since each db backend has it's own implementations they may or may not hold true for all databases):

其他一些要点(我的基于 SQL 服务器,因为每个数据库后端都有自己的实现,它们可能适用于所有数据库,也可能不适用于所有数据库):

Avoid correlated subqueries in the select part of a statement, they are essentially cursors.

避免在语句的选择部分中出现相关子查询,它们本质上是游标。

Design your tables to use the correct datatypes to avoid having to apply functions on them to get the data out. It is far harder to do date math when you store your data as varchar for instance.

设计您的表以使用正确的数据类型,以避免必须对其应用函数来获取数据。例如,当您将数据存储为 varchar 时,进行日期数学运算要困难得多。

If you find that you are frequently doing joins that have functions in them, then you need to think about redesigning your tables.

如果您发现您经常进行带有函数的连接,那么您需要考虑重新设计您的表。

If your WHERE or JOIN conditions include OR statements (which are slower) you may get better speed using a UNION statement.

如果您的 WHERE 或 JOIN 条件包括 OR 语句(速度较慢),则使用 UNION 语句可能会获得更好的速度。

UNION ALL is faster than UNION if (And only if) the two statments are mutually exclusive and return the same results either way.

UNION ALL 比 UNION 更快,如果(且仅当)这两个语句是互斥的并且以任何一种方式返回相同的结果。

NOT EXISTS is usually faster than NOT IN or using a left join with a WHERE clause of ID = null

NOT EXISTS 通常比 NOT IN 或使用带有 ID = null 的 WHERE 子句的左连接更快

In an UPDATE query add a WHERE condition to make sure you are not updating values that are already equal. The difference between updating 10,000,000 records and 4 can be quite significant!

在 UPDATE 查询中添加 WHERE 条件以确保您没有更新已经相等的值。更新 10,000,000 条记录和 4 条记录之间的差异可能非常大!

Consider pre-calculating some values if you will be querying them frequently or for large reports. A sum of the values in an order only needs to be done when the order is made or adjusted, rather than when you are summarizing the results of 10,000,000 million orders in a report. Pre-calculations should be done in triggers so that they are always up-to-date is the underlying data changes. And it doesn't have to be just numbers either, we havea calculated field that concatenates names that we use in reports.

如果您将频繁查询或查询大型报告,请考虑预先计算一些值。一个订单中的值的总和只需要在创建或调整订单时进行,而不需要在报表中汇总 10,000,000 百万个订单的结果时进行。应在触发器中进行预计算,以便它们始终是最新的基础数据更改。而且它也不必只是数字,我们有一个计算字段,用于连接我们在报告中使用的名称。

Be wary of scalar UDFs, they can be slower than putting the code in line.

警惕标量 UDF,它们可能比将代码放在一行中更慢。

Temp table tend to be faster for large data set and table variables faster for small ones. In addition you can index temp tables.

大数据集的临时表往往更快,小数据集的表变量往往更快。此外,您可以索引临时表。

Formatting is usually faster in the user interface than in SQL.

在用户界面中格式化通常比在 SQL 中更快。

Do not return more data than you actually need.

不要返回比实际需要更多的数据。

This one seems obvious but you would not believe how often I end up fixing this. Do not join to tables that you are not using to filter the records or actually calling one of the fields in the select part of the statement. Unnecessary joins can be very expensive.

这个看起来很明显,但你不会相信我最终解决这个问题的频率。不要连接到您不用于过滤记录或实际调用语句的 select 部分中的字段之一的表。不必要的连接可能非常昂贵。

It is an very bad idea to create views that call other views that call other views. You may find you are joining to the same table 6 times when you only need to once and creating 100,000,00 records in an underlying view in order to get the 6 that are in your final result.

创建调用其他视图的视图是一个非常糟糕的主意。您可能会发现您加入同一个表 6 次,而您只需要一次并在基础视图中创建 100,000,00 条记录即可获得最终结果中的 6 条记录。

In designing a database, think about reporting not just the user interface to enter data. Data is useless if it is not used, so think about how it will be used after it is in the database and how that data will be maintained or audited. That will often change the design. (This is one reason why it is a poor idea to let an ORM design your tables, it is only thinking about one use case for the data.) The most complex queries affecting the most data are in reporting, so designing changes to help reporting can speed up queries (and simplify them) considerably.

在设计数据库时,不仅要考虑报告输入数据的用户界面。不使用的数据是无用的,所以要考虑它在数据库中后如何使用以及如何维护或审计这些数据。这通常会改变设计。(这就是为什么让 ORM 设计你的表是一个糟糕的主意的一个原因,它只考虑数据的一个用例。)影响最多数据的最复杂的查询在报告中,因此设计更改以帮助报告可以大大加快查询(并简化它们)。

Database-specific implementations of features can be faster than using standard SQL (That's one of the ways they sell their product), so get to know your database features and find out which are faster.

特定于数据库的功能实现可能比使用标准 SQL 更快(这是他们销售产品的方式之一),因此请了解您的数据库功能并找出哪些更快。

And because it can't be said too often, use indexes correctly, not too many or too few. And make your WHERE clauses sargable (Able to use indexes).

而且因为不能说得太频繁,所以要正确使用索引,不要太多也不要太少。并使您的 WHERE 子句可调整(能够使用索引)。

回答by aku

I think using SQL query analyzer would be a good start.

我认为使用 SQL 查询分析器将是一个好的开始。

回答by hamishmcn

In Oracle you can look at the explain planto compare variations on your query

在 Oracle 中,您可以查看解释计划以比较查询的变化

回答by megabytephreak

Make sure that you have the right indexes on the table. if you frequently use a column as a way to order or limit your dataset an index can make a big difference. I saw in a recent article that select distinct can really slow down a query, especially if you have no index.

确保表上有正确的索引。如果您经常使用列作为排序或限制数据集的方式,则索引可能会产生很大的不同。我在最近的一篇文章中看到 select distinct 确实会减慢查询速度,尤其是在没有索引的情况下。

回答by Ubiguchi

The obvious optimization for SELECT queries is ensuring you have indexes on columns used for joins or in WHERE clauses.

SELECT 查询的明显优化是确保您在用于连接或 WHERE 子句的列上有索引。

Since adding indexes can slow down data writes you do need to monitor performance to ensure you don't kill the DB's write performance, but that's where using a good query analysis tool can help you balanace things accordingly.

由于添加索引会减慢数据写入速度,因此您确实需要监控性能以确保不会降低数据库的写入性能,但这就是使用良好的查询分析工具可以帮助您相应地平衡问题的地方。