Optimize SQL Query to avoid Hash Match (Aggregate)

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/24824206/

Time: 2020-09-01 02:14:48  Source: igfitidea

sql sql-server performance

Asked by gregs

I have a SQL query that takes over 7 minutes to return results. I'm trying to optimize it as much as possible, and the execution plan shows 82% of the cost going to a Hash Match (Aggregate). I've done some searching and it looks like using EXISTS would help, but I haven't figured out the syntax to make it work. Here's the query:

SELECT dbo.Server.Name,
       dbo.DiskSpace.Drive,
       AVG(dbo.DiskSpace.FreeSpace) AS 'Free Disk Space',
       AVG(dbo.Processor.PercentUsed) AS 'CPU % Used',
       AVG(dbo.Memory.PercentUtilized) AS '% Mem Used'
FROM Server
JOIN dbo.DiskSpace ON dbo.Server.ID = DiskSpace.ServerID
JOIN dbo.Processor ON dbo.Server.ID = Processor.ServerID
JOIN dbo.Memory ON dbo.Server.ID = dbo.Memory.ServerID
WHERE dbo.Processor.ProcessorNum = '_Total'
  AND dbo.Processor.Datetm > DATEADD(DAY, -(1), (CONVERT(date, GETDATE())))
  AND (   dbo.Server.Name = 'qp-ratking'
       OR dbo.Server.Name = 'qp-hyper2012'
       OR dbo.Server.Name = 'qp-hyped'
       OR dbo.Server.Name = 'qp-lichking')
GROUP BY dbo.Server.Name, dbo.DiskSpace.Drive
ORDER BY dbo.Server.Name, dbo.DiskSpace.Drive;

How do I reduce/eliminate the joins using EXISTS? Or if there is a better way to optimize, I'm up for that too. Thanks

Answered by gregs

A co-worker broke the query down and pulled the data out in smaller chunks, so the joins return far less data to process. That cut the run time to under 1 second. New query:

WITH tempDiskSpace AS
(
SELECT dbo.Server.Name
      ,dbo.DiskSpace.Drive
      ,AVG(dbo.DiskSpace.FreeSpace) AS 'Free Disk Space'

FROM dbo.DiskSpace
      LEFT JOIN dbo.Server ON dbo.DiskSpace.ServerID=Server.ID

WHERE dbo.DiskSpace.Datetm>DATEADD(DAY,-(1),(CONVERT (date, GETDATE())))
AND (dbo.Server.Name='qp-ratking'
      OR dbo.Server.Name='qp-hyper2012'
      OR dbo.Server.Name='qp-hyped'
      OR dbo.Server.Name='qp-lichking')

GROUP BY Name, Drive
)
,tempProcessor
AS
(
SELECT dbo.Server.Name
      ,AVG(dbo.Processor.PercentUsed) AS 'CPU % Used'

FROM dbo.Processor
      LEFT JOIN dbo.Server ON dbo.Processor.ServerID=Server.ID

WHERE dbo.Processor.Datetm>DATEADD(DAY,-(1),(CONVERT (date, GETDATE())))
AND dbo.Processor.ProcessorNum='_Total'
AND (dbo.Server.Name='qp-ratking'
      OR dbo.Server.Name='qp-hyper2012'
      OR dbo.Server.Name='qp-hyped'
      OR dbo.Server.Name='qp-lichking')

GROUP BY Name
)
,tempMemory
AS
(
SELECT dbo.Server.Name
      ,AVG(dbo.Memory.PercentUtilized) as '% Mem Used'

FROM dbo.Memory
      LEFT JOIN dbo.Server ON dbo.Memory.ServerID=Server.ID

WHERE dbo.Memory.Datetm>DATEADD(DAY,-(1),(CONVERT (date, GETDATE())))
AND (dbo.Server.Name='qp-ratking'
      OR dbo.Server.Name='qp-hyper2012'
      OR dbo.Server.Name='qp-hyped'
      OR dbo.Server.Name='qp-lichking')

GROUP BY Name
)

SELECT tempDiskSpace.Name, tempDiskSpace.Drive, tempDiskSpace.[Free Disk Space], tempProcessor.[CPU % Used], tempMemory.[% Mem Used]
FROM tempDiskSpace
LEFT JOIN tempProcessor ON tempDiskSpace.Name=tempProcessor.Name
LEFT JOIN tempMemory ON tempDiskSpace.Name=tempMemory.Name
ORDER BY Name, Drive;

Thanks for all the suggestions.
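
A rough way to see why aggregating each table separately helps: the original query joins DiskSpace, Processor and Memory only on ServerID, so for each server every DiskSpace row is paired with every matching Processor row and every Memory row before the aggregate runs. The sketch below counts rows per server using the same filters as the original query. The column aliases are illustrative, and the actual counts depend on your data.

```sql
-- Per-server row counts feeding the original three-way join.
-- Only Processor is filtered in the original WHERE clause;
-- DiskSpace and Memory contribute all of their rows for the server.
SELECT s.Name,
       (SELECT COUNT_BIG(*)
        FROM dbo.DiskSpace AS ds
        WHERE ds.ServerID = s.ID) AS DiskSpaceRows,
       (SELECT COUNT_BIG(*)
        FROM dbo.Processor AS p
        WHERE p.ServerID = s.ID
          AND p.ProcessorNum = '_Total'
          AND p.Datetm > DATEADD(DAY, -1, CONVERT(date, GETDATE()))) AS ProcessorRows,
       (SELECT COUNT_BIG(*)
        FROM dbo.Memory AS m
        WHERE m.ServerID = s.ID) AS MemoryRows
FROM dbo.Server AS s
WHERE s.Name IN ('qp-ratking', 'qp-hyper2012', 'qp-hyped', 'qp-lichking');
```

Multiplying the three counts for a server gives the number of joined rows the Hash Match (Aggregate) had to consume for that server in the original query. The CTE version aggregates each table on its own first and then joins only the summary rows.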

Answered by Gordon Linoff

I would start by checking the indexes. Are all the keys used in the join defined as primary keys? Or do they at least have indexes?
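
One possible way to answer those questions is to list the existing indexes from the catalog views. This is a minimal sketch that assumes the four tables from the question live in the current database; the output column aliases are illustrative.

```sql
-- List every index (including primary keys) and its key columns
-- for the tables used in the question's joins.
SELECT t.name          AS table_name,
       i.name          AS index_name,
       i.type_desc     AS index_type,
       i.is_primary_key,
       c.name          AS column_name,
       ic.key_ordinal,
       ic.is_included_column
FROM sys.tables AS t
JOIN sys.indexes AS i
    ON i.object_id = t.object_id
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id
   AND ic.index_id = i.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id
   AND c.column_id = ic.column_id
WHERE t.name IN ('Server', 'DiskSpace', 'Processor', 'Memory')
ORDER BY t.name, i.name, ic.key_ordinal;
```

A table that returns no rows here is a heap with no indexes at all, which would be the first thing to address.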

Then, additional indexes on Processor and Server might help:

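-- The index belongs on the Processor table (ProcessorNum is one of its columns).
create index idx_Processor_ProcessorNum_Datetm_ServerId on Processor(ProcessorNum, Datetm, ServerId);
-- In the question, Server exposes its key as ID rather than ServerId.
create index idx_Server_Name_ID on Server(Name, ID);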

Answered by Gouri Shankar Aechoor

The statement looks reasonably structured, and I do not see much scope for optimization provided the prerequisites are addressed, such as:

  1. Check index fragmentation and ensure all indexes are maintained.
  2. Check that statistics are up to date.
  3. If dirty reads are acceptable, it is worth considering WITH (NOLOCK) on the tables.
  4. If the query allows declaring variables, moving the DATEADD out of the filter as below can be beneficial.

Hope this helps.

-- Assuming variables can be declared, see the script below.
-- I made a few changes per my coding standard, only to make it easier for me to read.

DECLARE @dt_Yesterdate DATE

SET @dt_Yesterdate = DATEADD(DAY, -(1), CONVERT (DATE, GETDATE()))

SELECT s.Name,
       ds.Drive,
       AVG(ds.FreeSpace) AS 'Free Disk Space',
       AVG(P.PercentUsed) AS 'CPU % Used',
       AVG(m.PercentUtilized) AS '% Mem Used'
FROM Server s
     JOIN dbo.DiskSpace AS ds
         ON s.ID = ds.ServerID
     JOIN dbo.Processor AS p
         ON s.ID = p.ServerID
     JOIN dbo.Memory AS m
         ON s.ID = m.ServerID
WHERE P.ProcessorNum = '_Total'
  AND P.Datetm > @dt_Yesterdate
  AND s.Name IN ('qp-ratking', 'qp-hyper2012', 'qp-hyped','qp-lichking')
GROUP BY s.name, ds.Drive
ORDER BY s.Name, ds.Drive;
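
The first two checks in the list above (fragmentation and statistics) can be sketched with the standard dynamic management and catalog views. This is only an illustration under the assumption that the question's tables are in the current database; what counts as too fragmented or too stale is a judgment call for your workload.

```sql
-- 1. Fragmentation of each index on the tables involved.
SELECT OBJECT_NAME(ps.object_id)       AS table_name,
       i.name                          AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id = ps.index_id
WHERE OBJECT_NAME(ps.object_id) IN ('Server', 'DiskSpace', 'Processor', 'Memory')
ORDER BY ps.avg_fragmentation_in_percent DESC;

-- 2. When each statistics object on those tables was last updated.
SELECT OBJECT_NAME(st.object_id)              AS table_name,
       st.name                                AS stats_name,
       STATS_DATE(st.object_id, st.stats_id)  AS last_updated
FROM sys.stats AS st
WHERE OBJECT_NAME(st.object_id) IN ('Server', 'DiskSpace', 'Processor', 'Memory')
ORDER BY last_updated;

-- Refresh statistics on a table that looks stale (Processor shown as an example).
UPDATE STATISTICS dbo.Processor;
```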

Answered by Nick

At the very least I'd start with getting rid of all those OR clauses.

AND (dbo.Server.Name='qp-ratking'
      OR dbo.Server.Name='qp-hyper2012'
      OR dbo.Server.Name='qp-hyped'
      OR dbo.Server.Name='qp-lichking')

and replace with

AND dbo.Server.Name in ('qp-ratking','qp-hyper2012','qp-hyped','qp-lichking')  

I'm not sure about converting everything to CTEs, though. You can't index CTEs, and I've yet to come across a case where CTEs outperform a regular query. Your initial query seemed well formed apart from the overuse of OR mentioned above, so I'd look at indexes next.
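
Rather than guessing which form is faster, each candidate can be measured on your own data. A minimal sketch, assuming you run it in a session where the messages output is visible; the query in the middle is just one of the aggregates from this thread, rewritten with IN, and stands in for whichever variant you want to compare.

```sql
-- Report logical reads and CPU/elapsed time for each statement in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Candidate query under test; swap in the original, the CTE version, etc.
SELECT s.Name,
       AVG(p.PercentUsed) AS [CPU % Used]
FROM dbo.Processor AS p
JOIN dbo.Server AS s
    ON s.ID = p.ServerID
WHERE p.ProcessorNum = '_Total'
  AND p.Datetm > DATEADD(DAY, -1, CONVERT(date, GETDATE()))
  AND s.Name IN ('qp-ratking', 'qp-hyper2012', 'qp-hyped', 'qp-lichking')
GROUP BY s.Name;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Comparing the logical reads and elapsed time reported for each variant, together with the actual execution plan, gives firmer evidence than the estimated cost percentages alone.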
