为什么 SQL Server 标量值函数变慢?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/800017/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Why do SQL Server Scalar-valued functions get slower?
提问by DBAndrew
Why do Scalar-valued functions seem to cause queries to run cumulatively slower the more times in succession that they are used?
为什么标量值函数似乎会导致查询在连续使用的次数越多时累积运行越慢?
I have this table that was built with data purchased from a 3rd party.
我有这张表,它是用从 3rd 方购买的数据构建的。
I've trimmed out some stuff to make this post shorter... but just so you get the idea of how things are setup.
我删减了一些东西,使这篇文章更短......但只是为了让你了解事情是如何设置的。
CREATE TABLE [dbo].[GIS_Location](
[ID] [int] IDENTITY(1,1) NOT NULL, --PK
[Lat] [int] NOT NULL,
[Lon] [int] NOT NULL,
[Postal_Code] [varchar](7) NOT NULL,
[State] [char](2) NOT NULL,
[City] [varchar](30) NOT NULL,
[Country] [char](3) NOT NULL,
CREATE TABLE [dbo].[Address_Location](
[ID] [int] IDENTITY(1,1) NOT NULL, --PK
[Address_Type_ID] [int] NULL,
[Location] [varchar](100) NOT NULL,
[State] [char](2) NOT NULL,
[City] [varchar](30) NOT NULL,
[Postal_Code] [varchar](10) NOT NULL,
[Postal_Extension] [varchar](10) NULL,
[Country_Code] [varchar](10) NULL,
Then I have two functions that look up LAT and LON.
然后我有两个函数来查找 LAT 和 LON。
CREATE FUNCTION [dbo].[usf_GIS_GET_LAT]
(
@City VARCHAR(30),
@State CHAR(2)
)
RETURNS INT
WITH EXECUTE AS CALLER
AS
BEGIN
DECLARE @LAT INT
SET @LAT = (SELECT TOP 1 LAT FROM GIS_Location WITH(NOLOCK) WHERE [State] = @State AND [City] = @City)
RETURN @LAT
END
CREATE FUNCTION [dbo].[usf_GIS_GET_LON]
(
@City VARCHAR(30),
@State CHAR(2)
)
RETURNS INT
WITH EXECUTE AS CALLER
AS
BEGIN
DECLARE @LON INT
SET @LON = (SELECT TOP 1 LON FROM GIS_Location WITH(NOLOCK) WHERE [State] = @State AND [City] = @City)
RETURN @LON
END
When I run the following...
当我运行以下...
SET STATISTICS TIME ON
SELECT
dbo.usf_GIS_GET_LAT(City,[State]) AS Lat,
dbo.usf_GIS_GET_LON(City,[State]) AS Lon
FROM
Address_Location WITH(NOLOCK)
WHERE
ID IN (SELECT TOP 100 ID FROM Address_Location WITH(NOLOCK) ORDER BY ID DESC)
SET STATISTICS TIME OFF
100 ~= 8 ms, 200 ~= 32 ms, 400 ~= 876 ms
100 ~= 8 毫秒,200 ~= 32 毫秒,400 ~= 876 毫秒
--Edit Sorry I should have been more clear. I'm not looking to tune the query listed above. This is just a sample to show the execution time getting slower the more records it crunches through. In the real world application the functions are used as part of a where clause to build a radius around a city and state to include all records with in that region.
--编辑对不起,我应该更清楚。我不打算调整上面列出的查询。这只是一个示例,显示执行时间越慢,它处理的记录越多。在现实世界的应用程序中,这些函数用作 where 子句的一部分,以围绕城市和州建立半径,以包含该地区的所有记录。
回答by John Gibb
In most cases, it's best to avoid scalar valued functions that reference tables because (as others said) they are basically black boxes that need to be ran once for every row, and cannot be optimized by the query plan engine. Therefore, they tend to scale linearly even if the associated tables have indexes.
在大多数情况下,最好避免引用表的标量值函数,因为(正如其他人所说)它们基本上是需要为每一行运行一次的黑盒子,并且不能被查询计划引擎优化。因此,即使关联的表有索引,它们也倾向于线性扩展。
You may want to consider using an inline-table-valued function, since they are evaluated inline with the query, and can be optimized. You get the encapsulation you want, but the performance of pasting the expressions right in the select statement.
您可能需要考虑使用内联表值函数,因为它们是与查询内联计算的,并且可以进行优化。你得到了你想要的封装,但是在 select 语句中粘贴表达式的性能。
As a side effect of being inlined, they can't contain any procedural code (no declare @variable; set @variable = ..; return). However, they can return several rows and columns.
作为内联的副作用,它们不能包含任何过程代码(没有声明@variable;设置@variable = ..;返回)。但是,它们可以返回多行和多列。
You could re-write your functions something like this:
你可以像这样重写你的函数:
create function usf_GIS_GET_LAT(
@City varchar (30),
@State char (2)
)
returns table
as return (
select top 1 lat
from GIS_Location with (nolock)
where [State] = @State
and [City] = @City
);
GO
create function usf_GIS_GET_LON (
@City varchar (30),
@State char (2)
)
returns table
as return (
select top 1 LON
from GIS_Location with (nolock)
where [State] = @State
and [City] = @City
);
The syntax to use them is also a little different:
使用它们的语法也有点不同:
select
Lat.Lat,
Lon.Lon
from
Address_Location with (nolock)
cross apply dbo.usf_GIS_GET_LAT(City,[State]) AS Lat
cross apply dbo.usf_GIS_GET_LON(City,[State]) AS Lon
WHERE
ID IN (SELECT TOP 100 ID FROM Address_Location WITH(NOLOCK) ORDER BY ID DESC)
回答by Sam Saffron
They do not.
他们不。
There is no bug in scalar functions that causes its performance to degrade exponentially depending on the number of rows in the scalar function is executed against. Try your tests again and have a look at SQL profiler, looking at the CPU and READS and DURATION columns. Increase you test size to include tests that take longer than a second, two seconds, five seconds.
标量函数中不存在导致其性能根据执行的标量函数中的行数呈指数级下降的错误。再次尝试您的测试并查看 SQL 分析器,查看 CPU 和 READS 和 DURATION 列。增加您的测试规模以包括需要超过一秒、两秒、五秒的测试。
CREATE FUNCTION dbo.slow
(
@ignore int
)
RETURNS INT
AS
BEGIN
DECLARE @slow INT
SET @slow = (select count(*) from sysobjects a
cross join sysobjects b
cross join sysobjects c
cross join sysobjects d
cross join sysobjects e
cross join sysobjects f
where a.id = @ignore)
RETURN @slow
END
go
SET STATISTICS TIME ON
select top 1 dbo.slow(id)
from sysobjects
go
select top 5 dbo.slow(id)
from sysobjects
go
select top 10 dbo.slow(id)
from sysobjects
go
select top 20 dbo.slow(id)
from sysobjects
go
select top 40 dbo.slow(id)
from sysobjects
SET STATISTICS TIME OFF
Output
输出
SQL Server Execution Times:
CPU time = 203 ms, elapsed time = 202 ms.
SQL Server Execution Times:
CPU time = 889 ms, elapsed time = 939 ms.
SQL Server Execution Times:
CPU time = 1748 ms, elapsed time = 1855 ms.
SQL Server Execution Times:
CPU time = 3541 ms, elapsed time = 3696 ms.
SQL Server Execution Times:
CPU time = 7207 ms, elapsed time = 7392 ms.
Keep in mind that if you are running a scalar function against rows in the result set, the scalar function will be executed per-row with no global optimisation.
请记住,如果您对结果集中的行运行标量函数,则标量函数将在没有全局优化的情况下按行执行。
回答by A-K
You can wrap your functionality in an inline TVF, that will be much faster:
您可以将您的功能包装在内联 TVF 中,这样会更快:
回答by KM.
you call the function two times (two select hits to the DB) for every row in the result set.
您为结果集中的每一行调用该函数两次(两次选择命中数据库)。
to make your query faster join right to GIS_Location and skip the functions:
使您的查询更快地连接到 GIS_Location 并跳过功能:
SELECT
g.Lat,
g.Lon
FROM
Address_Location l WITH(NOLOCK)
INNER JOIN GIS_Location g WITH(NOLOCK) WHERE l.State = g.State AND l.City = g.City
WHERE
ID IN (SELECT TOP 100 ID FROM Address_Location WITH(NOLOCK) ORDER BY ID DESC)
I'm not sure why the NOLOCK, or the crazy where clause, I just copied from the question...
我不确定为什么 NOLOCK 或疯狂的 where 子句,我只是从问题中复制...
回答by dkretz
Simply put, because SQL expressions with user defined functions are less efficient than SQL expressions without them. The execution logic can't be optimized; and the function overhead (including calling protocols) must be incurred for every row.
简单地说,因为带有用户定义函数的 SQL 表达式比没有它们的 SQL 表达式效率低。执行逻辑无法优化;并且每一行都必须产生函数开销(包括调用协议)。
KMike's advice is good. WHERE .. IN (SELECT something) is not likely to be an efficient pattern, and in this case can be easily replaced with a JOIN.
KMike 的建议很好。WHERE .. IN (SELECT something) 不太可能是一种有效的模式,在这种情况下可以很容易地用 JOIN 替换。
回答by Gordon Bell
See if this works better... Or maybe a distinct inner join?
看看这是否更有效......或者可能是一个独特的内部连接?
select a.*,
(select top 1 g.Lat from GIS_Location g where g.City = a.City and g.State = a.State) as Lat,
(select top 1 g.Lon from GIS_Location g where g.City = a.City and g.State = a.State) as Lon
from Address_Location a
where a.ID in (select top 100 ID from Address_Location order by ID desc)
As for the scalar function performance, I'm not sure.
至于标量函数的性能,我不确定。
回答by Lukasz Szozda
Typically scalar functions are much slower than inline TVF counterparts. Fortunately for many scenarios it will change.
通常标量函数比内联 TVF 函数慢得多。幸运的是,在许多情况下它会改变。
SQL Server 2019 will introduce Scalar UDF Inlining:
SQL Server 2019 将引入 标量 UDF 内联:
A feature under the intelligent query processing suite of features. This feature improves the performance of queries that invoke scalar UDFs in SQL Server (starting with SQL Server 2019 preview)
T-SQL Scalar User-Defined Functions
User-Defined Functions that are implemented in Transact-SQL and return a single data value are referred to as T-SQL Scalar User-Defined Functions. T-SQL UDFs are an elegant way to achieve code reuse and modularity across SQL queries. Some computations (such as complex business rules) are easier to express in imperative UDF form. UDFs help in building up complex logic without requiring expertise in writing complex SQL queries.
Scalar UDFs typically end up performing poorly due to the following reasons.
- Iterative invocation
- Lack of costing
- Interpreted execution
- Serial execution
Automatic Inlining of Scalar UDFs
The goal of the Scalar UDF inlining feature is to improve performance of queries that invoke T-SQL scalar UDFs, where UDF execution is the main bottleneck.
With this new feature, scalar UDFs are automatically transformed into scalar expressions or scalar subqueries that are substituted in the calling query in place of the UDF operator. These expressions and subqueries are then optimized. As a result, the query plan will no longer have a user-defined function operator, but its effects will be observed in the plan, like views or inline TVFs.
Inlineable Scalar UDFs requirements
A scalar T-SQL UDF can be inline if all of the following conditions are true:
The UDF is written using the following constructs:
- DECLARE, SET: Variable declaration and assignments.
- SELECT: SQL query with single/multiple variable assignments1.
- IF/ELSE: Branching with arbitrary levels of nesting.
- RETURN: Single or multiple return statements.
- UDF: Nested/recursive function calls2.
- Others: Relational operations such as EXISTS, ISNULL.
The UDF does not invoke any intrinsic function that is either time-dependent (such as GETDATE()) or has side effects3 (such as NEWSEQUENTIALID()).
- The UDF uses the EXECUTE AS CALLER clause (the default behavior if the EXECUTE AS clause is not specified).
- The UDF does not reference table variables or table-valued parameters.
- The query invoking a scalar UDF does not reference a scalar UDF call in its GROUP BY clause.
- The UDF is not natively compiled (interop is supported).
- The UDF is not used in a computed column or a check constraint definition.
- The UDF does not reference user-defined types.
- There are no signatures added to the UDF.
- The UDF is not a partition function.
智能查询处理功能套件下的一个功能。此功能提高了在 SQL Server 中调用标量 UDF 的查询的性能(从 SQL Server 2019 预览版开始)
T-SQL 标量用户定义函数
在 Transact-SQL 中实现并返回单个数据值的用户定义函数称为 T-SQL 标量用户定义函数。T-SQL UDF 是实现跨 SQL 查询的代码重用和模块化的一种优雅方式。某些计算(例如复杂的业务规则)更容易以命令式 UDF 形式表达。UDF 有助于构建复杂的逻辑,而无需编写复杂 SQL 查询的专业知识。
由于以下原因,标量 UDF 通常最终表现不佳。
- 迭代调用
- 缺乏成本核算
- 解释执行
- 串行执行
标量 UDF 的自动内联
标量 UDF 内联功能的目标是提高调用 T-SQL 标量 UDF 的查询的性能,其中 UDF 执行是主要瓶颈。
有了这个新功能,标量 UDF 会自动转换为标量表达式或标量子查询,它们在调用查询中代替 UDF 运算符。然后优化这些表达式和子查询。因此,查询计划将不再具有用户定义的函数运算符,但会在计划中观察其效果,如视图或内联 TVF。
可内联标量 UDF 要求
如果以下所有条件都为真,则标量 T-SQL UDF 可以内联:
UDF 是使用以下结构编写的:
- DECLARE、SET:变量声明和赋值。
- SELECT:具有单/多变量赋值的 SQL 查询 1。
- IF/ELSE:使用任意级别的嵌套进行分支。
- RETURN:单个或多个返回语句。
- UDF:嵌套/递归函数调用2。
- 其他:关系操作,如 EXISTS、ISNULL。
UDF 不会调用任何依赖于时间的内部函数(例如 GETDATE())或具有副作用 3(例如 NEWSEQUENTIALID())。
- UDF 使用 EXECUTE AS CALLER 子句(如果未指定 EXECUTE AS 子句,则为默认行为)。
- UDF 不引用表变量或表值参数。
- 调用标量 UDF 的查询未在其 GROUP BY 子句中引用标量 UDF 调用。
- UDF 不是本机编译的(支持互操作)。
- UDF 不用于计算列或检查约束定义。
- UDF 不引用用户定义的类型。
- UDF 中没有添加任何签名。
- UDF 不是分区函数。
Checking if function is inlinable:
检查函数是否可内联:
SELECT OBJECT_NAME([object_id]) AS name, is_inlineable
FROM sys.sql_modules
WHERE [object_id] = OBJECT_ID('schema.function_name')
Enabling/disabling feature on database level:
在数据库级别启用/禁用功能:
ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = ON/OFF;
Addendum
附录
回答by JuanC.Aguirre
Sorry for me joining late this party, but I want to share my answer for future Profiler Victims. A days ago, all the scalar functions in one production server (sql server 2012 sp4 enterprise) got slower, some stored procedures which usually take seconds to complete, they started to run in minutes, hours in one case.
很抱歉我迟到了,但我想为未来的 Profiler 受害者分享我的答案。几天前,一台生产服务器(sql server 2012 sp4 Enterprise)中的所有标量函数都变慢了,一些通常需要几秒钟才能完成的存储过程,它们开始在几分钟内运行,在一个案例中运行几小时。
Finally, a trace created with the profiler was the root cause of this. The trace was started but then the laptop on which this trace was running was turned off with out previously stop the trace. Like a miracle, the trace was stoped automatically by the user sa (for the record, the sa account was disabled and renamed) --"SQL Trace stopped. Trace ID = '3'. Login Name = 'sa'." this automatically resolve the performance issue.
最后,使用分析器创建的跟踪是造成这种情况的根本原因。跟踪已启动,但随后运行此跟踪的笔记本电脑被关闭,而没有事先停止跟踪。奇迹般地,trace被用户sa自动停止了(为了记录,sa账户被禁用并重命名)--“SQL Trace停止。Trace ID = '3'。登录名 = 'sa'。” 这会自动解决性能问题。
So, Check for a profiler trace or extended events running on the slow server
因此,检查在慢速服务器上运行的探查器跟踪或扩展事件
Hope this help some one in the future.
希望这对将来的某个人有所帮助。