SQL 一般来说,存储过程是否比现代 RDBMS 上的内联语句更有效?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/59880/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Are Stored Procedures more efficient, in general, than inline statements on modern RDBMS's?
提问by casademora
Conventional wisdom states that stored procedures are always faster. So, since they're always faster, use them ALL THE TIME.
传统观点认为存储过程总是更快。所以,因为他们总是更快,使用它们所有的时间。
I am pretty sure this is grounded in some historical context where this was once the case. Now, I'm not advocating that Stored Procs are not needed, but I want to know in what cases stored procedures are necessary in modern databases such as MySQL, SQL Server, Oracle, or <Insert_your_DB_here>. Is it overkill to have ALL access through stored procedures?
我很确定这是基于某些历史背景,曾经是这种情况。现在,我不主张不需要存储过程,但我想知道在什么情况下存储过程在现代数据库(如 MySQL、SQL Server、Oracle 或 < Insert_your_DB_here>)中是必需的。通过存储过程进行所有访问是否过大?
回答by Matt Rogish
NOTEthat this is a general look at stored procedures not regulated to a specific DBMS. Some DBMS (and even, different versions of the same DBMS!) may operate contrary to this, so you'll want to double-check with your target DBMS before assuming all of this still holds.
I've been a Sybase ASE, MySQL, and SQL Server DBA on-and off since for almost a decade (along with application development in C, PHP, PL/SQL, C#.NET, and Ruby). So, I have no particular axe to grind in this (sometimes) holy war.
请注意,这是对未受特定 DBMS 监管的存储过程的一般看法。某些 DBMS(甚至同一 DBMS 的不同版本!)可能与此相反,因此在假设所有这些仍然成立之前,您需要仔细检查目标 DBMS。
近十年来,我一直是 Sybase ASE、MySQL 和 SQL Server DBA(以及使用 C、PHP、PL/SQL、C#.NET 和 Ruby 进行应用程序开发)。所以,在这场(有时)圣战中,我没有特别的斧头可磨。
The historical performance benefit of stored procs have generally been from the following (in no particular order):
存储过程的历史性能优势通常来自以下方面(无特定顺序):
- Pre-parsed SQL
- Pre-generated query execution plan
- Reduced network latency
- Potential cache benefits
- 预解析的 SQL
- 预先生成的查询执行计划
- 减少网络延迟
- 潜在的缓存优势
Pre-parsed SQL-- similar benefits to compiled vs. interpreted code, except on a very micro level.
预解析的 SQL—— 与编译代码和解释代码的好处相似,只是在非常微观的层面上。
Still an advantage?Not very noticeable at all on the modern CPU, but if you are sending a single SQL statement that is VERY large eleventy-billion times a second, the parsing overhead can add up.
还是优势?在现代 CPU 上根本不是很明显,但是如果您正在发送一个每秒 1100 亿次非常大的 SQL 语句,则解析开销可能会增加。
Pre-generated query execution plan. If you have many JOINs the permutations can grow quite unmanageable (modern optimizers have limits and cut-offs for performance reasons). It is not unknown for very complicated SQL to have distinct, measurable (I've seen a complicated query take 10+ seconds just to generate a plan, before we tweaked the DBMS) latencies due to the optimizer trying to figure out the "near best" execution plan. Stored procedures will, generally, store this in memory so you can avoid this overhead.
预先生成的查询执行计划。如果你有很多 JOIN,排列会变得非常难以管理(现代优化器出于性能原因有限制和截止)。非常复杂的 SQL 具有独特的、可测量的(在我们调整 DBMS 之前,我看到一个复杂的查询需要 10 多秒才能生成计划)的延迟并不是未知的,因为优化器试图找出“接近最佳” ” 执行计划。存储过程通常会将其存储在内存中,因此您可以避免这种开销。
Still an advantage?Most DBMS' (the latest editions) will cache the query plans for INDIVIDUAL SQL statements, greatly reducing the performance differential between stored procs and ad hoc SQL. There are some caveats and cases in which this isn't the case, so you'll need to test on your target DBMS.
还是优势?大多数 DBMS(最新版本)会缓存 INDIVIDUAL SQL 语句的查询计划,从而大大减少存储过程和即席 SQL 之间的性能差异。有一些警告和情况并非如此,因此您需要在目标 DBMS 上进行测试。
Also, more and more DBMS allow you to provide optimizer path plans (abstract query plans) to significantly reduce optimization time (for both ad hoc and stored procedure SQL!!).
此外,越来越多的 DBMS 允许您提供优化器路径计划(抽象查询计划)以显着减少优化时间(对于即席和存储过程 SQL!!)。
WARNINGCached query plans are not a performance panacea. Occasionally the query plan that is generated is sub-optimal. For example, if you send
SELECT * FROM table WHERE id BETWEEN 1 AND 99999999
, the DBMS may select a full-table scan instead of an index scan because you're grabbing every row in the table (so sayeth the statistics). If this is the cached version, then you can get poor performance when you later sendSELECT * FROM table WHERE id BETWEEN 1 AND 2
. The reasoning behind this is outside the scope of this posting, but for further reading see: http://www.microsoft.com/technet/prodtechnol/sql/2005/frcqupln.mspxand http://msdn.microsoft.com/en-us/library/ms181055.aspxand http://www.simple-talk.com/sql/performance/execution-plan-basics/"In summary, they determined that supplying anything other than the common values when a compile or recompile was performed resulted in the optimizer compiling and caching the query plan for that particular value. Yet, when that query plan was reused for subsequent executions of the same query for the common values (‘M', ‘R', or ‘T'), it resulted in sub-optimal performance. This sub-optimal performance problem existed until the query was recompiled. At that point, based on the @P1 parameter value supplied, the query might or might not have a performance problem."
警告缓存查询计划不是性能灵丹妙药。有时生成的查询计划是次优的。例如,如果您发送
SELECT * FROM table WHERE id BETWEEN 1 AND 99999999
,DBMS 可能会选择全表扫描而不是索引扫描,因为您正在抓取表中的每一行(统计数据如此)。如果这是缓存版本,那么稍后发送SELECT * FROM table WHERE id BETWEEN 1 AND 2
. 这背后的原因超出了本文的范围,但要进一步阅读,请参阅:http: //www.microsoft.com/technet/prodtechnol/sql/2005/frcqupln.mspx和 http://msdn.microsoft.com/ en-us/library/ms181055.aspx和http://www.simple-talk.com/sql/performance/execution-plan-basics/“总而言之,他们确定在执行编译或重新编译时提供除公共值之外的任何内容会导致优化器编译并缓存该特定值的查询计划。然而,当该查询计划被重用于后续执行相同查询公共值('M'、'R' 或 'T'),它导致了次优性能。这个次优性能问题一直存在,直到查询被重新编译。那时,基于@P1提供的参数值,查询可能有也可能没有性能问题。”
Reduced network latencyA) If you are running the same SQL over and over -- and the SQL adds up to many KB of code -- replacing that with a simple "exec foobar" can really add up. B) Stored procs can be used to move procedural code into the DBMS. This saves shuffling large amounts of data off to the client only to have it send a trickle of info back (or none at all!). Analogous to doing a JOIN in the DBMS vs. in your code (everyone's favorite WTF!)
减少网络延迟A) 如果您一遍又一遍地运行相同的 SQL——并且 SQL 加起来有许多 KB 的代码——用简单的“exec foobar”替换它真的可以加起来。B) 存储过程可用于将过程代码移动到 DBMS。这可以避免将大量数据混洗到客户端,只是为了让它返回一些信息(或根本不发送!)。类似于在 DBMS 中与在您的代码中执行 JOIN(每个人都喜欢的 WTF!)
Still an advantage?A) Modern 1Gb (and 10Gb and up!) Ethernet really make this negligible. B) Depends on how saturated your network is -- why shove several megabytes of data back and forth for no good reason?
还是优势?A) 现代 1Gb(以及 10Gb 及更高版本!)以太网确实使这一点可以忽略不计。B) 取决于您的网络有多饱和——为什么无缘无故地来回推送几兆字节的数据?
Potential cache benefitsPerforming server-side transforms of data can potentially be faster if you have sufficient memory on the DBMS and the data you need is in memory of the server.
潜在的缓存优势如果您在 DBMS 上有足够的内存并且您需要的数据在服务器的内存中,则执行服务器端数据转换可能会更快。
Still an advantage?Unless your app has shared memory access to DBMS data, the edge will always be to stored procs.
还是优势?除非您的应用程序具有对 DBMS 数据的共享内存访问权限,否则优势将始终是存储过程。
Of course, no discussion of Stored Procedure optimization would be complete without a discussion of parameterized and ad hoc SQL.
当然,如果不讨论参数化 SQL 和即席 SQL,就不会完整地讨论存储过程优化。
Parameterized / Prepared SQL
Kind of a cross between stored procedures and ad hoc SQL, they are embedded SQL statements in a host language that uses "parameters" for query values, e.g.:
参数化/准备 SQL
存储过程和即席 SQL 之间的一种交叉,它们是使用“参数”作为查询值的宿主语言中的嵌入式 SQL 语句,例如:
SELECT .. FROM yourtable WHERE foo = ? AND bar = ?
These provide a more generalized version of a query that modern-day optimizers can use to cache (and re-use) the query execution plan, resulting in much of the performance benefit of stored procedures.
这些提供了一个更通用的查询版本,现代优化器可以使用它来缓存(和重用)查询执行计划,从而获得存储过程的大部分性能优势。
Ad Hoc SQLJust open a console window to your DBMS and type in a SQL statement. In the past, these were the "worst" performers (on average) since the DBMS had no way of pre-optimizing the queries as in the parameterized/stored proc method.
Ad Hoc SQL只需打开 DBMS 的控制台窗口并输入 SQL 语句即可。在过去,这些是“最差”的执行者(平均而言),因为 DBMS 无法像参数化/存储过程方法那样预先优化查询。
Still a disadvantage?Not necessarily. Most DBMS have the ability to "abstract" ad hoc SQL into parameterized versions -- thus more or less negating the difference between the two. Some do this implicitly or must be enabled with a command setting (SQL server: http://msdn.microsoft.com/en-us/library/ms175037.aspx, Oracle: http://www.praetoriate.com/oracle_tips_cursor_sharing.htm).
还是劣势?不必要。大多数 DBMS 能够将即席 SQL“抽象”为参数化版本——从而或多或少地消除了两者之间的差异。有些隐式执行此操作或必须使用命令设置启用(SQL 服务器:http: //msdn.microsoft.com/en-us/library/ms175037.aspx,Oracle:http: //www.praetoriate.com/oracle_tips_cursor_sharing。嗯)。
Lessons learned?Moore's law continues to march on and DBMS optimizers, with every release, get more sophisticated. Sure, you can place every single silly teeny SQL statement inside a stored proc, but just know that the programmers working on optimizers are very smart and are continually looking for ways to improve performance. Eventually (if it's not here already) ad hoc SQL performance will become indistinguishable (on average!) from stored procedure performance, so any sort of massivestored procedure use ** solely for "performance reasons"** sure sounds like premature optimization to me.
得到教训?摩尔定律继续向前发展,DBMS 优化器随着每个版本的发布变得更加复杂。当然,您可以将每一个愚蠢的小 SQL 语句放在存储过程中,但要知道从事优化器工作的程序员非常聪明,并且不断寻找提高性能的方法。最终(如果它还没有),即席 SQL 性能将变得与存储过程性能无法区分(平均而言!),因此任何类型的大规模存储过程都仅出于“性能原因”** 使用 ** 对我来说肯定听起来像过早的优化.
Anyway, I think if you avoid the edge cases and have fairly vanilla SQL, you won't notice a difference between ad hoc and stored procedures.
无论如何,我认为如果您避免边缘情况并拥有相当普通的 SQL,您将不会注意到临时和存储过程之间的区别。
回答by DOK
Reasons for using stored procedures:
使用存储过程的原因:
- Reduce network traffic-- you have to send the SQL statement across the network. With sprocs, you can execute SQL in batches, which is also more efficient.
- Caching query plan-- the first time the sproc is executed, SQL Server creates an execution plan, which is cached for reuse. This is particularly performant for small queries run frequently.
- Ability to use output parameters-- if you send inline SQL that returns one row, you can only get back a recordset. With sprocs you can get them back as output parameters, which is considerably faster.
- Permissions-- when you send inline SQL, you have to grant permissions on the table(s) to the user, which is granting much more access than merely granting permission to execute a sproc
- Separation of logic-- remove the SQL-generating code and segregate it in the database.
- Ability to edit without recompiling-- this can be controversial. You can edit the SQL in a sproc without having to recompile the application.
- Find where a table is used-- with sprocs, if you want to find all SQL statements referencing a particular table, you can export the sproc code and search it. This is much easier than trying to find it in code.
- Optimization-- It's easier for a DBA to optimize the SQL and tune the database when sprocs are used. It's easier to find missing indexes and such.
- SQL injection attacks-- properly written inline SQL can defend against attacks, but sprocs are better for this protection.
- 减少网络流量——您必须通过网络发送 SQL 语句。使用sprocs,可以批量执行SQL,效率也更高。
- 缓存查询计划——第一次执行 sproc 时,SQL Server 会创建一个执行计划,该计划被缓存以供重用。这对于频繁运行的小型查询特别有效。
- 能够使用输出参数——如果您发送返回一行的内联 SQL,您只能返回一个记录集。使用 sprocs,您可以将它们作为输出参数取回,这要快得多。
- 权限——当您发送内联 SQL 时,您必须向用户授予对表的权限,这比仅仅授予执行 sproc 的权限要多得多
- 逻辑分离——删除生成 SQL 的代码并将其隔离在数据库中。
- 无需重新编译即可编辑- 这可能会引起争议。您可以在 sproc 中编辑 SQL,而无需重新编译应用程序。
- 查找表的使用位置——使用 sproc,如果要查找引用特定表的所有 SQL 语句,可以导出 sproc 代码并进行搜索。这比尝试在代码中查找要容易得多。
- 优化——当使用 sprocs 时,DBA 更容易优化 SQL 和调整数据库。更容易找到丢失的索引等。
- SQL 注入攻击——正确编写的内联 SQL 可以防御攻击,但 sproc 更适合这种保护。
回答by Jon Galloway
In many cases, stored procedures are actually slower because they're more genaralized. While stored procedures can be highly tuned, in my experience there's enough development and institutional friction that they're left in place once they work, so stored procedures often tend to return a lot of columns "just in case" - because you don't want to deploy a new stored procedure every time you change your application. An OR/M, on the other hand, only requests the columns the application is using, which cuts down on network traffic, unnecessary joins, etc.
在许多情况下,存储过程实际上更慢,因为它们更通用。虽然存储过程可以高度调整,但根据我的经验,有足够的开发和制度摩擦,一旦它们工作就留在原地,因此存储过程通常倾向于“以防万一”返回很多列 - 因为你没有每次更改应用程序时都想部署一个新的存储过程。另一方面,OR/M 只请求应用程序正在使用的列,这减少了网络流量、不必要的连接等。
回答by Steve Morgan
It's a debate that rages on and on (for instance, here).
这是一场持续不断的辩论(例如,这里)。
It's as easy to write bad stored procedures as it is to write bad data access logic in your app.
编写错误的存储过程与在应用程序中编写错误的数据访问逻辑一样容易。
My preference is for Stored Procs, but that's because I'm typically working with very large and complex apps in an enterprise environment where there are dedicated DBAs who are responsible for keeping the database servers running sweetly.
我更喜欢存储过程,但那是因为我通常在企业环境中使用非常大型和复杂的应用程序,在那里有专门的 DBA 负责保持数据库服务器正常运行。
In other situations, I'm happy enough for data access technologies such as LINQ to take care of the optimisation.
在其他情况下,我很高兴使用 LINQ 等数据访问技术来处理优化。
Pure performance isn't the only consideration, though. Aspects such as security and configuration management are typically at least as important.
不过,纯粹的性能并不是唯一的考虑因素。安全和配置管理等方面通常至少同样重要。
Edit: While Frans Bouma's article is indeed verbose, it misses the point with regard to security by a mile. The fact that it's 5 years old doesn't help its relevance, either.
编辑:虽然 Frans Bouma 的文章确实很冗长,但它在安全方面忽略了一点。它已有 5 年历史的事实也无助于其相关性。
回答by Joel Coehoorn
There is no noticeable speed difference for stored procedures vs parameterized or prepared queries on most modern databases, because the database will also cache execution plans for those queries.
在大多数现代数据库上,存储过程与参数化或准备查询的速度没有明显差异,因为数据库还将缓存这些查询的执行计划。
Note that a parameterized query is not the same as ad hoc sql.
请注意,参数化查询与即席 sql 不同。
The main reason imo to still favor stored procedures today has more to do with security. If you use stored procedures exclusively, you can disable INSERT, SELECT, UPDATE, DELETE, ALTER, DROP, and CREATE etc permissions for your application's user, only leaving it with EXECUTE.
今天imo仍然偏爱存储过程的主要原因更多地与安全性有关。如果你使用存储过程完全,您可以禁用INSERT,SELECT,UPDATE,DELETE,修改,删除,并为您的应用程序的用户权限等,只有EXECUTE离开它。
This provides a little extra protection against 2nd ordersql injection. Parameterized queries only protect against 1st orderinjection.
这为二阶sql 注入提供了一点额外的保护。参数化查询仅防止一阶注入。
回答by fuzzbone
The one topic that no one has yet mentioned as a benefit of stored procedures is security. If you build the application exclusively with data access via stored procedures, you can lockdown the database so the ONLY access is via those stored procedures. Therefor, even if someone gets a database ID and password, they will be limited in what they can see or do against that database.
安全性是存储过程的一个好处,但没有人提及的一个主题。如果您以通过存储过程访问数据的方式专门构建应用程序,则可以锁定数据库,因此只能通过这些存储过程进行访问。因此,即使有人获得了数据库 ID 和密码,他们在该数据库上可以看到或执行的操作也会受到限制。
回答by Jon Ericson
Obviously, actual performance ought to be measured in individual cases, not assumed. But even in cases where performance is hamperedby a stored procedure, there are good reasons to use them:
显然,实际性能应该在个别情况下衡量,而不是假设。但即使在性能受到存储过程影响的情况下,也有充分的理由使用它们:
Application developers aren't always the best SQL coders. Stored procedures hides SQL from the application.
Stored procedures automatically use bind variables. Application developers often avoid bind variables because they seem like unneeded code and show little benefit in small test systems. Later on, the failure to use bind variables can throttle RDBMS performance.
Stored procedures create a layer of indirection that might be useful later on. It's possible to change implementation details (including table structure) on the database side without touching application code.
The exercise of creating stored procedures can be useful for documenting all database interactions for a system. And it's easier to update the documentation when things change.
应用程序开发人员并不总是最好的 SQL 编码人员。存储过程对应用程序隐藏 SQL。
存储过程自动使用绑定变量。应用程序开发人员经常避免绑定变量,因为它们看起来像是不需要的代码,并且在小型测试系统中几乎没有什么好处。稍后,无法使用绑定变量会限制 RDBMS 的性能。
存储过程创建了一个间接层,稍后可能会有用。可以在不接触应用程序代码的情况下更改数据库端的实现细节(包括表结构)。
创建存储过程的练习对于记录系统的所有数据库交互非常有用。当情况发生变化时,更新文档会更容易。
That said, I usually stick raw SQL in my applications so that I can control it myself. It depends on your development team and philosophy.
也就是说,我通常在我的应用程序中保留原始 SQL,以便我可以自己控制它。这取决于您的开发团队和理念。
回答by Dennis Decker Jensen
In 2007 I was on a project, where we used MS SQL Server via an ORM. We had 2 big, growing tables which took up to 7-8 seconds of load time on the SQL Server. After making 2 large, stored SQL procedures, and optimizing them from the query planner, each DB load time got down to less than 20 milliseconds, so clearly there are still efficiency reasons to use stored SQL procedures.
2007 年,我参与了一个项目,我们通过 ORM 使用了 MS SQL Server。我们有 2 个不断增长的大表,它们在 SQL Server 上的加载时间长达 7-8 秒。在制作了 2 个大型 SQL 存储过程,并从查询计划器优化它们之后,每个 DB 加载时间下降到不到 20 毫秒,因此显然使用存储 SQL 过程仍然存在效率原因。
Having said that, we found out that the most important benefit of stored procedures was the added maintaince-ease, security, data-integrity, and decoupling business-logic from the middleware-logic, benefitting all middleware-logic from reuse of the 2 procedures.
话虽如此,我们发现存储过程最重要的好处是增加了维护方便性、安全性、数据完整性以及将业务逻辑与中间件逻辑解耦,使所有中间件逻辑受益于重用这两个过程.
Our ORM vendor made the usual claim that firing off many small SQL queries were going to be more efficient than fetching large, joined data sets. Our experience (to our surprise) showed something else.
我们的 ORM 供应商通常声称,启动许多小型 SQL 查询将比获取大型连接数据集更有效。我们的经历(出乎我们的意料)显示了其他一些东西。
This may of course vary between machines, networks, operating systems, SQL servers, application frameworks, ORM frameworks, and language implementations, so measure any benefit, you THINK you may get from doing something else.
这当然可能因机器、网络、操作系统、SQL 服务器、应用程序框架、ORM 框架和语言实现而异,因此衡量任何好处,您认为您可能会从做其他事情中获得。
It wasn't until we benchmarked that we discovered the problem was between the ORM and the database taking all the load.
直到我们进行了基准测试,我们才发现问题出在 ORM 和承担所有负载的数据库之间。
回答by Flory
I prefer to use SP's when it makes sense to use them. In SQL Server anyway there is no performance advantage to SP's over a parametrized query.
当使用它们有意义时,我更喜欢使用 SP。无论如何,在 SQL Server 中,SP 与参数化查询相比没有性能优势。
However, at my current job my boss mentioned that we are forced to use SP's because our customer's require them. They feel that they are more secure. I have not been here long enough to see if we are implementing role based security but I have a feeling we do.
但是,在我目前的工作中,我的老板提到我们被迫使用 SP,因为我们的客户需要它们。他们觉得自己更有安全感。我来这里的时间还不够长,无法了解我们是否正在实施基于角色的安全性,但我有一种感觉。
So the customer's feelings trump all other arguments in this case.
因此,在这种情况下,客户的感受胜过所有其他论点。
回答by jmt
To me one advantage of stored procedures is to be host language agnostic: you can switch from a C, Python, PHP or whatever application to another programming language without rewriting your code. In addition, some features like bulk operations improve really performance and are not easily available (not at all?) in host languages.
对我来说,存储过程的一个优点是与宿主语言无关:您可以从 C、Python、PHP 或任何应用程序切换到另一种编程语言,而无需重写代码。此外,诸如批量操作之类的一些功能确实可以提高性能,并且在宿主语言中不容易获得(根本没有?)。