为什么人们如此讨厌 SQL 游标?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/287445/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Why do people hate SQL cursors so much?
提问by Steven A. Lowe
I can understand wanting to avoid having to use a cursor due to the overhead and inconvenience, but it looks like there's some serious cursor-phobia-mania going on where people are going to great lengths to avoid having to use one.
我可以理解由于开销和不便而想要避免使用游标,但似乎存在一些严重的游标恐惧症,人们会竭尽全力避免使用游标。
For example, one question asked how to do something obviously trivial with a cursor and the accepted answer proposed using a common table expression (CTE) recursive query with a recursive custom function, even though this limits the number of rows that could be processed to 32 (due to recursive function call limit in sql server). This strikes me as a terrible solution for system longevity, not to mention a tremendous effort just to avoid using a simple cursor.
例如,有一个问题询问如何使用游标做一些明显微不足道的事情,并且使用带有递归自定义函数的通用表表达式 (CTE) 递归查询提出了已接受的答案,即使这将可以处理的行数限制为 32 (由于 sql server 中的递归函数调用限制)。这让我觉得这是一个糟糕的系统寿命解决方案,更不用说为了避免使用简单的游标而付出的巨大努力。
What is the reason for this level of insane hatred? Has some 'noted authority' issued a fatwa against cursors? Does some unspeakable evil lurk in the heart of cursors that corrupts the morals of children or something?
这种疯狂仇恨的原因是什么?是否有一些“著名的权威”发布了针对游标的教令?是不是游标心中潜藏着某种无法言说的邪恶,败坏孩子的道德之类的?
Wiki question, more interested in the answer than the rep.
维基问题,比代表对答案更感兴趣。
Related Info:
相关信息:
SQL Server Fast Forward Cursors
EDIT: let me be more precise: I understand that cursors should not be used instead of normal relational operations; that is a no-brainer. What I don't understand is people going waaaaay out of their way to avoid cursors like they have cooties or something, even when a cursor is a simpler and/or more efficient solution. It's the irrational hatred that baffles me, not the obvious technical efficiencies.
编辑:让我更准确地说:我知道不应使用游标代替正常的关系操作;这是一个明智的选择。我不明白的是,即使游标是一种更简单和/或更有效的解决方案,人们也会像有猫一样避免使用游标。让我困惑的是非理性的仇恨,而不是明显的技术效率。
采纳答案by S.Lott
The "overhead" with cursors is merely part of the API. Cursors are how parts of the RDBMS work under the hood. Often CREATE TABLE
and INSERT
have SELECT
statements, and the implementation is the obvious internal cursor implementation.
游标的“开销”只是 API 的一部分。游标是 RDBMS 的一部分在幕后工作的方式。经常CREATE TABLE
和INSERT
有SELECT
语句,并且实现是明显的内部游标实现。
Using higher-level "set-based operators" bundles the cursor results into a single result set, meaning less API back-and-forth.
使用更高级别的“基于集合的运算符”将游标结果捆绑到单个结果集中,这意味着更少的 API 来回。
Cursors predate modern languages that provide first-class collections. Old C, COBOL, Fortran, etc., had to process rows one at a time because there was no notion of "collection" that could be used widely. Java, C#, Python, etc., have first-class list structures to contain result sets.
游标早于提供一流集合的现代语言。旧的 C、COBOL、Fortran 等必须一次处理一行,因为没有可以广泛使用的“集合”概念。Java、C#、Python 等具有一流的列表结构来包含结果集。
The Slow Issue
缓慢的问题
In some circles, the relational joins are a mystery, and folks will write nested cursors rather than a simple join. I've seen truly epic nested loop operations written out as lots and lots of cursors. Defeating an RDBMS optimization. And running really slowly.
在某些圈子中,关系连接是一个谜,人们会编写嵌套游标而不是简单的连接。我见过写成大量游标的真正史诗般的嵌套循环操作。击败 RDBMS 优化。而且跑起来真的很慢。
Simple SQL rewrites to replace nested cursor loops with joins and a single, flat cursor loop can make programs run in 100th the time. [They thought I was the god of optimization. All I did was replace nested loops with joins. Still used cursors.]
简单的 SQL 重写以用连接替换嵌套的游标循环,单个平面游标循环可以使程序运行 100 次。[他们认为我是优化之神。我所做的只是用连接替换嵌套循环。仍然使用游标。]
This confusion often leads to an indictment of cursors. However, it isn't the cursor, it's the misuse of the cursor that's the problem.
这种混淆通常会导致对游标的起诉。然而,问题不是光标,而是光标的误用。
The Size Issue
尺寸问题
For really epic result sets (i.e., dumping a table to a file), cursors are essential. The set-based operations can't materialize really large result sets as a single collection in memory.
对于真正史诗般的结果集(即,将表转储到文件),游标是必不可少的。基于集合的操作无法将真正大的结果集具体化为内存中的单个集合。
Alternatives
备择方案
I try to use an ORM layer as much as possible. But that has two purposes. First, the cursors are managed by the ORM component. Second, the SQL is separated from the application into a configuration file. It's not that the cursors are bad. It's that coding all those opens, closes and fetches is not value-add programming.
我尽量使用 ORM 层。但这有两个目的。首先,游标由 ORM 组件管理。其次,将 SQL 从应用程序中分离到一个配置文件中。并不是游标不好。对所有这些打开、关闭和获取的代码进行编码并不是增值编程。
回答by Galwegian
Cursors make people overly apply a procedural mindset to a set-based environment.
游标使人们过度将程序思维方式应用于基于集合的环境。
And they are SLOW!!!
而且他们很慢!!!
From SQLTeam:
来自SQLTeam:
Please note that cursors are the SLOWEST way to access data inside SQL Server. The should only be used when you truly need to access one row at a time. The only reason I can think of for that is to call a stored procedure on each row. In the Cursor Performance articleI discovered that cursors are over thirty times slower than set based alternatives.
请注意,游标是访问 SQL Server 内部数据的最慢方法。应该只在您真正需要一次访问一行时使用。我能想到的唯一原因是在每一行上调用一个存储过程。在Cursor Performance 文章中,我发现游标比基于集合的替代方案慢 30 多倍。
回答by davidcl
There's an answer above which says "cursors are the SLOWEST way to access data inside SQL Server... cursors are over thirty times slower than set based alternatives."
上面有一个答案说“游标是访问 SQL Server 内部数据的最慢方式……游标比基于集合的替代方案慢三十多倍。”
This statement may be true under many circumstances, but as a blanket statement it's problematic. For example, I've made good use of cursors in situations where I want to perform an update or delete operation affecting many rows of a large table which is receiving constant production reads. Running a stored procedure which does these updates one row at a time ends up being faster than set-based operations, because the set-based operation conflicts with the read operation and ends up causing horrific locking problems (and may kill the production system entirely, in extreme cases).
在许多情况下,这种说法可能是正确的,但作为笼统的说法,这是有问题的。例如,在我想要执行更新或删除操作影响一个接收持续生产读取的大表的许多行的情况下,我已经很好地使用了游标。运行一个每次更新一行的存储过程最终比基于集合的操作更快,因为基于集合的操作与读取操作冲突并最终导致可怕的锁定问题(并且可能完全杀死生产系统,在极端的情况下)。
In the absence of other database activity, set-based operations are universally faster. In production systems, it depends.
在没有其他数据库活动的情况下,基于集合的操作普遍更快。在生产系统中,这取决于。
回答by davidcl
Cursors tend to be used by beginning SQL developers in places where set-based operations would be better. Particularly when people learn SQL after learning a traditional programming language, the "iterate over these records" mentality tends to lead people to use cursors inappropriately.
游标往往由 SQL 初学者在基于集合的操作会更好的地方使用。尤其是在学习了传统编程语言之后再学习 SQL 时,“迭代这些记录”的心态往往会导致人们不恰当地使用游标。
Most serious SQL books include a chapter enjoining the use of cursors; well-written ones make it clear that cursors have their place but shouldn't be used for set-based operations.
大多数严肃的 SQL 书籍都包含一章禁止使用游标;写得很好的文章清楚地表明游标有其位置,但不应用于基于集合的操作。
There are obviously situations where cursors are the correct choice, or at least A correct choice.
显然,在某些情况下,游标是正确的选择,或者至少是正确的选择。
回答by Cade Roux
The optimizer often cannot use the relational algebra to transform the problem when a cursor method is used. Often a cursor is a great way to solve a problem, but SQL is a declarative language, and there is a lot of information in the database, from constraints, to statistics and indexes which mean that the optimizer has a lot of options to solve the problem, whereas a cursor pretty much explicitly directs the solution.
当使用游标方法时,优化器通常无法使用关系代数来转换问题。通常游标是解决问题的好方法,但 SQL 是一种声明性语言,数据库中有很多信息,从约束到统计和索引,这意味着优化器有很多选项可以解决问题问题,而游标几乎明确地指导解决方案。
回答by tuinstoel
In Oracle PL/SQL cursors will not result in table locks and it is possible to use bulk-collecting/bulk-fetching.
在 Oracle PL/SQL 游标中不会导致表锁定,并且可以使用批量收集/批量获取。
In Oracle 10 the often used implicit cursor
在 Oracle 10 中经常使用的隐式游标
for x in (select ....) loop
--do something
end loop;
fetches implicitly 100 rows at a time. Explicit bulk-collecting/bulk-fetching is also possible.
一次隐式获取 100 行。显式批量收集/批量获取也是可能的。
However PL/SQL cursors are something of a last resort, use them when you are unable to solve a problem with set-based SQL.
然而,PL/SQL 游标是最后的手段,当您无法解决基于集合的 SQL 的问题时使用它们。
Another reason is parallelization, it is easier for the database to parallelize big set-based statements than row-by-row imperative code. It is the same reason why functional programming becomes more and more popular (Haskell, F#, Lisp, C# LINQ, MapReduce ...), functional programming makes parallelization easier. The number CPUs per computer is rising so parallelization becomes more and more an issue.
另一个原因是并行化,数据库并行化基于集合的大语句比逐行命令式代码更容易。与函数式编程变得越来越流行(Haskell、F#、Lisp、C# LINQ、MapReduce ...)的原因相同,函数式编程使并行化更容易。每台计算机的 CPU 数量不断增加,因此并行化变得越来越成为一个问题。
回答by Richard T
The answers above have not emphasized enough the importance of locking. I'm not a big fan of cursors because they often result in table level locks.
上面的答案没有足够强调锁定的重要性。我不是游标的忠实粉丝,因为它们通常会导致表级锁。
回答by Charles Bretana
In general, because on a relational database, the performance of code using cursors is an order of magnitude worse than set-based operations.
一般来说,因为在关系数据库上,使用游标的代码的性能比基于集合的操作差一个数量级。
回答by Eric Sabine
For what it's worth I have read that the "one" place a cursor will out perform its set-based counterpart is in a running total. Over a small table the speed of summing up the rows over the order by columns favors the set-based operation but as the table increases in row size the cursor will become faster because it can simply carry the running total value to the next pass of the loop. Now whereyou should do a running total is a different argument...
值得一提的是,我已经读到游标将执行其基于集合的对应项的“一个”位置是一个运行总数。在一个小表上,按列顺序对行求和的速度有利于基于集合的操作,但随着表的行大小增加,游标将变得更快,因为它可以简单地将运行总值带到环形。现在你应该在哪里做一个总和是一个不同的论点......
回答by Edin Omeragic
I agree with article on this page:
我同意此页面上的文章:
http://weblogs.sqlteam.com/jeffs/archive/2008/06/05/sql-server-cursor-removal.aspx
http://weblogs.sqlteam.com/jeffs/archive/2008/06/05/sql-server-cursor-removal.aspx