理解复杂 SQL 语句的最佳方式?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/379062/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Best way to understand complex SQL statements?
提问by Eric Ness
Does anyone have a method to understand complex SQL statements? When reading structural / OO code there are usually layers of abstraction that help you break it down into manageable chunks. Often in SQL, though, it seems that you have to keep track of what's going on in multiple parts of a query all at the same time.
有没有人有办法理解复杂的 SQL 语句?在阅读结构化/面向对象代码时,通常有抽象层可以帮助您将其分解为可管理的块。但是,通常在 SQL 中,您似乎必须同时跟踪查询的多个部分中发生的情况。
The impetus for this question is the SQL query discussed in this question about a complex join. After staring at the answer queries for a number of minutes I finally decided to step through the query using particular records to see what was going on. That was the only way I could think of to understand the query piece by piece.
这个问题的推动力是这个问题中讨论的关于复杂连接的 SQL 查询。在盯着答案查询几分钟后,我最终决定使用特定记录单步执行查询以查看发生了什么。这是我能想到的逐条理解查询的唯一方法。
Is there a better way to break a SQL query down into manageable pieces?
有没有更好的方法将 SQL 查询分解为可管理的部分?
回答by HLGEM
When I look at a complex bit of SQL Code, this is what I do.
当我查看一些复杂的 SQL 代码时,这就是我所做的。
First, if it is an update or delete, I add code (if it isn't there and commented out) to make it a select. Never try an update or delete for the first time without seeing the results in a select first. If it is an update, I make sure the select shows the current value and what I will be setting it to in order to make sure that I'm getting the desired result.
首先,如果它是更新或删除,我添加代码(如果它不存在并注释掉)以使其成为选择。第一次尝试更新或删除时,请先不要在选择中看到结果。如果是更新,我确保选择显示当前值以及我将设置的值,以确保获得所需的结果。
Understanding the joins is critical to understanding complex SQL. For every join I ask myself why is this here? There are four basic reasons. You need a column for the select, you need a field for the where clause, you need the join as a bridge to a third table, or you need to join to the table to filter records (such as retrieving details on customer who have orders but not needing the order details, this can often be done better with an IF EXISTS where clause). If it is a left or right join (I tend to rewrite so everything is a left join which makes life simpler.), I consider whether an inner join would work. Why do I need a left join? If I don't know the answer, I will run it both ways and see what the difference is within the data. If there are derived tables, I will look at those first (running just that part of the select to see what the result is) to understand why it is there. If there are sub-queries, I will try to understand them and if they are slow will try to convert to a derived table instead as those are often much faster.
了解联接对于了解复杂的 SQL 至关重要。对于每一次加入,我都会问自己为什么会在这里?有四个基本原因。您需要一个用于选择的列,您需要一个用于 where 子句的字段,您需要连接作为到第三个表的桥梁,或者您需要连接到该表以过滤记录(例如检索有关有订单的客户的详细信息但不需要订单详细信息,这通常可以使用 IF EXISTS where 子句更好地完成)。如果是左连接或右连接(我倾向于重写,所以一切都是左连接,这使生活更简单。),我会考虑内部连接是否可行。为什么需要左连接?如果我不知道答案,我会以两种方式运行它并查看数据中的差异。如果有派生表,我将首先查看那些(仅运行 select 的那部分以查看结果)以了解它为什么存在。如果有子查询,我会尝试理解它们,如果它们很慢,我会尝试转换为派生表,因为它们通常要快得多。
Next, I look at the where
clauses. This is one place where a solid foundation in your particular database will come in handy. For instance, I know in my databases what occasions I might need to see only the mailing address and what occasions I might need to see other addresses. This helps me to know if something is missing from the where clause. Otherwise I consider each item in the where
clause and figure out why it would need to be there, then I consider whether there is anything missing that should be there. After looking it over, I consider if I can make adjustments to make the query sargable.
接下来,我看看where
条款。在这里,您的特定数据库中的坚实基础将派上用场。例如,我知道在我的数据库中什么情况下我可能只需要查看邮寄地址,什么情况下我可能需要查看其他地址。这有助于我知道 where 子句中是否缺少某些内容。否则我会考虑条款中的每个项目where
并弄清楚为什么需要它,然后我会考虑是否缺少任何应该存在的东西。在查看之后,我考虑是否可以进行调整以使查询可以进行查询。
I also consider any complex bits of the select list next. What does that case statement do? Why is there a subquery? What do those functions do? (I always look up the function code for any function I'm not already familiar with.) Why is there a distinct? Can it be gotten rid of by using a derived table or aggregate function and group by statements?
接下来我还会考虑选择列表的任何复杂位。这个case语句有什么作用?为什么会有子查询?这些函数有什么作用?(我总是查找我不熟悉的任何函数的函数代码。)为什么有一个不同的?可以通过使用派生表或聚合函数和 group by 语句来摆脱它吗?
Finally and MOST important, I run the select and determine if the results look correct based on my knowledge of the business. If you don't understand your business, you won't know if the query is correct. Syntactically correct doesn't mean the right results. Often there is a part of your existing user interface that you can use as a guide to whether your results are correct. If I have a screen that shows the orders for a customer and I'm doing a report that includes the customer orders, I might spot check a few individual customers to make sure it is showing the right result.
最后也是最重要的是,我运行选择并根据我对业务的了解确定结果是否正确。如果您不了解您的业务,您就不会知道查询是否正确。语法正确并不意味着正确的结果。通常,您可以将现有用户界面的一部分用作您的结果是否正确的指南。如果我有一个显示客户订单的屏幕,并且我正在做一个包含客户订单的报告,我可能会抽查一些个别客户以确保它显示正确的结果。
If the current query is filtering incorrectly, I will remove bits of it to find out what is getting rid of the records I don't want or adding ones I don't want. Often you will find that the join is one to many and you need one to one (use a derived table in this case!) or you will find that some piece of information that you think you need in the where
clause isn't true for all the data you need or that some piece of the where
clause is missing. It helps to have all the fields in the where
clause (if they weren't in the select already) in the select at the time you do this. It may even help to show all the fields from all the joined tables and really look at the data. When I do this, I often add a small bit to the where clause to grab just some of the records that I have that shouldn't be there rather than all the records.
如果当前查询过滤不正确,我将删除它的一部分以找出是什么摆脱了我不想要的记录或添加了我不想要的记录。通常,您会发现连接是一对多的,而您需要一对一(在这种情况下使用派生表!),或者您会发现您认为在where
子句中需要的某些信息并不适用于所有情况您需要的数据或where
缺少条款的某些部分。它有助于将所有字段都包含在where
执行此操作时选择中的子句(如果它们不在选择中)。它甚至可能有助于显示所有连接表中的所有字段并真正查看数据。当我这样做时,我经常在 where 子句中添加一点,以获取我拥有的一些不应该存在的记录,而不是所有记录。
One sneaky thing that will break a lot of queries is the where
clause referencing a field in a table on the right side of a left join. That turns it into an inner join. If you really need a left join, you should add those kinds of conditions to the join itself.
一个会破坏很多查询的偷偷摸摸的事情是where
引用左连接右侧表中字段的子句。这把它变成了一个内部连接。如果你真的需要一个左连接,你应该将这些条件添加到连接本身。
回答by Kon
These may be some helpful hints..
这些可能是一些有用的提示..
- Comments- figure out what a small chunk does and comment it so you understand it when you refer back to it later.
- Syntax highlighting- make sure you're viewing code with something that will color-code the query.
- Indentation- reorganize the query to make sense for you.. tab things over, add carriage returns.
- 评论- 弄清楚一小块的作用并评论它,以便您在以后参考时理解它。
- 语法高亮- 确保您查看的代码中包含对查询进行颜色编码的内容。
- 缩进- 重新组织查询以对您有意义.. 选项卡上的内容,添加回车符。
For example:
例如:
select ID, Description, Status from ABC where Status = 1 OR Status = 3
could be better written as:
可以更好地写成:
select
ID,
Description,
Status
from ABC
where
Status = 1 OR
Status = 3
with a more complex query, you'd see a much bigger benefit.
使用更复杂的查询,您会看到更大的好处。
回答by John MacIntyre
Here's a procedure to follow to unravel a query.
下面是解开查询要遵循的过程。
- First I format the SQL.
- Then I comment out all parts of the SQL other than the basic parts of the most primary or most important table to answer the question.
- Then I will start uncommenting the joins, select columns, groupings, order fields, & filters to issolate different parts of the query to see what is happening. Or highlighted-execution works in some tools.
- Subqueries can usually be executed independently.
- 首先我格式化SQL。
- 然后我注释掉 SQL 的所有部分,而不是最主要或最重要的表的基本部分来回答这个问题。
- 然后我将开始取消对连接的注释,选择列、分组、排序字段和过滤器以隔离查询的不同部分以查看发生了什么。或者在某些工具中突出显示执行工作。
- 子查询通常可以独立执行。
Executing each of these usually allows me to get a better grip on what is happening in the query.
执行这些通常可以让我更好地掌握查询中发生的事情。
回答by Stephane Grenier
Mostly it's just experience and proper indenting.
大多数情况下,这只是经验和适当的缩进。
回答by EvilTeach
Indentation and comments help a lot. The most valuable thing I have run into is the WITH statement. It is in Oracle, and deals with subquery refactoring. It allows you to break a large query, into a set of seemingly smaller ones. Each just a bit more manageable.
缩进和注释有很大帮助。我遇到的最有价值的事情是 WITH 语句。它在 Oracle 中,处理子查询重构。它允许您将大型查询分解为一组看似较小的查询。每个都更易于管理。
Here is an example
这是一个例子
WITH
ssnInfo AS
(
SELECT SSN,
UPPER(LAST_NAME),
UPPER(FIRST_NAME),
TAXABLE_INCOME,
CHARITABLE_DONATIONS
FROM IRS_MASTER_FILE
WHERE STATE = 'MN' AND -- limit to in-state
TAXABLE_INCOME > 250000 AND -- is rich
CHARITABLE_DONATIONS > 5000 -- might donate too
),
doltishApplicants AS
(
SELECT SSN,
SAT_SCORE,
SUBMISSION_DATE
FROM COLLEGE_ADMISSIONS
WHERE SAT_SCORE < 100 -- About as smart as a Moose.
),
todaysAdmissions AS
(
SELECT doltishApplicants.SSN,
TRUNC(SUBMISSION_DATE) SUBMIT_DATE,
LAST_NAME, FIRST_NAME,
TAXABLE_INCOME
FROM ssnInfo,
doltishApplicants
WHERE ssnInfo.SSN = doltishApplicants.SSN
)
SELECT 'Dear ' || FIRST_NAME ||
' your admission to WhatsaMattaU has been accepted.'
FROM todaysAdmissions
WHERE SUBMIT_DATE = TRUNC(SYSDATE) -- For stuff received today only
;
The same thing can be done with inline views, but the with also has the ability to create temporary tables when needed. In some of the cases, you can copy out the subquery and execute it, outside the context of the large query.
内联视图也可以做同样的事情,但是 with 还可以在需要时创建临时表。在某些情况下,您可以复制子查询并在大型查询的上下文之外执行它。
This form also allows you to put the filter clauses with the individual subquery, and save the joining clauses for the final select.
这种形式还允许您将过滤子句与单个子查询放在一起,并保存最终选择的连接子句。
At work, our development group generally finds them easier to maintain, and frequently faster.
在工作中,我们的开发团队通常发现它们更易于维护,而且速度通常更快。
回答by Steven A. Lowe
formatting helps, but understanding set theory and by extension, relational theory, helps even more.
格式化有帮助,但理解集合论和扩展的关系理论更有帮助。
a vague understanding of how queries are executed won't hurt either (table scans, index scans, index jumps, hash-table merges, etc.); the query planner can show you these operations
对查询如何执行的模糊理解也不会受到伤害(表扫描、索引扫描、索引跳转、哈希表合并等);查询规划器可以向您展示这些操作
a few of the operations (having, exists, with) can be troublesome at first
一些操作(拥有、存在、使用)一开始可能会很麻烦
understand first what happens to each table, and how the tables are joined
首先了解每个表会发生什么,以及表是如何连接的
回答by Eran Galperin
I guess it all depends on experience. I didn't find the queries in that questions to be very complicated, perhaps since most of the queries I run are more complex than those.
我想这一切都取决于经验。我没有发现这些问题中的查询非常复杂,也许是因为我运行的大多数查询都比那些更复杂。
Proper coding standards certainly helps understanding queries, as it allows to break it into visually smaller and better formatted chunks. When subqueries are involved, it is better to understand what those do first and use that understanding when looking at the complete query.
正确的编码标准当然有助于理解查询,因为它允许将其分解为视觉上更小、格式更好的块。当涉及子查询时,最好先了解它们的作用,然后在查看完整查询时使用这种理解。
回答by FerranB
Another importan is use standard join syntax:
另一个重要的是使用标准连接语法:
SELECT A
FROM B
JOIN C ON B.ID = C.ID
WHERE C.X = 1
Instead of
代替
SELECT A
FROM B
, C
WHERE B.ID = C.ID
AND C.X = 1
回答by Gene Roberts
As with anything, the BEST way is to write lots of complex SQL statements yourself. Eventually the general way things are structured becomes apparent. Of course, if you're looking for something quick that probably isn't the way.
与任何事情一样,最好的方法是自己编写大量复杂的 SQL 语句。最终,事物结构的一般方式变得显而易见。当然,如果您正在寻找快速的东西,那可能不是这样。
White space is very important. A query that looks incredibly complex can look almost simplistic when the proper white space is present.
留白非常重要。当存在适当的空白时,看起来非常复杂的查询可能看起来几乎很简单。
As to the joins... Well, I'm sorry but I can't be very helpful here, because my answer is that the best way to understand a particular join is to understand how joins work in general. Each type of join serves a very specific purpose and if you know how they work, there shouldn't really be much of a difference from joining x to y, x to y to z, or x and y to a and b.
至于连接......好吧,我很抱歉,但我不能在这里提供很大帮助,因为我的答案是理解特定连接的最佳方法是了解连接的一般工作方式。每种类型的连接都有一个非常特定的目的,如果您知道它们是如何工作的,那么与将 x 连接到 y、将 x 连接到 y 到 z 或将 x 和 y 连接到 a 和 b 之间应该没有太大区别。
What may help more immediately, however, is knowing that you need to look at the innermost pieces first. As opposed to code where you're probably used to looking at things on the grand scale then digging into the granularity, with a query it's more helpful and easy to understand if you start with the granularity and work your way outward.
然而,更直接的方法是知道您需要先查看最里面的部分。与您可能习惯于在大尺度上查看事物然后深入研究粒度的代码相反,如果您从粒度开始并向外工作,查询会更有帮助且更容易理解。
Start with any subqueries, figure out what they're doing in individual pieces treating it as a single query (if possible) then gradually move out step by step until you're at the top. Once again, on the joins... Really, just go find a web page that explains joins and do some tests until you fully understand them. There's not really a way to make that easier, as once you understand them you can pretty much figure out anything with joins that you want.
从任何子查询开始,弄清楚它们在单独的部分中做什么,将其视为单个查询(如果可能),然后逐步移出,直到您到达顶部。再一次,关于连接......真的,只要找到一个解释连接的网页并做一些测试,直到你完全理解它们。没有真正的方法可以让这变得更容易,因为一旦你理解了它们,你几乎可以找出任何你想要的连接。
回答by fractor
I find going back to the logical query processing phases and unpicking the query bit by bit with sample data is often helpful.
我发现回到逻辑查询处理阶段,用样本数据一点一点地解开查询通常很有帮助。
(The following is borrowed from Inside Microsoft SQL Server 2005: T-SQL Querying, by Itzik Ben-Gan.)
(以下内容来自 Inside Microsoft SQL Server 2005:T-SQL Querying,作者 Itzik Ben-Gan。)
(8) SELECT (9) DISTINCT (11) <TOP_specification> <select_list>
(1) FROM <left_table>
(3) <join_type> JOIN <right_table>
(2) ON <join_condition>
(4) WHERE <where_condition>
(5) GROUP BY <group_by_list>
(6) WITH {CUBE | ROLLUP}
(7) HAVING <having_condition>
(10) ORDER BY <order_by_list>
- FROM: A Cartesian product (cross join) is performed between the first two tables in the FROM clause, and as a result, virtual table VT1 is generated.
- ON: The ON filter is applied to VT1. Only rows for which the is TRUE are inserted to VT2.
- OUTER (join): If an OUTER JOIN is specified (as opposed to a CROSS JOIN or an INNER JOIN), rows from the preserved table or tables for which a match was not found are added to the rows from VT2 as outer rows, generating VT3. If more than two tables appear in the FROM clause, steps 1 through 3 are applied repeatedly between the result of the last join and the next table in the FROM clause until all tables are processed.
- WHERE: The WHERE filter is applied to VT3. Only rows for which the is TRUE are inserted to VT4.
- GROUP BY: The rows from VT4 are arranged in groups based on the column list specified in the GROUP BY clause. VT5 is generated.
- CUBE | ROLLUP: Supergroups (groups of groups) are added to the rows from VT5, generating VT6.
- HAVING: The HAVING filter is applied to VT6. Only groups for which the is TRUE are inserted to VT7.
- SELECT: The SELECT list is processed, generating VT8.
- DISTINCT: Duplicate rows are removed from VT8. VT9 is generated.
- ORDER BY: The rows from VT9 are sorted according to the column list specified in the ORDER BY clause. A cursor is generated (VC10).
- TOP: The specified number or percentage of rows is selected from the beginning of VC10. Table VT11 is generated and returned to the caller.
- FROM:在FROM子句的前两个表之间进行笛卡尔积(交叉连接),结果生成了虚拟表VT1。
- ON: ON 滤波器应用于 VT1。只有 为 TRUE 的行才会插入到 VT2。
- OUTER (join):如果指定了 OUTER JOIN(与 CROSS JOIN 或 INNER JOIN 相反),则保留表中的行或未找到匹配项的表将作为外部行添加到 VT2 的行中,生成VT3。如果在 FROM 子句中出现两个以上的表,则在最后一个连接的结果和 FROM 子句中的下一个表之间重复应用步骤 1 到 3,直到处理完所有表。
- WHERE:WHERE 过滤器应用于 VT3。只有 为 TRUE 的行才会插入到 VT4。
- GROUP BY:VT4 中的行根据 GROUP BY 子句中指定的列列表按组排列。VT5 生成。
- 立方体 | ROLLUP:超级组(组的组)被添加到来自 VT5 的行,生成 VT6。
- HAVING:HAVING 过滤器应用于 VT6。只有 为 TRUE 的组才会插入到 VT7。
- SELECT:处理 SELECT 列表,生成 VT8。
- DISTINCT:从 VT8 中删除重复的行。VT9 生成。
- ORDER BY:VT9 中的行根据 ORDER BY 子句中指定的列列表进行排序。生成游标 (VC10)。
- TOP:从 VC10 的开头选择指定的行数或百分比。生成表 VT11 并返回给调用者。