测试 SQL 查询的最佳方法
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/754527/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Best way to test SQL queries
提问by Bluephlame
I have run into a problem wherein we keep having complex SQL queries go out with errors. Essentially this results in sending mail to the incorrect customers and other 'problems' like that.
我遇到了一个问题,其中我们不断有复杂的 SQL 查询出错。从本质上讲,这会导致将邮件发送给不正确的客户和其他类似的“问题”。
What is everyone's experience with creating SQL queries like that? We are creating new cohorts of data every other week.
每个人创建这样的 SQL 查询的经验是什么?我们每隔一周就会创建新的数据群组。
So here are some of my thoughts and the limitations to them:
所以这里是我的一些想法和它们的局限性:
Creating test dataWhilst this would prove that we have all the correct data it does not enforce the exclusion of anomalies in production. That is data that would be considered wrong today but may have been correct 10 years ago; it wasn't documented and therefore we only know about it after the data is extracted.
Create Venn diagrams and data mapsThis seems to be a solid way to test the design of a query, however it doesn't guarantee that the implementation is correct. It gets the developers planning ahead and thinking of what is happening as they write.
创建测试数据虽然这将证明我们拥有所有正确的数据,但并不能强制排除生产中的异常。这些数据在今天被认为是错误的,但在 10 年前可能是正确的;它没有记录在案,因此我们只有在提取数据后才知道它。
创建维恩图和数据映射这似乎是测试查询设计的可靠方法,但它并不能保证实现是正确的。它让开发人员提前计划并在他们编写时思考正在发生的事情。
Thanks for any input you can give to my problem.
感谢您对我的问题提供的任何意见。
回答by tpdi
You wouldn't write an application with functions 200 lines long. You'd decompose those long functions into smaller functions, each with a single clearly defined responsibility.
您不会编写具有 200 行长的函数的应用程序。您可以将这些长函数分解为更小的函数,每个函数都有一个明确定义的职责。
Why write your SQL like that?
为什么要这样写你的 SQL?
Decompose your queries,just like you decompose your functions. This makes them shorter, simpler, easier to comprehend, easier to test, easier to refactor. And it allows you to add "shims" between them, and "wrappers" around them, just as you do in procedural code.
分解查询,就像分解函数一样。这使得它们更短、更简单、更容易理解、更容易测试、更容易重构。它允许您在它们之间添加“垫片”,并在它们周围添加“包装器”,就像您在程序代码中所做的那样。
How do you do this? By making each significant thing a query does into a view. Then you composemore complex queries out of these simpler views, just as you compose more complex functions out of more primitive functions.
你怎么做到这一点?通过将查询所做的每件重要事情都放入视图中。然后,您可以从这些更简单的视图中组合出更复杂的查询,就像您从更原始的函数中组合出更复杂的函数一样。
And the great thing is, for mostcompositions of views, you'll get exactly the same performance out of your RDBMS. (For some you won't; so what? Premature optimization is the root of all evil. Code correctly first, thenoptimize if you need to.)
最棒的是,对于大多数视图组合,您将从 RDBMS 中获得完全相同的性能。(对于一些你不会;那又怎样?过早的优化是万恶之源。首先正确编码,然后根据需要进行优化。)
Here's an example of using several view to decompose a complicated query.
In the example, because each view adds only one transformation, each can be independently tested to find errors, and the tests are simple.
在示例中,由于每个视图只添加了一个变换,每个视图都可以独立测试以发现错误,并且测试简单。
Here's the base table in the example:
这是示例中的基表:
create table month_value(
eid int not null, month int, year int, value int );
This table is flawed, because it uses two columns, month and year, to represent one datum, an absolute month. Here's our specification for the new, calculated column:
该表有缺陷,因为它使用两列(月份和年份)来表示一个数据,即绝对月份。这是我们对新的计算列的规范:
We'll do that as a linear transform, such that it sorts the same as (year, month), and such that for any (year, month) tuple there is one and only value, and all values are consecutive:
我们将其作为一个线性变换来进行,这样它的排序与 (year,month) 相同,并且对于任何 (year,month) 元组都有一个且唯一的值,并且所有值都是连续的:
create view cm_absolute_month as
select *, year * 12 + month as absolute_month from month_value;
Now what we have to test is inherent in our spec, namely that for any tuple (year, month), there is one and only one (absolute_month), and that (absolute_month)s are consecutive. Let's write some tests.
现在我们要测试的是我们的规范中固有的,即对于任何元组(年,月),只有一个(absolute_month),并且(absolute_month)是连续的。让我们写一些测试。
Our test will be a SQL select
query, with the following structure: a test name and a case statement catenated together. The test name is just an arbitrary string. The case statement is just case when
test statements then 'passed' else 'failed' end
.
我们的测试将是一个 SQLselect
查询,具有以下结构:连接在一起的测试名称和 case 语句。测试名称只是一个任意字符串。case 语句只是case when
测试语句then 'passed' else 'failed' end
。
The test statements will just be SQL selects (subqueries) that must be true for the test to pass.
测试语句将只是 SQL 选择(子查询),测试必须为真才能通过。
Here's our first test:
这是我们的第一个测试:
--a select statement that catenates the test name and the case statement
select concat(
-- the test name
'For every (year, month) there is one and only one (absolute_month): ',
-- the case statement
case when
-- one or more subqueries
-- in this case, an expected value and an actual value
-- that must be equal for the test to pass
( select count(distinct year, month) from month_value)
--expected value,
= ( select count(distinct absolute_month) from cm_absolute_month)
-- actual value
-- the then and else branches of the case statement
then 'passed' else 'failed' end
-- close the concat function and terminate the query
);
-- test result.
Running that query produces this result: For every (year, month) there is one and only one (absolute_month): passed
运行该查询会产生以下结果: For every (year, month) there is one and only one (absolute_month): passed
As long as there is sufficient test data in month_value, this test works.
只要month_value中有足够的测试数据,这个测试就有效。
We can add a test for sufficient test data, too:
我们也可以为足够的测试数据添加一个测试:
select concat( 'Sufficient and sufficiently varied month_value test data: ',
case when
( select count(distinct year, month) from month_value) > 10
and ( select count(distinct year) from month_value) > 3
and ... more tests
then 'passed' else 'failed' end );
Now let's test it's consecutive:
现在让我们测试它是连续的:
select concat( '(absolute_month)s are consecutive: ',
case when ( select count(*) from cm_absolute_month a join cm_absolute_month b
on ( (a.month + 1 = b.month and a.year = b.year)
or (a.month = 12 and b.month = 1 and a.year + 1 = b.year) )
where a.absolute_month + 1 <> b.absolute_month ) = 0
then 'passed' else 'failed' end );
Now let's put our tests, which are just queries, into a file, and run the that script against the database. Indeed, if we store our view definitions in a script (or scripts, I recommend one file per related views) to be run against the database, we can add our tests for each view to the samescript, so that the act of (re-) creating our view also runs the view's tests. That way, we both get regression tests when we re-create views, and, when the view creation runs against production, the view will will also be tested in production.
现在让我们将我们的测试(只是查询)放入一个文件中,然后针对数据库运行该脚本。事实上,如果我们将视图定义存储在一个脚本(或脚本,我建议每个相关视图一个文件)中以针对数据库运行,我们可以将每个视图的测试添加到同一个脚本中,以便(重新-) 创建我们的视图也会运行视图的测试。这样,当我们重新创建视图时,我们都会得到回归测试,并且当视图创建针对生产运行时,视图也将在生产中进行测试。
回答by ojblass
Create a test system database that you can reload as often as you wish. Load your data or create your data and save it off. Produce an easy way to reload it. Attach your development system to that database and validate your code before you go to production. Kick yourself everytime you manage to let an issue get into production. Create a suite of tests to verify known issues and grow your test suite over time.
创建一个测试系统数据库,您可以根据需要随时重新加载该数据库。加载您的数据或创建您的数据并将其保存。生成一个简单的方法来重新加载它。将您的开发系统附加到该数据库并在投入生产之前验证您的代码。每次你设法让一个问题进入生产时都踢自己。创建一套测试来验证已知问题并随着时间的推移增加您的测试套件。
回答by Jon Limjap
You might want to check DbUnit, so you may try writing unit tests for your programs with a fixed set of data. That way you should be able to write queries with more or less predictable results.
您可能想要检查DbUnit,因此您可以尝试使用一组固定数据为您的程序编写单元测试。这样,您应该能够编写具有或多或少可预测结果的查询。
The other thing you might want to do is profile your SQL Server execution stack and find out if all the queries are indeed the correct ones, e.g., if you are using just one query which returns both correct and incorrect results, then clearly the query being used is in question, but what about if your application is sending out different queries at different points in the code?
您可能想要做的另一件事是分析您的 SQL Server 执行堆栈并找出所有查询是否确实是正确的,例如,如果您只使用一个返回正确和不正确结果的查询,那么显然查询是used 是有问题的,但是如果您的应用程序在代码的不同点发送不同的查询呢?
Any attempt to fix your query then would be futile... the rogue queries might still be the ones firing up the wrong results anyway.
任何修复查询的尝试都是徒劳的......无论如何,流氓查询可能仍然是那些触发错误结果的查询。
回答by Mars the Infomage
Re: tpdi
回复:tpdi
case when ( select count(*) from cm_abs_month a join cm_abs_month b
on (( a.m + 1 = b.m and a.y = b.y) or (a.m = 12 and b.m = 1 and a.y + 1 = b.y) )
where a.am + 1 <> b.am ) = 0
Note that this only checks that am values for consecutive months will be consecutive, not that consecutive data exists (which is probably what you intended initially). This will always pass if none of your source data is consecutive (e.g. you only have even-numbered months), even if your am calculation is totally off.
请注意,这仅检查连续几个月的 am 值是否是连续的,而不是连续数据存在(这可能是您最初打算的)。如果您的源数据都不是连续的(例如,您只有偶数个月),即使您的 am 计算完全关闭,这也将始终通过。
Also am I missing something, or does the second half of that ON clause bump the wrong month value? (i.e. checks that 12/2011 comes after 1/2010)
我是否也遗漏了什么,或者该 ON 子句的后半部分是否超出了错误的月份值?(即检查 12/2011 是否在 1/2010 之后)
What's worse, if I remember correctly, SQL Server at least allows you less than 10 levels of views before the optimizer throws its virtual hands into the air and starts doing full table scans on every request, so don't over-do this approach.
更糟糕的是,如果我没记错的话,在优化器将其虚拟手抛向空中并开始对每个请求进行全表扫描之前,SQL Server 至少允许您拥有少于 10 级的视图,所以不要过度使用这种方法。
Remember to test the heck out of your test cases!
记住要测试你的测试用例!
Otherwise creating a very wide set of data to encompass most or all possible forms of inputs, using SqlUnit or DbUnit or any other *Unit to automate checking for expected results against that data, and reviewing, maintaining and updating it as necessarygenerally seems to be the way to go.
否则创建一个非常广泛的数据集以包含大多数或所有可能的输入形式,使用 SqlUnit 或 DbUnit 或任何其他 *Unit 来自动检查针对该数据的预期结果,并在必要时、维护和更新它通常似乎是要走的路。