oracle 证明 SQL 查询等效性

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/56895/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-18 17:10:34  来源:igfitidea点击:

Proving SQL query equivalency

sqloracle

提问by Matthew Watson

How would you go about proving that two queries are functionally equivalent, eg they will always both return the same result set.

您将如何证明两个查询在功能上是等效的,例如它们将始终返回相同的结果集。



As I had a specific query in mind when I was doing this, I ended up doing as @dougman suggested, over about 10% of rows the tables concerned and comparing the results, ensuring there was no out of place results.

由于我在执行此操作时有一个特定的查询,因此我最终按照@dougman 的建议进行操作,将超过 10% 的行与表相关并比较结果,确保没有不合适的结果。

采纳答案by Doug Porter

The best you can do is compare the 2 query outputs based on a given set of inputs looking for any differences. To say that they will always return the same results for all inputs really depends on the data.

您能做的最好的事情是根据给定的一组输入比较 2 个查询输出,以寻找任何差异。说它们总是为所有输入返回相同的结果实际上取决于数据。

For Oracle one of the better if not best approaches (very efficient) is here (Ctrl+FComparing the Contents of Two Tables):
http://www.oracle.com/technetwork/issue-archive/2005/05-jan/o15asktom-084959.html

对于 Oracle,最好的方法之一(非常有效)在这里(Ctrl+F比较两个表的内容):
http://www.oracle.com/technetwork/issue-archive/2005/05-jan/o15asktom-084959.html

Which boils down to:

归结为:

select c1,c2,c3, 
       count(src1) CNT1, 
       count(src2) CNT2
  from (select a.*, 
               1 src1, 
               to_number(null) src2 
          from a
        union all
        select b.*, 
               to_number(null) src1, 
               2 src2 
          from b
       )
group by c1,c2,c3
having count(src1) <> count(src2);

回答by Sander van den Oord

1) Real equivalency proof with Cosette:
Cosette checks (with a proof) if 2 SQL query's are equivalent and counter examples when not equivalent. It's the only way to be absolutely sure, well almost ;) You can even throw in 2 query's on their website and check (formal) equivalence right away.

1) 与 Cosette 的真实等价证明:
Cosette 检查(使用证明)2 个 SQL 查询是否等价,反例不等价。这是绝对确定的唯一方法,几乎​​ ;) 您甚至可以在他们的网站上输入 2 个查询并立即检查(正式)等效性。

Link to Cosette: http://cosette.cs.washington.edu/

珂赛特的链接:http: //cosette.cs.washington.edu/

Link to article that gives a good explanation of how Cosette works: https://medium.com/@uwdb/introducing-cosette-527898504bd6


2) Or if you're just looking for a quick practical fix:
Try this stackoverflow answer: [sql - check if two select's are equal]
Which comes down to:

链接到对 Cosette 如何工作有很好解释的文章:https: //medium.com/@uwdb/introducing-cosette-527898504bd6


2)或者如果你只是在寻找一个快速实用的解决方法:
试试这个 stackoverflow 答案:[ sql - 检查两个选择是否相等]
归结为:

(select * from query1 MINUS select * from query2) 
UNION ALL
(select * from query2 MINUS select * from query1)

This query gives you all rows that are returned by only one of the queries.

此查询为您提供仅由其中一个查询返回的所有行。

回答by Rik

This sounds to me like a an NP complete problem. I'm not sure there is a sure fire way to prove this kind of thing

这对我来说听起来像是一个 NP 完全问题。我不确定有没有可靠的方法来证明这种事情

回答by EvilTeach

This is pretty easy to do.

这很容易做到。

Lets assume your queries are named a and b

假设您的查询名为 a 和 b

a minus b

a减b

should give you an empty set. If it does not. then the queries return different sets, and the result set shows you the rows that are different.

应该给你一个空集。如果没有。然后查询返回不同的集合,结果集显示不同的行。

then do

然后做

b minus a

b 减去 a

that should give you an empty set. If it does, then the queries do return the same sets. if it is not empty, then the queries are different in some respect, and the result set shows you the rows that are different.

那应该给你一个空集。如果是,则查询确实返回相同的集合。如果它不为空,那么查询在某些方面是不同的,结果集会显示不同的行。

回答by Mark Harrison

This will do the trick. If this query returns zero rows the two queries are returning the same results. As a bonus, it runs as a single query, so you don't have to worry about setting the isolation level so that the data doesn't change between two queries.

这将解决问题。如果此查询返回零行,则两个查询将返回相同的结果。作为奖励,它作为单个查询运行,因此您不必担心设置隔离级别,以便数据不会在两个查询之间更改。

select * from ((<query 1> MINUS <query 2>) UNION ALL (<query 2> MINUS <query 1>))

Here's a handy shell script to do this:

这是一个方便的shell脚本来做到这一点:

#!/bin/sh

CONNSTR=
echo query 1, no semicolon, eof to end:; Q1=`cat` 
echo query 2, no semicolon, eof to end:; Q2=`cat`

T="(($Q1 MINUS $Q2) UNION ALL ($Q2 MINUS $Q1));"

echo select 'count(*)' from $T | sqlplus -S -L $CONNSTR

回答by Kibbee

Perhaps you could draw (by hand) out your query and the results using Venn Diagrams, and see if they produce the same diagram. Venn diagrams are good for representing sets of data, and SQL queries work on sets of data. Drawing out a Venn Diagram might help you to visualize if 2 queries are functionally equivalent.

也许您可以使用维恩图(手工)绘制查询和结果,并查看它们是否生成相同的图表。维恩图适用于表示数据集,SQL 查询适用于数据集。如果 2 个查询在功能上等效,绘制维恩图可能会帮助您形象化。

回答by Matt Rogish

The DBMS vendors have been working on this for a very, very long time. As Rik said, it's probablyan intractable problem, but I don't think any formal analysis on the NP-completeness of the problem space has been done.

DBMS 供应商已经为此工作了很长时间。正如 Rik 所说,这可能是一个棘手的问题,但我认为尚未对问题空间的 NP 完备性进行任何正式分析。

However, your best bet is to leverage your DBMS as much as possible. All DBMS systems translate SQL into some sort of query plan. You can use this query plan, which is an abstracted version of the query, as a good starting point (the DBMS will do LOTS of optimization, flattening queries into more workable models).

但是,最好的办法是尽可能多地利用 DBMS。所有 DBMS 系统都将 SQL 转换为某种查询计划。您可以使用这个查询计划,它是查询的抽象版本,作为一个很好的起点(DBMS 将进行大量优化,将查询扁平化为更可行的模型)。

NOTE: modern DBMS use a "cost-based" analyzer which is non-deterministic across statistics updates, so the query planner, over time, may change the query plan for identical queries.

注意:现代 DBMS 使用“基于成本”的分析器,该分析器在统计信息更新中是不确定的,因此查询计划器随着时间的推移可能会更改相同查询的查询计划。

In Oracle (depending on your version), you can tell the optimizer to switch from the cost based analyzer to the deterministic rule based analyzer (this will simplify plan analysis) with a SQL hint, e.g.

在 Oracle(取决于您的版本)中,您可以使用 SQL 提示告诉优化器从基于成本的分析器切换到基于确定性规则的分析器(这将简化计划分析),例如

SELECT /*+RULE*/ FROM yourtable

The rule-based optimizer has been deprecated since 8i but it still hangs around even thru 10g (I don't know 'bout 11). However, the rule-based analyzer is much less sophisticated: the error rate potentially is much higher.

自 8i 以来,基于规则的优化器已被弃用,但即使到了 10g(我不知道'bout 11),它仍然存在。然而,基于规则的分析器要复杂得多:错误率可能要高得多。

For further reading of a more generic nature, IBM has been fairly prolific with their query-optimization patents. This one here on a method for converting SQL to an "abstract plan" is a good starting point: http://www.patentstorm.us/patents/7333981.html

为了进一步阅读更通用的内容,IBM 的查询优化专利相当多产。这里关于将 SQL 转换为“抽象计划”的方法是一个很好的起点:http: //www.patentstorm.us/patents/7333981.html

回答by Michael OShea

You don't.

你没有。

If you need a high level of confidence that a performance change, for example, hasn't changed the output of a query then test the hell out it.

例如,如果您需要对性能变化没有改变查询的输出有高度的信心,那么请测试一下。

If you need a really high level of confidence .. then errrm, test it even more.

如果您需要非常高的信心……那么 errrm,请进行更多测试。

Massive level's of testing aren't that hard to cobble together for a SQL query. Write a proc which will iterate around a large/complete set of possible paramenters, and call each query with each set of params, and write the outputs to respective tables. Compare the two tables and there you have it.

为 SQL 查询拼凑大量级别的测试并不难。编写一个 proc,它会遍历大量/完整的可能参数集,并使用每组参数调用每个查询,并将输出写入相应的表。比较这两个表,你就知道了。

It's not exactly scientific, which I guess was the OP's question, but I'm not aware of a formal method to prove equivalency.

这不完全是科学的,我猜这是 OP 的问题,但我不知道证明等效性的正式方法。

回答by tbone

CAREFUL! Functional "equivalence" is often based on the data, and you may "prove" equivalence of 2 queries by comparing results for many cases and still be wrong once the data changes in a certain way.

小心!功能上的“等价”往往是基于数据的,你可能会通过比较很多情况下的结果来“证明”2个查询的等价,但一旦数据发生某种变化,仍然是错误的

For example:

例如:

SQL> create table test_tabA
(
col1 number
)

Table created.

SQL> create table test_tabB
(
col1 number
)

Table created.

SQL> -- insert 1 row

SQL> insert into test_tabA values (1)

1 row created.

SQL> commit

Commit complete.

SQL> -- Not exists query:

SQL> select * from test_tabA a
where not exists
(select 'x' from test_tabB b
where b.col1 = a.col1)

      COL1

----------

         1

1 row selected.

SQL> -- Not IN query:

SQL> select * from test_tabA a
where col1 not in
(select col1
from test_tabB b)

      COL1

----------

         1

1 row selected.


-- THEY MUST BE THE SAME!!! (or maybe not...)


SQL> -- insert a NULL to test_tabB

SQL> insert into test_tabB values (null)

1 row created.

SQL> commit

Commit complete.

SQL> -- Not exists query:

SQL> select * from test_tabA a
where not exists
(select 'x' from test_tabB b
where b.col1 = a.col1)


      COL1

----------

         1

1 row selected.

SQL> -- Not IN query:

SQL> select * from test_tabA a
where col1 not in
(select col1
from test_tabB b)

**no rows selected.**