SQL LIKE Performance with only the wildcard (%) as a value
This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/1604707/
Asked by Chris Dail
I am wondering what the performance of a query would be like using the LIKE keyword and the wildcard as the value compared to having no where clause at all.
Consider a where clause such as "WHERE a LIKE '%'". This will match all possible non-null values of the column 'a'. How does this compare to not having the where clause at all?
The reason I ask this is that I have an application where there are some fields that the user may specify values to search on. In some cases the user would like all the possible results. I am currently using a single query like this:
SELECT * FROM TableName WHERE a LIKE ? AND b LIKE ?
The value '%' can be supplied for either parameter to match all possible values of a and/or b. This is convenient since I can use a single named query in my application for this. I wonder what the performance considerations are for this. Does the query optimizer reduce LIKE '%' to simply match all? I realize that because I'm using a named query (prepared statement), that may also affect the answer. I realize the answer is likely database specific. So specifically, how would this work in Oracle, MS SQL Server and Derby?
The alternative approach would be to add 3 more queries and pick one based on which inputs the user supplied as the wildcard.
A is a wildcard:
SELECT * FROM TableName WHERE b LIKE ?
B is a wildcard:
SELECT * FROM TableName WHERE a LIKE ?
A and B are wildcards:
SELECT * FROM TableName
No wildcards:
SELECT * FROM TableName WHERE a LIKE ? AND b LIKE ?
Obviously having a single query is the simplest and easiest to maintain. I would rather use just the one query if performance will still be good.
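For comparison, the multi-query alternative described above might be sketched as follows. This is only an illustration: the four SQL strings are the ones from the question, while `QUERIES` and `choose_query` are hypothetical names, and the sketch assumes a bare '%' is the only pattern treated as "match everything". Note that dropping a predicate also returns rows where that column is NULL, which `LIKE '%'` would not.

```python
# Hypothetical lookup: keys are (a_is_wildcard, b_is_wildcard).
QUERIES = {
    (False, False): "SELECT * FROM TableName WHERE a LIKE ? AND b LIKE ?",
    (True, False): "SELECT * FROM TableName WHERE b LIKE ?",
    (False, True): "SELECT * FROM TableName WHERE a LIKE ?",
    (True, True): "SELECT * FROM TableName",
}


def choose_query(a, b):
    """Return (sql, params), dropping any predicate whose pattern is just '%'."""
    a_wild, b_wild = a == "%", b == "%"
    # Keep only the parameters whose predicates remain in the chosen statement.
    params = [value for value, wild in ((a, a_wild), (b, b_wild)) if not wild]
    return QUERIES[(a_wild, b_wild)], params
```

Calling `choose_query("%", "x%")` picks the statement that filters on b only and passes a single parameter.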
Accepted answer by Chris Dail
I was hoping there would be a textbook answer to this but it sounds like it will largely vary with different database types. Most of the responses indicated that I should run a test so that is exactly what I did.
My application primarily targets the Derby, MS SQL and Oracle databases. Since Derby can be run embedded and is easy to set up, I tested the performance on that first. The results were surprising. I tested the worst case scenario against a fairly large table. I ran the test 1000 times and averaged the results.
Query 1:
SELECT * FROM TableName
Query 2 (With values of a="%" and b="%"):
SELECT * FROM TableName WHERE a LIKE ? AND b LIKE ?
Query 1 average time: 178ms
Query 2 average time: 181ms
So performance on Derby is almost the same between the two queries.
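A timing harness in the same spirit could look roughly like this. It is a sketch under loud assumptions: it uses Python's built-in sqlite3 module instead of Derby, an in-memory table with made-up contents, and far fewer runs than the 1000 used above, so its numbers say nothing about Derby, Oracle or SQL Server. The helper names `build_table` and `average_ms` are hypothetical.

```python
import sqlite3
import time


def build_table(rows=10_000):
    """Create an in-memory table with the column names used in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TableName (a TEXT, b TEXT)")
    conn.executemany(
        "INSERT INTO TableName VALUES (?, ?)",
        ((f"a{i}", f"b{i}") for i in range(rows)),
    )
    return conn


def average_ms(conn, sql, params=(), runs=20):
    """Run the query `runs` times, fetching all rows, and return the mean time in ms."""
    start = time.perf_counter()
    for _ in range(runs):
        conn.execute(sql, params).fetchall()
    return (time.perf_counter() - start) * 1000 / runs


conn = build_table()
plain = average_ms(conn, "SELECT * FROM TableName")
wild = average_ms(
    conn, "SELECT * FROM TableName WHERE a LIKE ? AND b LIKE ?", ("%", "%")
)
print(f"no WHERE: {plain:.2f} ms, LIKE '%': {wild:.2f} ms")
```

The same two statements can be pointed at the real target database through its own driver; only then are the timings meaningful for that engine.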
Answer by Rob Farley
SQL Server will generally see
WHERE City LIKE 'A%'
and treat it as
WHERE City >= 'A' AND City < 'B'
...and happily use an index seek if appropriate. I say 'generally', because I've seen it fail to do this simplification in certain cases.
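The prefix-to-range rewrite above can be illustrated outside the database. The sketch below is not SQL Server's implementation; `prefix_range` is a hypothetical helper, and it assumes plain code-point string ordering, whereas a real engine would use the column's collation.

```python
def prefix_range(prefix):
    """Return (low, high) such that low <= s < high exactly when s starts with prefix.

    Assumes code-point ordering (Python's default string comparison).
    """
    if not prefix:
        raise ValueError("an empty prefix matches every string; no range applies")
    # Bump the last character by one code point to get the exclusive upper bound.
    high = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, high


low, high = prefix_range("A")
cities = ["Albany", "Boston", "Austin", "Zurich", "a", "B"]
in_range = [c for c in cities if low <= c < high]
```

Here `prefix_range("A")` gives the bounds 'A' and 'B', and the range test selects the same cities as a starts-with check, which is why an ordered index can serve such a predicate.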
If someone's trying to do:
WHERE City LIKE '%ville'
...then an index seek will be essentially impossible.
But something as simple as:
WHERE City LIKE '%'
will be considered equivalent to:
WHERE City IS NOT NULL
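The result-set side of that equivalence is easy to check. The following sketch uses Python's sqlite3 module as a stand-in (an assumption; it is not SQL Server), with a made-up `Cities` table and a hypothetical `count` helper. It shows only which rows match, not what any optimizer does internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cities (City TEXT)")
# One ordinary value, one empty string, one NULL.
conn.executemany("INSERT INTO Cities VALUES (?)", [("Albany",), ("",), (None,)])


def count(where=""):
    """Count rows in Cities, optionally restricted by a WHERE clause."""
    return conn.execute(f"SELECT COUNT(*) FROM Cities {where}").fetchone()[0]


all_rows = count()
like_all = count("WHERE City LIKE '%'")
not_null = count("WHERE City IS NOT NULL")
print(all_rows, like_all, not_null)  # 3 2 2
```

The NULL row is dropped by both predicates, while the empty string still matches. Oracle treats an empty string as NULL, so that row would behave differently there.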
Answer by outis
You can use whatever query analysis the DBMS offers (e.g. EXPLAIN for MySQL, SET SHOWPLAN_ALL ON for MS SQL (or use one of the other methods), EXPLAIN PLAN FOR for Oracle) to see how the query will be executed.
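As a rough illustration of inspecting a plan programmatically, the sketch below uses SQLite's `EXPLAIN QUERY PLAN` through Python's sqlite3 module. SQLite is not one of the databases named above, so this is an assumption made for runnability; the index `idx_a` and the helper `plan` are hypothetical, and the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (a TEXT, b TEXT)")
conn.execute("CREATE INDEX idx_a ON TableName (a)")


def plan(sql, params=()):
    """Return the human-readable detail column of SQLite's query plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column of each plan row.
    return [row[-1] for row in rows]


no_where = plan("SELECT * FROM TableName")
with_like = plan(
    "SELECT * FROM TableName WHERE a LIKE ? AND b LIKE ?", ("%", "%")
)
print(no_where)
print(with_like)
```

Comparing the two printed plans shows whether the engine chose a different access path once the LIKE predicates were added.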
Answer by paxdiablo
Any DBMS worth its salt would strip out LIKE '%' clauses before even trying to run the query. I'm fairly certain I've seen DB2/z do this in its execution plans.
The prepared statement shouldn't make a difference since it should be turned into real SQL before it gets to the execution engine.
But, as with all optimization questions, measure, don't guess! DBAs exist because they constantly tune the DBMS based on actual data (which changes over time). At a bare minimum, you should time (and get the execution plans) for all variations with suitable static data to see if there's a difference.
I know that queries like:
select c from t where ((1 = 1) or (c = ?))
are optimized to remove the entire where clause before execution (on DB2 anyway). Before you ask: the construct is useful where you need to remove the effect of the where clause but still maintain the parameter placeholder (using BIRT with Javascript to modify the queries for wildcards).
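The effect of that construct on the result set can be seen in a small sketch. It uses Python's sqlite3 module rather than DB2 (an assumption made so the example runs anywhere) and a made-up table `t`; it demonstrates only that the placeholder stays bindable while the filter is neutralized, not whether any optimizer removes the clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("x",), ("y",), (None,)])

# Ordinary filter: only rows where c equals the bound value.
filtered = conn.execute("SELECT c FROM t WHERE c = ?", ("x",)).fetchall()

# Same placeholder, but (1 = 1) makes the whole condition true for every row.
neutralized = conn.execute(
    "SELECT c FROM t WHERE ((1 = 1) OR (c = ?))", ("x",)
).fetchall()

print(len(filtered), len(neutralized))  # 1 3
```

Unlike `LIKE '%'`, this form also returns the row where c is NULL.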
Answer by Bryan Pendleton
Derby also offers tools for examining the actual query plan that was used, so you can run experiments using Derby and look at the query plan that Derby chose. You can run Derby with -Dderby.language.logQueryPlan=true, and Derby will write the query plan to derby.log, or you can use the RUNTIMESTATISTICS facility, as described here: http://db.apache.org/derby/docs/10.5/tuning/ctundepth853133.html
I'm not sure if Derby will strip out the A LIKE '%' ahead of time, but I also don't think that the presence of that clause will introduce much of a slowdown in the execution speed.
I'd be quite interested to see the actual query plan output that you get in your environment, with and without the A LIKE '%' clause in place.
Answer by David Aldridge
Oracle 10gR2 does not appear to perform a special optimisation for this situation, but it does recognise that LIKE '%' excludes nulls.
create table like_test (col1)
as select cast(dbms_random.string('U',10) as varchar2(10))
from dual
connect by level <= 1000
/
insert into like_test values (null)
/
commit
/
exec dbms_stats.gather_table_stats(user,'like_test')
explain plan for
select count(*)
from like_test
/
select plan_table_output from table(dbms_xplan.display)
/
explain plan for
select count(*)
from like_test
where col1 like '%'
/
select plan_table_output from table(dbms_xplan.display)
/
explain plan for
select count(*)
from like_test
where col1 is not null
/
select plan_table_output from table(dbms_xplan.display)
/
... giving ...
Plan hash value: 3733279756
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| LIKE_TEST | 1001 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------
... and ...
Plan hash value: 3733279756
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | TABLE ACCESS FULL| LIKE_TEST | 1000 | 10000 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("COL1" LIKE '%')
... and ...
Plan hash value: 3733279756
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | TABLE ACCESS FULL| LIKE_TEST | 1000 | 10000 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("COL1" IS NOT NULL)
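Note the cardinality (Rows) on the TABLE ACCESS FULL line: 1001 for the unfiltered query, but 1000 for both the LIKE '%' and the IS NOT NULL queries, so the optimizer accounts for the one NULL row being excluded.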
Answer by Thorsten
One aspect that I think is missing from the discussion is the fact that the OP wants to use a prepared statement. At the time the statement is prepared, the database/optimizer will not be able to work out the simplifications others have mentioned, and so won't be able to optimize away the a LIKE '%', as the actual value will not be known at prepare time.
Therefore:
- when using prepared statements, have four different statements available (0, only a, only b, both) and use the appropriate one when needed
- see if you get better performance when you don't use a prepared statement when sticking to just one statement (although then it would be pretty easy to not include 'empty' conditions)
Answer by Paul Sasik
Depending on how the LIKE predicate is structured and on the field you're testing on, you might need a full table scan. Semantically a '%' might imply a full table scan, but SQL Server does all sorts of optimization internally on queries. So the question becomes: does SQL Server recognize a LIKE predicate formed with '%' and throw it out of the WHERE clause?
Answer by jqa
What if a column has a non-null blank value? Your query will probably match it.
If this is a query for a real world application, then try using the free text indexing features of most modern SQL databases. The performance issues will become insignificant.
A simple if statement along the lines of: if (A and B) search a and b; else if (A) search a; else if (B) search b; else tell the user they didn't specify anything. This is trivial to maintain and much easier to understand than making assumptions about the LIKE operator. You are probably going to do that in the UI anyway when you display the results: "Your search for A found x" or "Your search for A B found..."
Answer by Larry Lustig
I'm not sure of the value of using a prepared statement with the kind of parameters you're describing. The reason is that you might fool the query optimizer into preparing an execution plan that would be completely wrong depending on which of the parameters were '%'.
For instance, if the statement were prepared with an execution plan using the index on column A, but the parameter for column A turned out to be '%', you may experience poor performance.