优化在 Oracle 上运行缓慢而在 SQL Server 上运行很快的 SELECT 查询
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/120504/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Optimising a SELECT query that runs slow on Oracle which runs quickly on SQL Server
提问by RickL
I'm trying to run the following SQL statement in Oracle, and it takes ages to run:
我正在尝试在 Oracle 中运行以下 SQL 语句,它需要很长时间才能运行:
SELECT orderID FROM tasks WHERE orderID NOT IN
(SELECT DISTINCT orderID FROM tasks WHERE
engineer1 IS NOT NULL AND engineer2 IS NOT NULL)
If I run just the sub-part that is in the IN clause, that runs very quickly in Oracle, i.e.
如果我只运行 IN 子句中的子部分,它在 Oracle 中运行得非常快,即
SELECT DISTINCT orderID FROM tasks WHERE
engineer1 IS NOT NULL AND engineer2 IS NOT NULL
Why does the whole statement take such a long time in Oracle? In SQL Server the whole statement runs quickly.
为什么整个语句在 Oracle 中需要这么长时间?在 SQL Server 中,整个语句运行得很快。
Alternatively is there a simpler/different/better SQL statement I should use?
或者,我应该使用更简单/不同/更好的 SQL 语句吗?
Some more details about the problem:
有关该问题的更多详细信息:
- Each order is made of many tasks
- Each order will be allocated (one or more of its task will have engineer1 and engineer2 set) or the order can be unallocated (all its task have null values for the engineer fields)
- I am trying to find all the orderIDs that are unallocated.
- 每个订单由许多任务组成
- 每个订单将被分配(一个或多个任务将设置工程师 1 和工程师 2)或订单可以未分配(其所有任务的工程师字段值为空)
- 我正在尝试查找所有未分配的 orderID。
Just in case it makes any difference, there are ~120k rows in the table, and 3 tasks per order, so ~40k different orders.
以防万一,表中有 ~120k 行,每个订单有 3 个任务,所以 ~40k 不同订单。
Responses to answers:
对答案的回应:
- I would prefer a SQL statement that works in both SQL Server and Oracle.
- The tasks only has an index on the orderID and taskID.
- I tried the NOT EXISTS version of the statement but it ran for over 3 minutes before I cancelled it. Perhaps need a JOIN version of the statement?
- There is an "orders" table as well with the orderID column. But I was trying to simplify the question by not including it in the original SQL statement.
- 我更喜欢在 SQL Server 和 Oracle 中都可以使用的 SQL 语句。
- 任务只有 orderID 和 taskID 上的索引。
- 我尝试了语句的 NOT EXISTS 版本,但在我取消之前它运行了 3 多分钟。也许需要一个 JOIN 版本的语句?
- 还有一个带有 orderID 列的“订单”表。但是我试图通过不将其包含在原始 SQL 语句中来简化问题。
I guess that in the original SQL statement the sub-query is run every time for each row in the first part of the SQL statement - even though it is static and should only need to be run once?
我猜在原始 SQL 语句中,子查询每次都会为 SQL 语句的第一部分中的每一行运行 - 即使它是静态的并且只需要运行一次?
Executing
执行
ANALYZE TABLE tasks COMPUTE STATISTICS;
made my original SQL statement execute much faster.
使我原来的 SQL 语句执行得更快。
Although I'm still curious why I have to do this, and if/when I would need to run it again?
尽管我仍然很好奇为什么我必须这样做,以及是否/何时需要再次运行它?
The statistics give Oracle's cost-based optimzer information that it needs to determine the efficiency of different execution plans: for example, the number of rowsin a table, the average width of rows, highest and lowest values per column, number of distinct values per column, clustering factor of indexes etc.
In a small database you can just setup a job to gather statistics every night and leave it alone. In fact, this is the default under 10g. For larger implementations you usually have to weigh the stability of the execution plans against the way that the data changes, which is a tricky balance.
Oracle also has a feature called "dynamic sampling" that is used to sample tables to determine relevant statistics at execution time. It's much more often used with data warehouses where the overhead of the sampling it outweighed by the potential performance increase for a long-running query.
这些统计信息为 Oracle 基于成本的优化器提供了确定不同执行计划效率所需的信息:例如,表中的行数、行的平均宽度、每列的最高和最低值、每列的不同值的数量,索引的聚类因子等。
在一个小型数据库中,您只需设置一项工作来每晚收集统计数据,然后就不管了。其实这是10g以下的默认值。对于较大的实现,您通常必须权衡执行计划的稳定性与数据变化的方式,这是一个棘手的平衡。
Oracle 还有一个称为“动态采样”的特性,用于对表进行采样以确定执行时的相关统计信息。它更常用于数据仓库,其中采样的开销超过了长时间运行查询的潜在性能提升。
采纳答案by hamishmcn
Often this type of problem goes away if you analyze the tables involved (so Oracle has a better idea of the distribution of the data)
如果您分析所涉及的表,这种类型的问题通常会消失(因此 Oracle 对数据的分布有更好的了解)
ANALYZE TABLE tasks COMPUTE STATISTICS;
回答by Georgi
The "IN" - clause is known in Oracle to be pretty slow. In fact, the internal query optimizer in Oracle cannot handle statements with "IN" pretty good. try using "EXISTS":
“IN” - 子句在 Oracle 中非常慢。事实上,Oracle 内部的查询优化器不能很好地处理带有“IN”的语句。尝试使用“存在”:
SELECT orderID FROM tasks WHERE orderID NOT EXISTS
(SELECT DISTINCT orderID FROM tasks WHERE
engineer1 IS NOT NULL AND engineer2 IS NOT NULL)`print("code sample");`
Caution: Please check if the query builds the same data results.
注意:请检查查询是否构建了相同的数据结果。
Edith says: ooops, the query is not well formed, but the general idea is correct. Oracle has to fulfill a full table scan for the second (inner) query, build the results and then compare them to the first (outer) query, that's why it's slowing down. Try
伊迪丝说:哎呀,查询的格式不正确,但总体思路是正确的。Oracle 必须为第二个(内部)查询执行全表扫描,构建结果,然后将它们与第一个(外部)查询进行比较,这就是它变慢的原因。尝试
SELECT orderID AS oid FROM tasks WHERE NOT EXISTS
(SELECT DISTINCT orderID AS oid2 FROM tasks WHERE
engineer1 IS NOT NULL AND engineer2 IS NOT NULL and oid=oid2)
or something similiar ;-)
或类似的东西;-)
回答by kristof
I would try using joins instead
我会尝试使用连接代替
SELECT
t.orderID
FROM
tasks t
LEFT JOIN tasks t1
ON t.orderID = t1.orderID
AND t1.engineer1 IS NOT NULL
AND t1.engineer2 IS NOT NULL
WHERE
t1.orderID IS NULL
also your original query would probably be easier to understand if it was specified as:
如果将原始查询指定为:
SELECT orderID FROM orders WHERE orderID NOT IN
(SELECT DISTINCT orderID FROM tasks WHERE
engineer1 IS NOT NULL AND engineer2 IS NOT NULL)
(assuming you have orders table with all the orders listed)
(假设您有列出所有订单的订单表)
which can be then rewritten using joins as:
然后可以使用连接重写为:
SELECT
o.orderID
FROM
orders o
LEFT JOIN tasks t
ON o.orderID = t.orderID
AND t.engineer1 IS NOT NULL
AND t.engineer2 IS NOT NULL
WHERE
t.orderID IS NULL
回答by Tony Andrews
Some questions:
一些问题:
- How many rows are there in tasks?
- What indexes are defined on it?
- Has the table been analyzed recently?
- 任务中有多少行?
- 上面定义了哪些索引?
- 最近有分析过表吗?
Another way to write the same query would be:
编写相同查询的另一种方法是:
select orderid from tasks
minus
select orderid from tasks
where engineer1 IS NOT NULL AND engineer2 IS NOT NULL
However, I would rather expect the query to involve an "orders" table:
但是,我宁愿希望查询涉及“订单”表:
select orderid from ORDERS
minus
select orderid from tasks
where engineer1 IS NOT NULL AND engineer2 IS NOT NULL
or
或者
select orderid from ORDERS
where orderid not in
( select orderid from tasks
where engineer1 IS NOT NULL AND engineer2 IS NOT NULL
)
or
或者
select orderid from ORDERS
where not exists
( select null from tasks
where tasks.orderid = orders.orderid
and engineer1 IS NOT NULL OR engineer2 IS NOT NULL
)
回答by Ethan Post
I agree with TZQTZIO, I don't get your query.
我同意 TZQTZIO,我不明白你的问题。
If we assume the query did make sense then you might want to try using EXISTS as some suggest and avoid IN. IN is not always bad and there are likely cases which one could show it actually performs better than EXISTS.
如果我们假设查询确实有意义,那么您可能想尝试使用 EXISTS 作为一些建议并避免 IN。IN 并不总是坏的,并且在某些情况下可以证明它实际上比 EXISTS 表现得更好。
The question title is not very helpful. I could set this query up in one Oracle database and make it run slow and make it run fast in another. There are many factors that determine how the database resolves the query, object statistics, SYS schema statistics, and parameters, as well as server performance. Sqlserver vs. Oracle isn't the problem here.
问题标题不是很有帮助。我可以在一个 Oracle 数据库中设置这个查询,让它运行得慢一些,而在另一个数据库中让它运行得很快。决定数据库如何解析查询、对象统计、SYS 模式统计和参数以及服务器性能的因素有很多。Sqlserver 与 Oracle 不是这里的问题。
For those interested in query tuning and performance and want to learn more some of the google terms to search are "oak table oracle" and "oracle jonathan lewis".
对于那些对查询调优和性能感兴趣并希望了解更多信息的人来说,一些要搜索的 google 术语是“oak table oracle”和“oracle jonathan lewis”。
回答by David Aldridge
"Although I'm still curious why I have to do this, and if/when I would need to run it again?"
“虽然我仍然很好奇为什么我必须这样做,以及我是否/何时需要再次运行它?”
The statistics give Oracle's cost-based optimzer information that it needs to determine the efficiency of different execution plans: for example, the number of rowsin a table, the average width of rows, highest and lowest values per column, number of distinct values per column, clustering factor of indexes etc.
这些统计信息为 Oracle 基于成本的优化器提供了确定不同执行计划效率所需的信息:例如,表中的行数、行的平均宽度、每列的最高和最低值、每列的不同值的数量,索引的聚类因子等。
In a small database you can just setup a job to gather statistics every night and leave it alone. In fact, this is the default under 10g. For larger implementations you usually have to weigh the stability of the execution plans against the way that the data changes, which is a tricky balance.
在一个小型数据库中,您只需设置一项工作来每晚收集统计数据,然后就不管了。其实这是10g以下的默认值。对于较大的实现,您通常必须权衡执行计划的稳定性与数据变化的方式,这是一个棘手的平衡。
Oracle also has a feature called "dynamic sampling" that is used to sample tables to determine relevant statistics at execution time. It's much more often used with data warehouses where the overhead of the sampling it outweighed by the potential performance increase for a long-running query.
Oracle 还有一个称为“动态采样”的特性,用于对表进行采样以确定执行时的相关统计信息。它更常用于数据仓库,其中采样的开销超过了长时间运行查询的潜在性能提升。
回答by AJ.
I think several people have pretty much the right SQL, but are missing a join between the inner and outer queries.
Try this:
我认为有些人拥有几乎正确的 SQL,但缺少内部和外部查询之间的连接。
尝试这个:
SELECT t1.orderID
FROM tasks t1
WHERE NOT EXISTS
(SELECT 1
FROM tasks t2
WHERE t2.orderID = t1.orderID
AND t2.engineer1 IS NOT NULL
AND t2.engineer2 IS NOT NULL)
回答by Leigh Riffel
I agree with ΤΖΩΤΖΙΟΥ and wearejimbo that your query should be...
我同意 ΤΖΩΤΖΙΟΥ 和wearejimbo,您的查询应该是...
SELECT DISTINCT orderID FROM Tasks
WHERE Engineer1 IS NULL OR Engineer2 IS NULL;
I don't know about SQL Server, but this query won't be able to take advantage of any indexes because null rows aren't in indexes. The solution to this would be to re-write the query in a way that would allow a function based index to be created that only includes the null value rows. This could be done with NVL2, but would likely not be portable to SQL Server.
我不了解 SQL Server,但此查询将无法利用任何索引,因为索引中没有空行。对此的解决方案是以允许创建仅包含空值行的基于函数的索引的方式重写查询。这可以通过 NVL2 完成,但可能无法移植到 SQL Server。
I think the best answer is not one that meets your criteria and that is write a different statement for each platform that is best for that platform.
我认为最好的答案不是满足您的标准,而是为每个平台写下最适合该平台的不同声明。
回答by Leigh Riffel
If you decide to create an ORDERS table, I'd add an ALLOCATED flag to it, and create a bitmap index. This approach also forces you to modify the business logic to keep the flag updated, but the queries will be lightning fast. It depends on how critical are the queries for the application.
如果您决定创建一个 ORDERS 表,我会向它添加一个 ALLOCATED 标志,并创建一个位图索引。这种方法还迫使您修改业务逻辑以保持标志更新,但查询速度会快如闪电。这取决于应用程序的查询有多重要。
Regarding the answers, the simpler the better in this case. Forget subqueries, joins, distinct and group bys, they are not needed at all!
关于答案,在这种情况下越简单越好。忘记子查询、连接、distinct 和 group by,它们根本不需要!
回答by David Aldridge
What proportion of the rows in the table meet the condition "engineer1 IS NOT NULL AND engineer2 IS NOT NULL"?
表中满足条件“engineer1 IS NOT NULL AND Engineer2 IS NOT NULL”的行比例是多少?
This tells you (roughly) whether it might be worth trying to use an index to retrieve the associated orderid's.
这会告诉您(大致)是否值得尝试使用索引来检索关联的 orderid。
Another way to write the query in Oracle that would handle unindexed cases very well would be:
在 Oracle 中编写查询以很好地处理未索引情况的另一种方法是:
select distinct orderid
from
(
select orderid,
max(case when engineer1 is null and engineer2 is null then 0 else 1)
over (partition by orderid)
as max_null_finder
from tasks
)
where max_null_finder = 0