优化 Oracle 之间的日期语句
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/13494596/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Optimize Oracle Between Date Statement
提问by s1lence
I got an oracle SQL query that selects entries of the current day like so:
我得到了一个 oracle SQL 查询,它选择当天的条目,如下所示:
SELECT [fields]
FROM MY_TABLE T
WHERE T.EVT_END BETWEEN TRUNC(SYSDATE)
AND TRUNC(SYSDATE) + 86399/86400
AND T.TYPE = 123
Whereas the EVT_END
field is of type DATE
and T.TYPE
is a NUMBER(15,0)
.
而EVT_END
场型的DATE
和T.TYPE
是NUMBER(15,0)
。
Im sure with increasing size of the table data (and ongoing time), the date constraint will decrease the result set by a much larger factor than the type constraint. (Since there are a very limited number of types)
我确信随着表数据大小(和持续时间)的增加,日期约束将比类型约束减少一个更大的因素的结果集。(因为种类非常有限)
So the basic question arising is, what's the best index to choose to make the selection on the current date faster. I especially wonder what the advantages and disadvantages of a functional index on TRUNC(T.EVT_END)
to a normal index on T.EVT_END
would be. When using a functional index the query would look something like that:
所以出现的基本问题是,选择什么索引可以使当前日期的选择更快。我特别想知道功能索引TRUNC(T.EVT_END)
与普通索引的优缺点是什么T.EVT_END
。使用功能索引时,查询将如下所示:
SELECT [fields]
FROM MY_TABLE T
WHERE TRUNC(T.EVT_END) = TRUNC(SYSDATE)
AND T.TYPE = 123
Because other queries use the mentioned date constraints without the additional type selection (or maybe with some other fields), multicolumn indexes wouldn't help me a lot.
因为其他查询使用提到的日期约束而没有额外的类型选择(或者可能使用其他一些字段),所以多列索引对我没有多大帮助。
Thanks, I'd appreciate your hints.
谢谢,我很感激你的提示。
采纳答案by Jon Heller
Results
结果
If your index is cached, a function-based index performs best. If your index is not cached, a compressed function-based index performs best.
如果您的索引已缓存,则基于函数的索引性能最佳。如果您的索引未缓存,则基于压缩函数的索引性能最佳。
Below are the relative times generated by my test code. Lower is better. You cannot compare the numbers between cached and non-cached, they are totally different tests.
以下是我的测试代码生成的相对时间。越低越好。您无法比较缓存和非缓存之间的数字,它们是完全不同的测试。
In cache Not in cache
Regular 120 139
FBI 100 138
Compressed FBI 126 100
I'm not sure why the FBI performs better than the regular index. (Although it's probably related to what you said about equality predicates versus range. You can see that the regular index has an extra "FILTER" step in its explain plan.) The compressed FBI has some additional overhead to uncompress the blocks. This small amount of extra CPU time is relevant when everything is already in memory, and CPU waits are most important. But when nothing is cached, and IO is more important, the reduced space of the compressed FBI helps a lot.
我不确定为什么 FBI 的表现比常规指数好。(虽然这可能与您所说的相等谓词与范围有关。您可以看到常规索引在其解释计划中有一个额外的“过滤器”步骤。)压缩的 FBI 有一些额外的开销来解压缩块。当一切都已经在内存中时,这少量的额外 CPU 时间是相关的,并且 CPU 等待是最重要的。但是当什么都不缓存,并且IO更重要时,压缩后的FBI减少的空间有很大帮助。
Assumptions
假设
There seems to be a lot of confusion about this question. The way I read it, you only care about this one specific query, and you want to know whether a function-based index or a regular index will be faster.
这个问题似乎有很多困惑。我读它的方式,你只关心这个特定的查询,你想知道基于函数的索引还是常规索引会更快。
I assume you do not care about other queries that may benefit from this index, additional time spent to maintain the index, if the developers remember to use it, or whether or not the optimizer chooses the index. (If the optimizer doesn't choose the index, which I think is unlikely, you can add a hint.) Let me know if any of these assumptions are wrong.
我假设您不关心可能从此索引受益的其他查询、维护索引所花费的额外时间、开发人员是否记得使用它、或者优化器是否选择索引。(如果优化器没有选择索引,我认为这不太可能,您可以添加一个提示。)如果这些假设中的任何一个是错误的,请告诉我。
Code
代码
--Create tables. 1 = regular, 2 = FBI, 3 = Compressed FBI
create table my_table1(evt_end date, type number) nologging;
create table my_table2(evt_end date, type number) nologging;
create table my_table3(evt_end date, type number) nologging;
--Create 1K days, each with 100K values
begin
for i in 1 .. 1000 loop
insert /*+ append */ into my_table1
select sysdate + i - 500 + (level * interval '1' second), 1
from dual connect by level <= 100000;
commit;
end loop;
end;
/
insert /*+ append */ into my_table2 select * from my_table1;
insert /*+ append */ into my_table3 select * from my_table1;
--Create indexes
create index my_table1_idx on my_table1(evt_end);
create index my_table2_idx on my_table2(trunc(evt_end));
create index my_table3_idx on my_table3(trunc(evt_end)) compress;
--Gather statistics
begin
dbms_stats.gather_table_stats(user, 'MY_TABLE1');
dbms_stats.gather_table_stats(user, 'MY_TABLE2');
dbms_stats.gather_table_stats(user, 'MY_TABLE3');
end;
/
--Get the segment size.
--This shows the main advantage of a compressed FBI, the lower space.
select segment_name, bytes/1024/1024/1024 GB
from dba_segments
where segment_name like 'MY_TABLE__IDX'
order by segment_name;
SEGMENT_NAME GB
MY_TABLE1_IDX 2.0595703125
MY_TABLE2_IDX 2.0478515625
MY_TABLE3_IDX 1.1923828125
--Test block.
--Uncomment different lines to generate 6 different test cases.
--Regular, Function-based, and Function-based compressed. Both cached and not-cached.
declare
v_count number;
v_start_time number;
v_total_time number := 0;
begin
--Uncomment two lines to test the server when it's "cold", and nothing is cached.
for i in 1 .. 10 loop
execute immediate 'alter system flush buffer_cache';
--Uncomment one line to test the server when it's "hot", and everything is cached.
--for i in 1 .. 1000 loop
v_start_time := dbms_utility.get_time;
SELECT COUNT(*)
INTO V_COUNT
--#1: Regular
FROM MY_TABLE1 T
WHERE T.EVT_END BETWEEN TRUNC(SYSDATE) AND TRUNC(SYSDATE) + 86399/86400;
--#2: Function-based
--FROM MY_TABLE2 T
--WHERE TRUNC(T.EVT_END) = TRUNC(SYSDATE);
--#3: Compressed function-based
--FROM MY_TABLE3 T
--WHERE TRUNC(T.EVT_END) = TRUNC(SYSDATE);
v_total_time := v_total_time + (dbms_utility.get_time - v_start_time);
end loop;
dbms_output.put_line('Seconds: '||v_total_time/100);
end;
/
Test Methodology
测试方法
I ran each block at least 5 times, alternated between run types (in case something was running on my machine only part of the time), threw out the high and the low run times, and averaged them. The code above does not include all that logic, since it would take up 90% of this answer.
我运行每个块至少 5 次,在运行类型之间交替(以防某些东西只在我的机器上运行部分时间),排除高运行时间和低运行时间,并对它们求平均值。上面的代码不包括所有这些逻辑,因为它会占这个答案的 90%。
Other Things to Consider
其他需要考虑的事情
There are still many other things to consider. My code assumes the data is inserted in a very index-friendly order. Things will be totally different if this is not true, as compression may not help at all.
还有很多其他的事情需要考虑。我的代码假定数据以非常适合索引的顺序插入。如果这不是真的,事情将完全不同,因为压缩可能根本没有帮助。
Probably the best solution to this problem is to avoid it completely with partitioning. For reading the same amount of data, a full table scan is much faster than an index read because it uses multi-block IO. But there are some downsides to partitioning, like the large amount of money required to buy the option, and extra maintenance tasks. For example, creating partitions ahead of time, or using interval partitioning (which has some other weird issues), gathering stats, deferred segment creation, etc.
这个问题的最佳解决方案可能是通过分区完全避免它。对于读取相同数量的数据,全表扫描比索引读取快得多,因为它使用了多块 IO。但是分区也有一些缺点,比如购买选项需要大量资金,以及额外的维护任务。例如,提前创建分区,或使用间隔分区(有一些其他奇怪的问题)、收集统计信息、延迟段创建等。
Ultimately, you will need to test this yourself. But remember that testing even such a simple choice is difficult. You need realistic data, realistic tests, and a realistic environment. Realistic data is much harder than it sounds. With indexes, you cannot simply copy the data and build the indexes at once. create table my_table1 as select * from
and create index ...
will create a different index than if you create the table and perform a bunch of inserts and deletes in a specific order.
最终,您需要自己对此进行测试。但请记住,即使是这样一个简单的选择也很难进行测试。您需要真实的数据、真实的测试和真实的环境。实际数据比听起来要困难得多。使用索引,您不能简单地复制数据并立即构建索引。 create table my_table1 as select * from
并且create index ...
将创建与创建表并按特定顺序执行一堆插入和删除操作不同的索引。
回答by Marlin Pierce
Your index should be TYPE, EVT_END.
您的索引应该是 TYPE, EVT_END。
CREATE INDEX PIndex
ON MY_TABLE (TYPE, EVT_END)
The optimizer plan will first go through this index to find the TYPE=123 section. Then under TYPE=123, it will have the EVT_END timestamps sorted, so it can search the b-tree for the first date in the range, and go through the dates sequentially until a data is out of the range.
优化器计划将首先通过这个索引找到 TYPE=123 部分。然后在 TYPE=123 下,它将对 EVT_END 时间戳进行排序,因此它可以在 b 树中搜索范围内的第一个日期,并依次遍历日期,直到数据超出范围。
回答by Stu
Based on the query above the functional index will provide no value. For a functional index to be used the predicate in the query would need to be written as follows:
根据上面的查询,功能索引将不提供任何值。对于要使用的功能索引,查询中的谓词需要编写如下:
SELECT [fields]
FROM MY_TABLE T
WHERE TRUNC(T.EVT_END) BETWEEN TRUNC(SYSDATE) AND TRUNC(SYSDATE) + 86399/86400
AND T.TYPE = 123
The functional index on the column EVT_END, is being ignored. It would be better to have a normal index on the EVT_END date. For a functional index to be used the left hand of the condition must match the declaration of the functional index. I would probably write the query as:
列 EVT_END 上的功能索引被忽略。最好在 EVT_END 日期有一个正常的索引。对于要使用的功能索引,条件的左侧必须与功能索引的声明相匹配。我可能会将查询写为:
SELECT [fields]
FROM MY_TABLE T
WHERE T.EVT_END BETWEEN TRUNC(SYSDATE) AND TRUNC(SYSDATE+1)
AND T.TYPE = 123
And I would create the following index:
我会创建以下索引:
CREATE INDEX bla on MY_TABLE( EVT_END )
This is assuming you are trying to find the events that ended within a day.
这是假设您正在尝试查找在一天内结束的事件。
回答by Lakshman
@S1lence:
I believe there would be a considerable time of thought behind this question being asked by you. And, I took a lot of time to post my answer here, as I don't like posting any guesses for answers.
I would like to share my websearch experience on this choice of normal Index on a date column against FBIs.
Based on my understanding on the linkbelow, if you are about to use TRUNC function for sure, then you can strike out the option of normal index, as this consulting web space says that:
Even though the column may have an index, the trunc built-in function will invalidate the index, causing sub-optimal execution with unnecessary I/O.
I suppose that clears all. You've to go with FBI if you gonna use TRUNC
for sure. Please let me know if my reply makes sense.
@S1lence:
我相信您提出这个问题的背后会有相当长的思考时间。而且,我花了很多时间在这里发布我的答案,因为我不喜欢发布任何对答案的猜测。
我想分享我在针对 FBI 的日期列中选择正常索引的网络搜索经验。
根据我对下面链接的理解,如果您确定要使用TRUNC功能,那么您可以删除正常索引的选项,正如这个咨询网站所说:
即使该列可能有索引,trunc内置函数将使索引无效,导致不必要的 I/O 的次优执行。
我想这就清除了一切。如果你要使用,你必须和联邦调查局一起去TRUNC
当然。请让我知道我的回复是否有意义。
Oracle SQL Tuning with function-based indexes
Cheers,
Lakshmanan C.
干杯,
Lakshmanan C.
回答by Chris Saxon
The deicion over whether or not to use a function-based index should be driven by how you plan to write your queries. If all your queries against the date column will be in the form TRUNC(EVT_END)
, then you should use the FBI. However, in general it will be better to create an index on just EVT_END
for the following reasons:
关于是否使用基于函数的索引的决定应该取决于您计划如何编写查询。如果您对日期列的所有查询都采用 形式TRUNC(EVT_END)
,那么您应该使用 FBI。但是,一般而言,EVT_END
出于以下原因,最好在其上创建索引:
- It will be more reusable. If you ever have queries checking particular times of the day then you can't use TRUNC.
- There will be more distinct keys in the index using just the date. If you have 1,000 different times inserted during a day,
EVT_END
will will 1,000 distinct keys, whereasTRUNC(EVT_END)
will only have 1 (this assumes that you're storing the time component and not just midnight for all the dates - in the second case both will have 1 distinct key for a day). This matters because the more distinct values an index has, the higher the selectivity of the index and the more likely it is to be used by the optimizer (see this) - The clustering factor is likely to be different, but in the case of using trunc it's more likely to go up, not down as stated in some other comments. This is because the clustering factor represents how closely the order of the values in the index match the physical storage of the data. If all your data is inserted in date order then a plain index will have the same order as the physical data. However, with
TRUNC
all times on a single day will map to the same value, so the order of rows in the index may be completely different to the physical data. Again, this means the trunc index is less likely to be used. This will entirely depend on your database's insertion/deletion patterns however. - Developers are more likely to write queries against where trunc isn't applied to the column (in my experience). Whether this holds true for you will depend upon your developers and the quality controls you have around deployed SQL.
- 它将更可重复使用。如果您曾经有查询检查一天中的特定时间,那么您不能使用 TRUNC。
- 仅使用日期,索引中就会有更多不同的键。如果您在一天中插入了 1,000 个不同的时间,
EVT_END
将有 1,000 个不同的键,而TRUNC(EVT_END)
只有 1 个(这假设您存储的是时间组件,而不仅仅是所有日期的午夜 - 在第二种情况下,两者都有 1一天的不同键)。这很重要,因为索引具有的不同值越多,索引的选择性就越高,优化器使用它的可能性就越大(请参阅此) - 聚类因子可能会有所不同,但在使用 trunc 的情况下,它更有可能上升,而不是像其他一些评论中所说的那样下降。这是因为聚类因子表示索引中值的顺序与数据物理存储的匹配程度。如果您的所有数据都按日期顺序插入,那么普通索引将与物理数据具有相同的顺序。但是,随着
TRUNC
一天中的所有时间都会映射到相同的值,因此索引中的行顺序可能与物理数据完全不同。同样,这意味着 trunc 索引不太可能被使用。然而,这将完全取决于您的数据库的插入/删除模式。 - 开发人员更有可能针对未将 trunc 应用于列的位置编写查询(以我的经验)。这是否适用于您将取决于您的开发人员和您对部署的 SQL 的质量控制。
Personally, I would go with Marlin's answer of TYPE, EVT_END
as a first pass. You need to test this in your environment however and see how this affects this query and all others using the TYPE and EVT_END columns however.
就个人而言,我会采用 Marlin 的回答TYPE, EVT_END
作为第一遍。但是,您需要在您的环境中对此进行测试,看看这会如何影响此查询以及使用 TYPE 和 EVT_END 列的所有其他查询。