oracle 等价于跨多个表的复合索引?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/3373907/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-18 21:08:50  来源:igfitidea点击:

Equivalent of a composite index across multiple tables?

sqloracle

提问by jthg

I have a table structure similar the following:

我有一个类似于以下的表结构:

create table MAIL (
  ID        int,
  FROM      varchar,
  SENT_DATE date
);

create table MAIL_TO (
  ID      int,
  MAIL_ID int,
  NAME      varchar
);

and I need to run the following query:

我需要运行以下查询:

select m.ID 
from MAIL m 
  inner join MAIL_TO t on t.MAIL_ID = m.ID
where m.SENT_DATE between '07/01/2010' and '07/30/2010'
  and t.NAME = '[email protected]'

Is there any way to design indexes such that both of the conditions can use an index? If I put an index on MAIL.SENT_DATE and an index on MAIL_TO.NAME, the database will choose to use either one of the indexes or the other, not both. After filtering by the first condition the database always has to do a full scan of the results for the second condition.

有没有办法设计索引,使得两个条件都可以使用索引?如果我在 MAIL.SENT_DATE 上放置一个索引并在 MAIL_TO.NAME 上放置一个索引,则数据库将选择使用其中一个索引或另一个,而不是两者。按第一个条件过滤后,数据库始终必须对第二个条件的结果进行全面扫描。

采纳答案by OMG Ponies

A materialized viewwould allow you to index the values, assuming the stringent materialized view criteria is met.

一个物化视图将允许您指数值,假设严格物化视图条件满足。

回答by tpdi

Oracle can use both indices. You just don't have the righttwo indices.

Oracle 可以使用这两个索引。您只是没有正确的两个索引。

Consider: if the query plan uses your index on mail.sent_datefirst, what does it get from mail? It gets all the mail.ids where mail.sent_dateis within the range you gave in your whereclause, yes?

考虑:如果查询计划首先使用您的索引mail.sent_date,它从mail什么获得?它获取您在条款中给出的范围内的所有mail.ids ,是吗?mail.sent_datewhere

So it goes to mail_towith a list of mail.ids and the mail.nameyou gave in your whereclause. At this point, Oracle decides that it's better to scan the table for matching mail_to.mail_ids rather than use the index on mail_to.name.

因此,它mail_tomail.ids 和mail.name您在where子句中给出的列表有关。此时,Oracle 决定最好扫描表以查找匹配的mail_to.mail_ids,而不是使用 上的索引mail_to.name

Indices on varchars are always problematic, and Oracle really prefers full table scans. But if we give Oracle an index containing the columns it really wantsto use, and depending on total table rows and statistics, we can get it to use it. This is the index:

varchars 上的索引总是有问题,Oracle 确实更喜欢全表扫描。但是如果我们给 Oracle 一个包含它真正想要使用的列的索引,并且根据总表行和统计信息,我们可以让它使用它。这是索引:

 create index mail_to_pid_name on mail_to( mail_id, name ) ; 

This works where an index just on namedoesn't, because Oracle's not looking just for a name, but for a mail_idanda name.

这适用于索引name不存在的情况,因为 Oracle 不只是在寻找名称,而是在寻找 amail_ida name

Conversely, if the cost-based analyzer determines it's cheaper to go to table mail_tofirst, and uses your index on mail_to.name, what doe sit get? A bunch of mail_to_.mail_ids to look up in mail. It needs to find rows with those ids andcertain sent_dates, so:

相反,如果基于成本的分析器确定先去表更便宜mail_to,并使用您的索引mail_to.name,那么坐得到什么?一堆mail_to_.mail_id要查找的s mail。它需要找到具有这些 id某些 sent_dates 的行,因此:

 create index mail_id_sentdate on mail( sent_date, id ) ; 

Note that in this case I've put sent_datefirst in the index, and idsecond. (This is more an intuitive thing.)

请注意,在这种情况下,我将sent_date第一个放在索引中,然后放在id第二个中。(这更直观。)

Again, the take home point is this: in creating indices, you have to consider not just the columns in your whereclause, but also the columns in your join conditions.

同样,重点是:在创建索引时,您不仅要考虑where子句中的列,还要考虑连接条件中的列。



Update

更新

jthg: yes, it always depends on how the data is distributed. And on how many rows are in the table: if very many, Oracle will do a table scan and hash join, if very few it will do a table scan. You might reverse the order of either of the two indices. By putting sent_date first in the second index, we eliminate most needs for an index solely on sent_date.

jthg:是的,这总是取决于数据的分布方式。以及表中有多少行:如果很多,Oracle 将进行表扫描和散列连接,如果很少,它将进行表扫描。您可以颠倒两个索引中任何一个的顺序。通过将 sent_date 放在第二个索引中,我们消除了对仅在 上的索引的大多数需求sent_date

回答by Frank

Which criterion is more selective? The date range or the addressee? I would guess the addressee. And if that is highly selective, do not care for the date index, just let the database do the search based on the found mail ids. But index table MAILon the id if it is not already.

哪个标准更具选择性?日期范围或收件人?我猜收件人。如果这是高度选择性的,不要关心日期索引,只需让数据库根据找到的邮件 ID 进行搜索。但是MAIL如果还没有,则在 id 上索引表。

On the other hand, some modern optimizers would even make use of both indexes, scanning both tables and than build a hash value of the join columns to merge the results of both. I am not absolutely sure if and when Oracle would chose this strategy. I just realized that SQL Server tends to make hash joins rather often, compared to other engines.

另一方面,一些现代优化器甚至会同时使用两个索引,扫描两个表,然后构建连接列的哈希值以合并两者的结果。我不确定 Oracle 是否以及何时会选择这种策略。我刚刚意识到,与其他引擎相比,SQL Server 往往更频繁地进行哈希连接。

回答by Marcus Adams

If your queries are generally for a particular month, then you could partitionthe data by month.

如果您的查询通常针对特定月份,那么您可以按月对数据进行分区

回答by Sean

In situations where the requirements aren't met for a materialized view, there are these two options:

在物化视图不满足要求的情况下,有以下两种选择:

1) You can create a cross reference table, and keep this updated with triggers.

1)您可以创建一个交叉引用表,并使用触发器保持更新。

The concepts would be the same with Oracle, but i only have SQL Server installed at the moment to run the test, see this setup:

概念与 Oracle 相同,但我目前只安装了 SQL Server 来运行测试,请参阅此设置:

create table MAIL (
  ID        INT IDENTITY(1,1),
  [FROM]      VARCHAR(200),
  SENT_DATE DATE,
  CONSTRAINT PK_MAIL PRIMARY KEY (ID)
);

create table MAIL_TO (
  ID      INT IDENTITY(1,1),
  MAIL_ID INT,
  [NAME]     VARCHAR (200),
  CONSTRAINT PK_MAIL_TO PRIMARY KEY (ID)
);

ALTER TABLE [dbo].[MAIL_TO]  WITH CHECK ADD  CONSTRAINT [FK_MAILTO_MAIL] FOREIGN KEY([MAIL_ID])
REFERENCES [dbo].[MAIL] ([ID])
GO

ALTER TABLE [dbo].[MAIL_TO] CHECK CONSTRAINT [FK_MAILTO_MAIL]
GO


CREATE TABLE CompositeIndex_MailSentDate_MailToName ( 
[MAIL_ID] INT,
[MAILTO_ID] INT,
SENT_DATE DATE,
MAILTO_NAME VARCHAR(200),
CONSTRAINT PK_CompositeIndex_MailSentDate_MailToName PRIMARY KEY (MAILTO_ID,MAIL_ID)
)

GO

CREATE NONCLUSTERED INDEX IX_MailSent_MailTo ON dbo.CompositeIndex_MailSentDate_MailToName (SENT_DATE,MAILTO_NAME)
CREATE NONCLUSTERED INDEX IX_MailTo_MailSent ON dbo.CompositeIndex_MailSentDate_MailToName (MAILTO_NAME,SENT_DATE)
GO

CREATE TRIGGER dbo.trg_MAILTO_Insert
ON dbo.MAIL_TO  
AFTER INSERT AS  
BEGIN 
 INSERT INTO dbo.CompositeIndex_MailSentDate_MailToName ( MAIL_ID, MAILTO_ID, SENT_DATE, MAILTO_NAME )
 SELECT mailTo.MAIL_ID,mailTo.ID,m.SENT_DATE,mailTo.NAME
 FROM
 inserted mailTo
 INNER JOIN dbo.MAIL m ON m.ID = mailTo.MAIL_ID
END
GO


CREATE TRIGGER dbo.trg_MAILTO_Delete
ON dbo.MAIL_TO  
AFTER DELETE AS  
BEGIN 
 DELETE mailToDelete
 FROM
 dbo.MAIL_TO mailToDelete
 INNER JOIN deleted ON mailToDelete.ID = deleted.ID
END
GO

CREATE TRIGGER dbo.trg_MAILTO_Update
ON dbo.MAIL_TO  
AFTER UPDATE AS  
BEGIN 
 UPDATE compositeIndex
 SET
 compositeIndex.MAILTO_NAME = updates.NAME
 FROM
 dbo.CompositeIndex_MailSentDate_MailToName compositeIndex
 INNER JOIN inserted updates ON updates.ID = compositeIndex.MAILTO_ID
END
GO

CREATE TRIGGER dbo.trg_MAIL_Update
ON dbo.MAIL  
AFTER UPDATE AS  
BEGIN 
 UPDATE compositeIndex
 SET
 compositeIndex.SENT_DATE = updates.SENT_DATE
 FROM
 dbo.CompositeIndex_MailSentDate_MailToName compositeIndex
 INNER JOIN inserted updates ON updates.ID = compositeIndex.MAIL_ID
END
GO


INSERT INTO dbo.MAIL ( [FROM], SENT_DATE )
SELECT 'SenderA','2018-10-01'
UNION ALL SELECT 'SenderA','2018-10-02'

INSERT INTO dbo.MAIL_TO ( MAIL_ID, NAME )
SELECT 1,'CustomerA'
UNION ALL SELECT 1,'CustomerB'
UNION ALL SELECT 2,'CustomerC'
UNION ALL SELECT 2,'CustomerD'
UNION ALL SELECT 2,'CustomerE'


SELECT * FROM dbo.MAIL
SELECT * FROM dbo.MAIL_TO
SELECT * FROM dbo.CompositeIndex_MailSentDate_MailToName

You can then use the dbo.CompositeIndex_MailSentDate_MailToNametable to JOIN to the rest of your data. This is useful in environments where your rate of inserts and updates are low, but your query needs are high. So the relative overhead of implementing the triggers is small.

然后,您可以使用该dbo.CompositeIndex_MailSentDate_MailToName表 JOIN 到您的其余数据。这在插入和更新率较低但查询需求较高的环境中很有用。所以实现触发器的相对开销很小。

This has the advantage of being updated transactionally, in real time.

这具有在事务上实时更新的优点。

2) If you don't want the performance/management overhead of a trigger, and you only need this for next day reporting, you can create a view, and a nightly process which truncates the table and selects the entire view into a materialized table.

2)如果您不想要触发器的性能/管理开销,而您只需要第二天报告,您可以创建一个视图,以及一个截断表并将整个视图选择为物化表的夜间进程.

I've used this successfully to index flattened relational data requiring joins across a dozen or so tables.. reducing report times from hours to seconds. While it's an expensive query, you can set the job to run off hours if you have periods of reduced usage.

我已经成功地使用它来索引需要跨十几个表连接的扁平关系数据..将报告时间从几小时减少到几秒钟。虽然这是一个昂贵的查询,但如果您的使用时间有所减少,您可以将作业设置为在非工作时间运行。