Disclaimer: this page is a Chinese/English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/3298864/

Date: 2020-09-18 21:04:52  Source: igfitidea

Speed up Oracle Text indexing or let the indexer work only on low load times

oracle  full-text-indexing  database-tuning

Asked by Stefan

We're using an Oracle Text CTXSYS.CONTEXT index to index about half a million rows containing meta information. The information is spread over two tables that are combined by a procedure which the indexer calls at runtime (functional index).

When I run the CREATE INDEX on my local machine (a simple dual-core notebook) the index is built in about 3 minutes. On our DB server, which runs Solaris with 8 cores and 16 GB of RAM, it takes about 24 hours to create an index for the same (exactly the same) data.

Sample code: this is our index feeder for two tables and 3 columns:

create or replace procedure docmeta_revisions_text_feeder
    (p_rowid in rowid, p_clob in out nocopy clob)
as
    v_clob CLOB;  -- collects the text that is handed to the indexer
begin
    -- description and author of the DOCMETA row being indexed
    FOR c1 IN (select DM.DID, DM.XDESCRIB || ' ' || DM.XAUTHOR AS data
        from DOCMETA DM
        WHERE DM.ROWID = p_rowid)
    LOOP
        v_clob := v_clob || c1.data;
        -- append the titles of all revisions of that document
        FOR c2 IN (
            SELECT ' ' || RV.DDOCTITLE AS data
            FROM   REVISIONS RV
            WHERE  RV.DID = c1.DID)
        LOOP
            v_clob := v_clob || c2.data;
        END LOOP;
    END LOOP;
    p_clob := v_clob;
end docmeta_revisions_text_feeder;
/

These are the preferences

BEGIN
CTX_DDL.CREATE_PREFERENCE ('concat_DM_RV_DS', 'USER_DATASTORE');
CTX_DDL.SET_ATTRIBUTE ('concat_DM_RV_DS', 'PROCEDURE',
'docmeta_revisions_text_feeder');
 END;

Now we create the index

CREATE INDEX concat_DM_RV_idx ON DOCMETA (FULLTEXTIDX_DUMMY)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS ('datastore concat_DM_RV_DS 
section group CTXSYS.AUTO_SECTION_GROUP
') PARALLEL 4;

The data mostly consists of a simple title or author name + a short description with < 1k text.

I tried to play a little with the memory settings involved and with the PARALLEL parameter, but haven't had any success. So here are my questions:

  • Is there a way to pause and resume an indexing process (I have the CTX_SYS role at hand)?
  • Does anyone have a hint as to which parameter could be tweaked (esp. the memory size)?
  • Is it possible to export and import a text index? Then I could carry out the indexing on my local machine and simply copy it to our server.
  • Can an indexer run with "lower priority"?
  • It is possible that the indexer has been disturbed by locking operations (it's a staging machine that others access in parallel). Is there a way to lock the tables involved, create the index and unlock them afterwards?

Accepted answer by Stefan

We finally figured out how to do a split sync of the index. Here are some basic steps that show what we did:

CREATE INDEX concat_DM_RV_idx ON DOCMETA (FULLTEXTIDX_DUMMY)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS ('datastore concat_DM_RV_DS section group CTXSYS.AUTO_SECTION_GROUP
NOPOPULATE
');

See the NOPOPULATE parameter? It tells the indexer not to start the populating / indexing process. If you're on 11g you have a very nice CTX_DDL feature at hand that populates the index at will, namely the procedure POPULATE_PENDING. Calling it with your index name fills the CTXSYS table that holds the rows that have been modified and are therefore out of sync. Note that after calling this procedure the indexer still hasn't started anything.

Since 10g (?) the corresponding CTX_DDL.SYNC_INDEX procedure has several additional parameters, e.g. "maxtime". Give it a limit of, say, 4 hours (as far as we know the value is specified in minutes, i.e. 240) and the indexer will sync pending rows for about 4 hours. Repeat that call on a schedule and you are done.
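
For 11g, the procedure described above might look roughly like the following sketch. It assumes the index was created with NOPOPULATE as shown, that maxtime is interpreted in minutes (240 for about four hours), and that the scheduler may run every night at 22:00. The job name SYNC_CONCAT_DM_RV_IDX and the start hour are made up for illustration, and the 100M memory value is simply borrowed from the 9i example further down.

```sql
-- One-time step: queue every row of the base table for indexing (11g and later).
BEGIN
  CTX_DDL.POPULATE_PENDING('CONCAT_DM_RV_IDX');
END;
/

-- Nightly job that works through the pending queue for roughly four hours.
-- Job name and schedule are hypothetical; adjust them to your low-load window.
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'SYNC_CONCAT_DM_RV_IDX',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN CTX_DDL.SYNC_INDEX(idx_name => ''CONCAT_DM_RV_IDX'', memory => ''100M'', maxtime => 240); END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY;BYHOUR=22;BYMINUTE=0;BYSECOND=0',
    enabled         => TRUE
  );
END;
/
```

Once the pending queue is empty the job has nothing left to do and can be removed with DBMS_SCHEDULER.DROP_JOB('SYNC_CONCAT_DM_RV_IDX'), or kept to pick up later changes to the base table.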

Unfortunately, that doesn't work in 9i. So we tried, successfully, to "simulate" the Oracle POPULATE_PENDING process. The only restriction of this method is that you need some kind of unique row identifier to be able to query your table in distinct chunks. Here's what we did:

1.) Create the index with NOPOPULATE (see above).

2.) Become SYS / DBA / CTXSYS (yes, you might have to call your admin for that). Find out the ID of your freshly created index by querying the index meta table:

-- the index was created with an unquoted name, so the dictionary stores it in upper case
SELECT IDX_ID FROM CTXSYS.CTX_INDEXES WHERE IDX_NAME = 'CONCAT_DM_RV_IDX';

3.) Note the index ID this yields on a yellow slip of paper and execute the following insert statement as CTXSYS. Replace <<your index id>> with your index ID and <<basetable name>> with the name of the table the index is built on. The unique row identifier (<<basetable unique row identifier>>) can be some kind of document ID or any countable expression that selects a distinct chunk of rows from your table:

INSERT INTO CTXSYS.DR$PENDING (PND_CID,PND_PID,PND_ROWID,PND_TIMESTAMP)
SELECT <<your index id>>, 0, <<basetable name>>.ROWID, CURRENT_DATE
FROM gsms.DOCMETA
WHERE <<basetable unique row identifier>> < 50000;
COMMIT; -- Dont forget the COMMIT! DONT FORGET IT!!! WE MEAN IT!

The "50000" bounds the rows that are inserted into the pending rows table as payload for the indexer; how many rows that actually is depends on how sparse the identifier values in your base table are. Adjust it to your own needs.

4.) Now we are set up to let the indexer loose:

CALL CTX_DDL.SYNC_INDEX(
  'CONCAT_DM_RV_IDX', -- your index name here
  '100M', -- memory count
  NULL, -- param for partitioned idxes
  2 -- parallel count
);

This starts the indexing process on however many rows you inserted in step 3.). To run the next chunk, repeat step 3.) with the next 50,000 or so rows ("where id between 50000 and 100000").
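
Steps 3.) and 4.) can also be wrapped in a loop. The block below is only a sketch under several assumptions: it is run as CTXSYS with SELECT access to gsms.DOCMETA, DID serves as the unique row identifier, and 1234 is a placeholder for the index ID found in step 2.). The half-open ranges (>= lower bound, < upper bound) are chosen so that no row is queued twice.

```sql
DECLARE
  c_idx_id CONSTANT NUMBER      := 1234;   -- placeholder: your index ID from step 2.)
  c_chunk  CONSTANT PLS_INTEGER := 50000;  -- width of the DID range per chunk
  v_lo     NUMBER := 0;                    -- lower bound of the current chunk
  v_max    NUMBER;                         -- highest DID in the base table
BEGIN
  SELECT MAX(DM.DID) INTO v_max FROM gsms.DOCMETA DM;

  WHILE v_lo <= v_max LOOP
    -- step 3.): queue one chunk of rows for the indexer
    INSERT INTO CTXSYS.DR$PENDING (PND_CID, PND_PID, PND_ROWID, PND_TIMESTAMP)
    SELECT c_idx_id, 0, DM.ROWID, CURRENT_DATE
    FROM   gsms.DOCMETA DM
    WHERE  DM.DID >= v_lo
    AND    DM.DID <  v_lo + c_chunk;
    COMMIT;

    -- step 4.): index exactly the rows that were just queued
    CTX_DDL.SYNC_INDEX('CONCAT_DM_RV_IDX', '100M', NULL, 2);

    v_lo := v_lo + c_chunk;
  END LOOP;
END;
/
```

Running all chunks back to back takes as long as a full build, so to stay inside a low-load window you may prefer to process only one or two chunks per run and remember the last upper bound yourself.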

If you accidentally run the indexer on the same set of rows twice, the index becomes heavily fragmented. The only way to clean it up is to optimize the index with the REBUILD parameter. On our local machine that was extremely fast, since the indexer doesn't have to run again; it only rearranges the contents of the index tables:

CALL CTX_DDL.OPTIMIZE_INDEX('CONCAT_DM_RV_IDX', 'REBUILD');

If you need some meta information about the indexing status and size you can ask the CTX_REPORT package:

SELECT CTX_REPORT.INDEX_SIZE('CONCAT_DM_RV_IDX') FROM DUAL;
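
To see how far a chunked sync has progressed, the pending queue and the indexing errors can be inspected as well. This is a small sketch that assumes you are connected as the index owner; the CTX_USER_PENDING and CTX_USER_INDEX_ERRORS views are not mentioned in the steps above.

```sql
-- Rows still waiting in the pending queue for this index.
SELECT COUNT(*) AS pending_rows
FROM   CTX_USER_PENDING
WHERE  PND_INDEX_NAME = 'CONCAT_DM_RV_IDX';

-- Errors raised while indexing individual rows, newest first.
SELECT ERR_TIMESTAMP, ERR_TEXTKEY, ERR_TEXT
FROM   CTX_USER_INDEX_ERRORS
WHERE  ERR_INDEX_NAME = 'CONCAT_DM_RV_IDX'
ORDER  BY ERR_TIMESTAMP DESC;
```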

And if you forgot which parameters you chose at indexing time:

SELECT * FROM CTXSYS.CTX_PARAMETERS;

Happy indexing!
