Oracle: how to do full text searches on an XMLType?

Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow, original question: http://stackoverflow.com/questions/6338243/

Date: 2020-09-18 23:57:19  Source: igfitidea

oracle, full-text-search, xmltype

Question by avernet

I have an app storing XML in an Oracle table as XMLType. I want to do full-text searches on that data. The Oracle documentation, in Full-Text Search Over XML Data, recommends using the contains SQL function, which requires the data to be indexed with a context index. The trouble is that context indexes appear to be asynchronous, which doesn't fit my use case: I need to be able to search data right after it is added.

Can I make that index somehow synchronous? If not, what other technique should I use to do full text searches on an XMLType?
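
The asynchronous behavior described in the question can be reproduced with a minimal sketch like the following. The table docs, its XMLType column doc, the index name docs_ctx and the sample document are all hypothetical. Without a PARAMETERS clause, a CONTEXT index uses Oracle Text's default manual synchronization, so a committed row only becomes searchable through contains() after CTX_DDL.SYNC_INDEX has run.

```sql
-- Hypothetical table holding XML documents in an XMLType column
create table docs (
  id  number primary key,
  doc xmltype
);

-- No PARAMETERS clause: the index defaults to manual synchronization
create index docs_ctx on docs(doc) indextype is ctxsys.context;

insert into docs values (1, xmltype('<note><body>lady gaga</body></note>'));
commit;

-- Not found yet: the row is committed but still pending in the index
select count(*) from docs where contains(doc, 'gaga') > 0;

-- After a manual sync, the same query finds the row
exec ctx_ddl.sync_index('docs_ctx');
select count(*) from docs where contains(doc, 'gaga') > 0;
```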

Accepted answer by Gary Myers

It can't be made transactional (i.e. it won't update the index so that the change is visible to a subsequent statement within the transaction). The best you can do is make it update on commit (SYNC ON COMMIT), as in:

-- CONTEXT index that Oracle Text synchronizes each time a transaction commits
create index your_table_x
    on your_table(your_column)
    indextype is ctxsys.context
    parameters ('sync (on commit)');

Text indexes are complex things, and I'd be surprised if you could achieve a transactional, ACID-compliant text index (that is, one where transaction A inserts documents and sees them in the index within that transaction, while transaction B does not see them until A commits).
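
With the your_table_x index created as above, the visibility rules can be checked with a sketch like this one. It assumes, hypothetically, that your_column is the only column that must be supplied on insert; the sample document is made up. Because the index is synchronized only when the transaction commits, contains() does not see the new row inside the inserting transaction, but does see it afterwards.

```sql
-- Assumes your_table_x was created with parameters ('sync (on commit)')
-- and that your_column is the only required column (hypothetical).
insert into your_table (your_column)
  values (xmltype('<doc><title>gaga</title></doc>'));

-- Same transaction, before commit: the new row is not in the index yet,
-- so contains() does not return it
select count(*) from your_table where contains(your_column, 'gaga') > 0;

commit;  -- Oracle Text synchronizes the index right after the commit

-- Queries issued after the commit, from any session, now find the row
select count(*) from your_table where contains(your_column, 'gaga') > 0;
```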

Answer by avernet

  1. You could update the index at a regular interval, in a cron-like way. At worst, you can update the index after every update to the table on which it is built, using sync_index, for instance with EXEC CTX_DDL.SYNC_INDEX('your_index'). I am not a big fan of this technique because of the complexity it introduces. In addition to the cron-like aspect, you have to deal with index fragmentation, which might require you to do full updates from time to time. Update: instead of updating the index at a regular interval, you can update it on commit, as suggested by Gary, which is really what you're looking for.

  2. You can do a simple text search on the XML document, as if you were doing a Ctrl-F on the XML in a text editor. In many cases, this doesn't give you the expected result: users don't want a match when the string they are searching for only appears in an element name, attribute name, or namespace. But if this method works for you, go for it: it is simple and fairly fast. For instance:

    -- Case-insensitive substring match on the serialized XML, tag names included
    select count(*) from your_table d
    where lower(d.your_column.getClobVal()) like '%gaga%';
    
  3. Use existsNode() in a where clause, as in the example below. There are two potential issues with this. First, without proper indexes, this is slower than method #2, by a factor of about 2 in my testing, and I am not sure how to create an index on unstructured data that this query would use. Second, you'll be doing a case-sensitive search, which is often not what you want. And you can't just call XPath's lower-case(), as Oracle only supports XPath 1.0.

    -- Matches documents where some text node contains "gaga" (case-sensitive)
    select * from your_table
    where existsNode(your_column, '//text()[contains(., "gaga")]') = 1;
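
For the cron-like variant of option 1, one possibility is to let the database scheduler run the sync, plus a periodic full optimization to limit the fragmentation mentioned above. This is a sketch, assuming an index named your_index as in the example, and a job owner with the CREATE JOB privilege and execute rights on CTX_DDL; the job names and intervals are hypothetical choices.

```sql
-- Sync pending changes into the Oracle Text index every 5 minutes
begin
  dbms_scheduler.create_job(
    job_name        => 'sync_your_index',       -- hypothetical job name
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'begin ctx_ddl.sync_index(''your_index''); end;',
    repeat_interval => 'FREQ=MINUTELY;INTERVAL=5',
    enabled         => true);
end;
/

-- Once a day, around 03:00, run a full optimization to reduce fragmentation
begin
  dbms_scheduler.create_job(
    job_name        => 'optimize_your_index',   -- hypothetical job name
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'begin ctx_ddl.optimize_index(''your_index'', ''FULL''); end;',
    repeat_interval => 'FREQ=DAILY;BYHOUR=3;BYMINUTE=0',
    enabled         => true);
end;
/
```

Unlike SYNC (ON COMMIT), this does not make new documents searchable immediately; it trades freshness for fewer, larger syncs.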
    
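
For option 3, a common XPath 1.0 workaround for the missing lower-case() is translate(), which maps upper-case letters to lower case before contains() is applied. This is a swapped-in technique, not part of the answer above; the sketch reuses the your_table and your_column names from the examples, only the ASCII letters listed are folded, and the search term itself has to be written in lower case.

```sql
-- Case-insensitive variant of option 3 (hypothetical example).
-- XPath 1.0 has no lower-case(), so translate() maps A-Z to a-z first;
-- letters outside A-Z (for example accented ones) are not folded.
-- The search term ("gaga") must itself be lower case.
select * from your_table
where existsNode(
        your_column,
        '//text()[contains(translate(., "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"), "gaga")]'
      ) = 1;
```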