What is the best way to integrate Solr as an index with Oracle as a storage DB?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/3841836/

Tags: oracle, solr, integration, limits

Asked by Sean Molloy

I have an Oracle database with all the "data", and a Solr index where all this data is indexed. Ideally, I want to be able to run queries like this:

select * from data_table where id in ([solr query results for 'search string']);

select * from data_table where id in ([solr query results for 'search string']);

However, one key issue arises: Oracle WILL NOT allow more than 1000 items in the "in" clause (BIG DEAL, as the list of objects I find is very often > 1000 and will usually be around 50-200k items).

I have tried to work around this using a "split" function that takes a string of comma-separated values and breaks it into array items, but then I hit the 4000-character limit on VARCHAR2 parameters in SQL (PL/SQL allows 32k chars, but that's still WAY too limiting for the 80,000+ results I get in some cases).

I am also hitting performance issues with WHERE IN (....); I am told that this causes a very slow query, even when the field referenced is indexed.

I've tried chaining "OR"s to get around the 1000-item limit (i.e. id in (1...1000) or id in (1001...2000) or id in (2001...3000)) - and this works, but is very slow.

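Roughly, a sketch of how that chained query gets built (assuming the matching IDs are already in a Java list; the helper class name here is made up for illustration):

import java.util.List;

// Sketch only: split the Solr ID list into chunks of 1000 and OR the IN lists together,
// because Oracle rejects a single literal IN list with more than 1000 items (ORA-01795).
public class ChunkedInClause {
    public static String build(String column, List<Long> ids) {
        StringBuilder sql = new StringBuilder("(");
        for (int i = 0; i < ids.size(); i += 1000) {
            List<Long> chunk = ids.subList(i, Math.min(i + 1000, ids.size()));
            if (i > 0) {
                sql.append(" OR ");
            }
            sql.append(column).append(" IN (");
            for (int j = 0; j < chunk.size(); j++) {
                if (j > 0) {
                    sql.append(",");
                }
                sql.append(chunk.get(j));
            }
            sql.append(")");
        }
        return sql.append(")").toString();
    }
}

// Usage: "select * from data_table where " + ChunkedInClause.build("id", solrIds)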

I am thinking that I should load the Solr Client JARs into Oracle, and write an Oracle Function in Java that will call solr and pipeline back the results as a list, so that I can do something like:

select * from data_table where id in (select * from table(runSolrQuery('my query text')));

This is proving quite hard, and I am not sure it's even possible.

Things that I can't do:

  • Store full data in Solr (security + storage limits)
  • Use Solr as the controller of pagination and ordering (this is why I am fetching data from the DB)

So I have to cook up a hybrid approach where Solr really acts as the full-text search provider for Oracle. Help! Has anyone faced this?

Answered by Jason

Check this out: http://demo.scotas.com/search-sqlconsole.php

This product seems to do exactly what you need.

cheers

Answered by Justin Cave

I'm not a Solr expert, but I assume that you can get the Solr query results into a Java collection. Once you have that, you should be able to use that collection with JDBC. That avoids the limit of 1000 literal items because your IN list would be the result of a query, not a list of literal values.

Dominic Brooks has an example of using object collections with JDBC. You would do something like

Create a couple of types in Oracle

CREATE TYPE data_table_id_typ AS OBJECT (
  id NUMBER
);

CREATE TYPE data_table_id_arr AS TABLE OF data_table_id_typ;

In Java, you can then create an appropriate STRUCT array, populate this array from Solr, and then bind it to the SQL statement

SELECT *
  FROM data_table
 WHERE id IN (SELECT * FROM TABLE( CAST (? AS data_table_id_arr)))
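
A rough sketch of the Java side (untested; assumes the two types above exist, an Oracle JDBC driver, and that solrIds already holds the IDs returned by Solr):

import java.sql.Array;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Struct;
import java.util.List;

import oracle.jdbc.OracleConnection;

public class SolrIdQuery {
    public static ResultSet fetchMatchingRows(Connection conn, List<Long> solrIds) throws SQLException {
        OracleConnection oraConn = conn.unwrap(OracleConnection.class);

        // Wrap each ID in a STRUCT matching DATA_TABLE_ID_TYP(id NUMBER)
        Struct[] elements = new Struct[solrIds.size()];
        for (int i = 0; i < solrIds.size(); i++) {
            elements[i] = oraConn.createStruct("DATA_TABLE_ID_TYP", new Object[] { solrIds.get(i) });
        }

        // Build the DATA_TABLE_ID_ARR collection and bind it as the single parameter
        Array idArray = oraConn.createOracleArray("DATA_TABLE_ID_ARR", elements);

        PreparedStatement ps = conn.prepareStatement(
            "SELECT * FROM data_table" +
            " WHERE id IN (SELECT * FROM TABLE(CAST(? AS data_table_id_arr)))");
        ps.setArray(1, idArray);
        return ps.executeQuery();
    }
}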

Answered by Marko Bonaci

Instead of using a long BooleanQuery, you can use TermsFilter (it works like RangeFilter, but the items don't have to be in sequence).

Like this (first fill your TermsFilter with terms):

TermsFilter termsFilter = new TermsFilter();

// Loop through terms and add them to the filter
Term term = new Term("<field-name>", "<query>");
termsFilter.addTerm(term);

then search the index like this:

DocList parentsList = null;
parentsList = searcher.getDocList(new MatchAllDocsQuery(),  searcher.convertFilter(termsFilter), null, 0, 1000);

Where searcher is a SolrIndexSearcher (see the Javadoc for more info on the getDocList method): http://lucene.apache.org/solr/api/org/apache/solr/search/SolrIndexSearcher.html

Answered by rfeak

Two solutions come to mind.

First, look into using the Oracle-specific Java extensions to JDBC. They allow you to pass in an actual array/list as an argument. You may need to create a stored proc (it has been a while since I had to do this), but if this is a focused use case, it shouldn't be overly burdensome.

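A minimal sketch of that stored proc route with the Oracle driver (the ID_NUMBER_TAB type and fetch_by_ids procedure are hypothetical names; you would define them on the database side):

import java.sql.Array;
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;

import oracle.jdbc.OracleConnection;
import oracle.jdbc.OracleTypes;

public class StoredProcFetch {
    // Assumes something like: CREATE TYPE id_number_tab AS TABLE OF NUMBER;
    // and: PROCEDURE fetch_by_ids(p_ids IN id_number_tab, p_rows OUT SYS_REFCURSOR)
    public static ResultSet fetchByIds(Connection conn, Long[] solrIds) throws SQLException {
        OracleConnection oraConn = conn.unwrap(OracleConnection.class);

        // Pass the whole ID list as one collection-typed bind variable
        Array idArray = oraConn.createOracleArray("ID_NUMBER_TAB", solrIds);

        CallableStatement cs = conn.prepareCall("{ call fetch_by_ids(?, ?) }");
        cs.setArray(1, idArray);
        cs.registerOutParameter(2, OracleTypes.CURSOR);
        cs.execute();
        return (ResultSet) cs.getObject(2);
    }
}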

Second, if you are still running into a boundary like the 1000-item limit, consider using the "rows" setting when querying Solr and leveraging its built-in pagination.

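For the Solr side, a sketch of paging with SolrJ (assumes a reasonably recent SolrJ client and an "id" field in the index; adjust the URL, field name, and page size to your setup):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class SolrIdPager {
    // Page through the Solr results "rows" at a time, collecting only the id field;
    // the real data stays in Oracle.
    public static List<Long> fetchAllIds(String solrUrl, String queryText) throws SolrServerException, IOException {
        List<Long> ids = new ArrayList<>();
        int rows = 1000;
        int start = 0;
        long numFound;
        try (SolrClient solr = new HttpSolrClient.Builder(solrUrl).build()) {
            do {
                SolrQuery query = new SolrQuery(queryText);
                query.setFields("id");
                query.setStart(start);
                query.setRows(rows);
                QueryResponse response = solr.query(query);
                numFound = response.getResults().getNumFound();
                for (SolrDocument doc : response.getResults()) {
                    ids.add(Long.valueOf(doc.getFieldValue("id").toString()));
                }
                start += rows;
            } while (start < numFound);
        }
        return ids;
    }
}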

I've used this bulk fetching method with stored procs to fetch large quantities of data that needed to be put into Solr. Involve your DBA. If you have a good one and use the Oracle-specific extensions, I think you should get very reasonable performance.
