使用大“IN”子句优化 Oracle SQL

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/3641931/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-18 21:27:19  来源:igfitidea点击:

Optimize Oracle SQL with large 'IN' clause

javasqloracleoptimizationquery-optimization

提问by NeoNosliw

Here I have a query like below:

在这里,我有如下查询:

SELECT field
FROM table
WHERE value IN ('val1', 'val2', 'val3', ... 'valn')

Let's say there are 2000 values inside the IN clause, the value doesn't exist in other table. Do you have any idea to speed up this operation?

假设 IN 子句中有 2000 个值,该值在其他表中不存在。你有什么想法可以加快这个操作吗?

The question is open to accept any kind of methods..

这个问题可以接受任何类型的方法。

Thanks!

谢谢!

采纳答案by TTT

This looks like the right way in Java: http://knol.google.com/k/oracle-passing-a-list-as-bind-variable#

这在 Java 中看起来是正确的方式:http: //knol.google.com/k/oracle-passing-a-list-as-bind-variable#

It is similar to the C# solution. Your list of value stays in memory (no temporary table) and it will not be persisted to disk and you use a parameterized query so the query executer doesn't have to reparse every query. I have no tried it with java but I think that it will be fast.

它类似于 C# 解决方案。您的值列表保留在内存中(没有临时表),它不会持久保存到磁盘,并且您使用参数化查询,因此查询执行器不必重新解析每个查询。我没有用 java 尝试过,但我认为它会很快。

回答by Mitch Wheat

  1. Create an index that covers 'field' and 'value'.

  2. Place those IN values in a temp table and join on it.

  1. 创建一个涵盖“字段”和“值”的索引。

  2. 将这些 IN 值放在临时表中并在其上加入。

回答by Hendra Jaya

SELECT field
FROM table
WHERE value IN SELECT somevalue from sometable

As far as i know, you will face another problem. That will be the limitation of 'IN' clause. Using this, you can avoid that and hopefully fasten your query

据我所知,您将面临另一个问题。这将是“IN”子句的限制。使用它,您可以避免这种情况,并希望加快您的查询

回答by TTT

You can join a normal table with a memory table that is filled with the list of values.

您可以将普通表与填充有值列表的内存表连接起来。

I don't how to do that with Java exactly but I do know how to do this with C#. I think something similar should be possible with Java.

我不知道如何用 Java 准确地做到这一点,但我知道如何用 C# 做到这一点。我认为 Java 应该可以实现类似的功能。

Read here: http://forums.oracle.com/forums/thread.jspa?threadID=892457&tstart=375

在这里阅读:http: //forums.oracle.com/forums/thread.jspa?threadID=892457&tstart=375

Let's use a collection of User Defined Types (UDT's). First create a table with 1 million rows:

让我们使用用户定义类型 (UDT) 的集合。首先创建一个包含 100 万行的表:

create table employees (id number(10) not null primary key, name varchar2(100) );

insert into employees 
select level l, 'MyName'||to_char(level) 
from dual connect by level <= 1e6;

1000000 rows created

commit;

exec dbms_stats.gather_schema_stats(USER, cascade=>TRUE);

No we turn to the C# code:

不,我们转向 C# 代码:

Let's select employees with id's 3 and 4.

让我们选择 id 为 3 和 4 的员工。

Collection type MDSYS.SDO_ELEM_INFO_ARRAY is used because if we use this already predefined Oracle type we don't have to define our own Oracle type. You can fill collection MDSYS.SDO_ELEM_INFO_ARRAY with max 1048576 numbers.

使用集合类型 MDSYS.SDO_ELEM_INFO_ARRAY 是因为如果我们使用这个已经预定义的 Oracle 类型,我们就不必定义我们自己的 Oracle 类型。您可以使用最多 1048576 个数字填充集合 MDSYS.SDO_ELEM_INFO_ARRAY。

using Oracle.DataAccess.Client;
using Oracle.DataAccess.Types;

    [OracleCustomTypeMappingAttribute("MDSYS.SDO_ELEM_INFO_ARRAY")]
    public class NumberArrayFactory : IOracleArrayTypeFactory
    {
      public Array CreateArray(int numElems)
      {
        return new Decimal[numElems];
      }

      public Array CreateStatusArray(int numElems)
      {
        return null;
      }
    }


    private void Test()
    {
      OracleConnectionStringBuilder b = new OracleConnectionStringBuilder();
      b.UserID = "sna";
      b.Password = "sna";
      b.DataSource = "ora11";
      using (OracleConnection conn = new OracleConnection(b.ToString()))
      {
        conn.Open();
        using (OracleCommand comm = conn.CreateCommand())
        {
          comm.CommandText =
              @" select  /*+ cardinality(tab 10) */ *  " +
              @" from employees, table(:1) tab " +
              @" where employees.id = tab.column_value";

          OracleParameter p = new OracleParameter();
          p.OracleDbType = OracleDbType.Array;
          p.Direction = ParameterDirection.Input;
          p.UdtTypeName = "MDSYS.SDO_ELEM_INFO_ARRAY";
          p.Value = new Decimal[] { 3, 4 };

          comm.Parameters.Add(p);

          int numPersons = 0;
          using (OracleDataReader reader = comm.ExecuteReader())
          {
            while (reader.Read())
            {
              MessageBox.Show("Name " + reader[1].ToString());
              numPersons++;
            }
          }
          conn.Close();
        }
      }
    }

The index on employees.id isn't used when one omits hint /*+ cardinality(tab 10) */. This index is created by Oracle because id is the primary key column.

当省略提示 /*+ cardinality(tab 10) */ 时,没有使用员工 id 上的索引。该索引由 Oracle 创建,因为 id 是主键列。

This means that you don't have to fill a temporary table. The list of vaues stays in ram and you join your table employeeswith this list of values in memory table(:1) tab.

这意味着您不必填充临时表。vaues 列表保留在 ram 中,您将使用 memory table(:1) tab 中的此值列表加入您的表员工

(wateenmooiedag=TTT)

(wateenmooiedag=TTT)

回答by Alex

Just rewrite you IN to exists. It will be faster.

只需将您重写为存在即可。它会更快。

回答by Jarekczek

I got acceptable performance (execution time close to fetching rows unconditionally) when doing a similar query with the following approach.

使用以下方法执行类似查询时,我获得了可接受的性能(执行时间接近无条件获取行)。

static final int MAX_QUERY_SET = 1000;

I iterate the values and make separate query every MAX_QUERY_SETvalue. So for 10K values I have 10 queries. I process the queries sequentially.

我迭代这些值并对每个MAX_QUERY_SET值进行单独的查询。所以对于 10K 值,我有 10 个查询。我按顺序处理查询。

After implementing this algorithm I was able to play with the constant. For either value of 30 or 3000 I got 3x longer execution time. So I sticked to 1000.

实现这个算法后,我就可以玩这个常数了。对于 30 或 3000 的值,我的执行时间延长了 3 倍。所以我坚持1000。

This may not work if you can't process multiple queries. My experience was gathered on a different database (Pervasive, with a limit of 65K characters per statement), but I think this question is quite general and its conclusions should be common.

如果您无法处理多个查询,这可能不起作用。我的经验是在不同的数据库(Pervasive,每条语句限制为 65K 个字符)上收集的,但我认为这个问题很笼统,其结论应该是通用的。

回答by Jayan

Oracle query parsing and caching mechanism works better when you use bind variables. If your query may perform better by using them

当您使用绑定变量时,Oracle 查询解析和缓存机制效果更好。如果您的查询可以通过使用它们更好地执行

SELECT field FROM table WHERE value IN (?,? ....) and Then assign values as needed.

SELECT field FROM table WHERE value IN (?,? ....) 然后根据需要分配值。

It is much better to analyse performance of real execution using some tool like enterprise management console and decide of improvements. Creating index is probably the first step to do.

使用诸如企业管理控制台之类的工具分析实际执行的性能并决定改进要好得多。创建索引可能是第一步。

Storing potential values in another table and using suggestions from J Horstmann seems a correct idea. Please give it a try.

将潜在值存储在另一个表中并使用 J Horstmann 的建议似乎是一个正确的想法。请试一试。

回答by Vishnu Gupta

More information is required for selecting better solution.

选择更好的解决方案需要更多信息。

  1. Does query execute frequently?
  2. Are values val1, val2 fixed?
  3. How big is the table?
  1. 查询是否频繁执行?
  2. 值 val1、val2 是固定的吗?
  3. 桌子有多大?

If query is executed frequently, values val1, val2 etc are fixed and table is large (having say 20,000 or more rows) then store all the values in a another table (say temp table) and join both tables on value field.

如果查询频繁执行,值 val1、val2 等是固定的并且表很大(比如 20,000 行或更多行)然后将所有值存储在另一个表中(比如临时表)并在值字段上连接两个表。

If table in the query below is large, there should be index on value field to improve the performance.

如果下面查询中的表很大,则应在值字段上设置索引以提高性能。

SELECT field FROM table WHERE value IN ('val1', 'val2', 'val3', ... 'valn')

SELECT field FROM table WHERE value IN ('val1', 'val2', 'val3', ... 'valn')

Both the table should be analyzed.

两个表都要分析。

The reason for better performance is that the optimizer will choose best join method depending on the characteristic of the table. If table in above query is very large, the join will be nested loop join and above table should have an index on the column val.

性能更好的原因是优化器会根据表的特性选择最佳连接方法。如果上面查询中的表非常大,连接将是嵌套循环连接并且上面的表应该在列 val 上有一个索引。

If table in above query is very small (say fewer then 200-300 rows) the new table (temp table) should have index on val column.

如果上面查询中的表非常小(比如少于 200-300 行),则新表(临时表)应该在 val 列上有索引。

If both tables are of almost same size indexes will not help much.

如果两个表的大小几乎相同,索引将无济于事。

Conclusion: Best solution depends on specific situation.

结论:最佳解决方案取决于具体情况。

回答by Jon Heller

If none of the other suggestions work, and the query takes a long time, you can try running it in parallel.

如果其他建议都不起作用,并且查询需要很长时间,您可以尝试并行运行它。

  select /*+ parallel(table) */ field ...

回答by J?rn Horstmann

If you already have an index on the value field and the values are not available in any tables to join or sub-select from, then I don't think there are any possibilities to optimize. In the special case where your values are really "val1", "val", ... you could use a like query which would uses an index to search on the prefix. But I asume that was just an example.

如果您已经在 value 字段上有一个索引,并且这些值在任何表中都不可用以进行连接或子选择,那么我认为没有任何优化的可能性。在您的值实际上是“val1”、“val”等特殊情况下,您可以使用类似查询,该查询将使用索引来搜索前缀。但我认为这只是一个例子。