如何使用具有数百万行的表在 oracle 中更快地选择随机行

Question

提问by myeu2

Is there a way to make selecting random rows faster in oracle with a table that has million of rows. I tried to use sample(x) and dbms_random.value and its taking a long time to run.

有没有办法在 oracle 中使用具有数百万行的表更快地选择随机行。我尝试使用 sample(x) 和 dbms_random.value 并且它需要很长时间才能运行。

Thanks!

谢谢！

Answer 1

回答by Adam Musch

Using appropriate values of sample(x)is the fastest way you can. It's block-random and row-random within blocks, so if you only want one random row:

使用适当的值sample(x)是最快的方法。它是块内的块随机和行随机，所以如果你只想要一个随机行：

select dbms_rowid.rowid_relative_fno(rowid) as fileno,
       dbms_rowid.rowid_block_number(rowid) as blockno,
       dbms_rowid.rowid_row_number(rowid) as offset
  from (select rowid from [my_big_table] sample (.01))
 where rownum = 1

I'm using a subpartitioned table, and I'm getting pretty good randomness even grabbing multiple rows:

我正在使用一个子分区表，即使抓取多行，我也获得了很好的随机性：

select dbms_rowid.rowid_relative_fno(rowid) as fileno,
       dbms_rowid.rowid_block_number(rowid) as blockno,
       dbms_rowid.rowid_row_number(rowid) as offset
  from (select rowid from [my_big_table] sample (.01))
 where rownum <= 5

    FILENO    BLOCKNO     OFFSET
---------- ---------- ----------
       152    2454936         11
       152    2463140         32
       152    2335208          2
       152    2429207         23
       152    2746125         28

I suspect you should probably tune your SAMPLEclause to use an appropriate sample size for what you're fetching.

我怀疑您可能应该调整您的SAMPLE条款以使用适当的样本量来获取您要获取的内容。

Answer 2

回答by Jeffrey Kemp

Start with Adam's answer first, but if SAMPLEjust isn't fast enough, even with the ROWNUM optimization, you can use block samples:

首先从 Adam 的答案开始，但如果SAMPLE不够快，即使使用 ROWNUM 优化，您也可以使用块样本：

....FROM [table] SAMPLE BLOCK (0.01)

This applies the sampling at the block level instead of for each row. This does mean that it can skip large swathes of data from the table so the sample percent will be very rough. It's not unusual for a SAMPLE BLOCK with a low percentage to return zero rows.

这适用于块级别而不是每一行的采样。这确实意味着它可以从表中跳过大量数据，因此样本百分比将非常粗糙。具有低百分比的 SAMPLE BLOCK 返回零行并不罕见。

Answer 3

回答by Joseph Bui

Here's the same question on AskTom:

这是 AskTom 上的相同问题：

http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6075151195522

If you know how big your table is, use sample block as described above. If you don't, you can modify the routine below to get however many rows you want.

如果您知道您的表有多大，请使用上述示例块。如果不这样做，您可以修改下面的例程以获取所需的行数。

Copied from: http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6075151195522#56174726207861

复制自：http: //asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID: 6075151195522#56174726207861

create or replace function get_random_rowid
( table_name varchar2
) return urowid
as
sql_v varchar2(100);
urowid_t dbms_sql.urowid_table;
cursor_v integer;
status_v integer;
rows_v integer;
begin
  for exp_v in -6..2 loop
    exit when (urowid_t.count > 0);
    if (exp_v < 2) then
      sql_v := 'select rowid from ' || table_name
      || ' sample block (' || power(10, exp_v) || ')';
    else
      sql_v := 'select rowid from ' || table_name;
    end if;
    cursor_v := dbms_sql.open_cursor;
    dbms_sql.parse(cursor_v, sql_v, dbms_sql.native);
    dbms_sql.define_array(cursor_v, 1, urowid_t, 100, 0);
    status_v := dbms_sql.execute(cursor_v);
    loop
      rows_v := dbms_sql.fetch_rows(cursor_v);
      dbms_sql.column_value(cursor_v, 1, urowid_t);
      exit when rows_v != 100;
    end loop;
    dbms_sql.close_cursor(cursor_v);
  end loop;
  if (urowid_t.count > 0) then
    return urowid_t(trunc(dbms_random.value(0, urowid_t.count)));
  end if;
  return null;
exception when others then
  if (dbms_sql.is_open(cursor_v)) then
    dbms_sql.close_cursor(cursor_v);
  end if;
  raise;
end;
/
show errors

Answer 4

回答by fatih tekin

Below Solution to this question is not the exact answer but in many scenarios you try to select a row and try to use it for some purpose and then update its status with "used" or "done" so that you do not select it again.

下面这个问题的解决方案不是确切的答案，但在许多情况下，您尝试选择一行并尝试将其用于某些目的，然后使用“已使用”或“完成”更新其状态，以便您不会再次选择它。

Solution:

解决方案：

Below query is useful but that way if your table is large, I just tried and see that you definitely face performance problem with this query.

下面的查询很有用，但是如果您的表很大，我只是尝试了一下，发现您肯定会遇到此查询的性能问题。

SELECT * FROM ( SELECT * FROM table ORDER BY dbms_random.value ) WHERE rownum = 1

So if you set a rownum like below then you can work around the performance problem. By incrementing rownum you can reduce the possiblities. But in this case you will always get rows from the same 1000 rows. If you get a row from 1000 and update its status with "USED", you will almost get different row everytime you query with "ACTIVE"

因此，如果您设置如下所示的 rownum，则可以解决性能问题。通过增加 rownum 可以减少可能性。但在这种情况下，您将始终从相同的 1000 行中获取行。如果您从 1000 中获得一行并使用“USED”更新其状态，则每次使用“ACTIVE”查询时几乎都会得到不同的行

SELECT * FROM
( SELECT * FROM table
where rownum < 1000
  and status = 'ACTIVE'
  ORDER BY dbms_random.value  )
WHERE rownum = 1

update the rows status after selecting it, If you can not update that means another transaction has already used it. Then You should try to get a new row and update its status. By the way, getting the same row by two different transaction possibility is 0.001 since rownum is 1000.

选择后更新行状态，如果您无法更新，则表示另一个事务已经使用了它。然后您应该尝试获取新行并更新其状态。顺便说一下，由于 rownum 是 1000，因此通过两个不同的事务获得同一行的可能性是 0.001。

Answer 5

回答by ?ilvinas

Someone told sample(x) is the fastest way you can. But for me this method works slightly faster than sample(x) method. It should take fraction of the second (0.2 in my case) no matter what is the size of the table. If it takes longer try to use hints (--+ leading(e) use_nl(e t) rowid(t)) can help

有人告诉 sample(x) 是最快的方法。但对我来说，这种方法比 sample(x) 方法稍微快一点。无论表的大小如何，它都应该是秒的一小部分（在我的情况下为 0.2）。如果需要更长的时间尝试使用提示 (--+leading(e) use_nl(et) rowid(t)) 可以提供帮助

SELECT *
  FROM My_User.My_Table
 WHERE ROWID = (SELECT MAX(t.ROWID) KEEP(DENSE_RANK FIRST ORDER BY dbms_random.value)
                  FROM (SELECT o.Data_Object_Id,
                               e.Relative_Fno,
                               e.Block_Id + TRUNC(Dbms_Random.Value(0, e.Blocks)) AS Block_Id
                          FROM Dba_Extents e
                          JOIN Dba_Objects o ON o.Owner = e.Owner AND o.Object_Type = e.Segment_Type AND o.Object_Name = e.Segment_Name
                         WHERE e.Segment_Name = 'MY_TABLE'
                           AND(e.Segment_Type, e.Owner, e.Extent_Id) =
                              (SELECT MAX(e.Segment_Type) AS Segment_Type,
                                      MAX(e.Owner)        AS Owner,
                                      MAX(e.Extent_Id) KEEP(DENSE_RANK FIRST ORDER BY Dbms_Random.Value) AS Extent_Id
                                 FROM Dba_Extents e
                                WHERE e.Segment_Name = 'MY_TABLE'
                                  AND e.Owner = 'MY_USER'
                                  AND e.Segment_Type = 'TABLE')) e
                  JOIN My_User.My_Table t
                    ON t.Rowid BETWEEN Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 0)
                   AND Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 32767))

Answer 6

回答by ?ilvinas

Version with retries when no rows returned:

没有行返回时重试的版本：

WITH gen AS ((SELECT --+ inline leading(e) use_nl(e t) rowid(t)
                     MAX(t.ROWID) KEEP(DENSE_RANK FIRST ORDER BY dbms_random.value) Row_Id
                FROM (SELECT o.Data_Object_Id,
                             e.Relative_Fno,
                             e.Block_Id + TRUNC(Dbms_Random.Value(0, e.Blocks)) AS Block_Id 
                        FROM Dba_Extents e
                        JOIN Dba_Objects o ON o.Owner = e.Owner AND o.Object_Type = e.Segment_Type AND o.Object_Name = e.Segment_Name
                       WHERE e.Segment_Name = 'MY_TABLE'
                         AND(e.Segment_Type, e.Owner, e.Extent_Id) =
                            (SELECT MAX(e.Segment_Type) AS Segment_Type,
                                    MAX(e.Owner)        AS Owner,
                                    MAX(e.Extent_Id) KEEP(DENSE_RANK FIRST ORDER BY Dbms_Random.Value) AS Extent_Id
                               FROM Dba_Extents e
                              WHERE e.Segment_Name = 'MY_TABLE'
                                AND e.Owner = 'MY_USER'
                                AND e.Segment_Type = 'TABLE')) e
                JOIN MY_USER.MY_TABLE t ON t.ROWID BETWEEN Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 0)
                                                  AND Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 32767))),
  Retries(Cnt, Row_Id) AS (SELECT 1, gen.Row_Id
                             FROM Dual
                             LEFT JOIN gen ON 1=1
                            UNION ALL
                           SELECT Cnt + 1, gen.Row_Id
                             FROM Retries
                             LEFT JOIN gen ON 1=1
                            WHERE Retries.Row_Id IS NULL AND Retries.Cnt < 10)
SELECT *
  FROM MY_USER.MY_TABLE
 WHERE ROWID = (SELECT Row_Id
                  FROM Retries
                 WHERE Row_Id IS NOT NULL)

Answer 7

回答by erik80

Can you use pseudorandom rows?

你可以使用伪随机行吗？

select * from (
  select * from ... where... order by ora_hash(rowid)
) where rownum<100

如何使用具有数百万行的表在 oracle 中更快地选择随机行

提问by myeu2

回答by Adam Musch

回答by Jeffrey Kemp

回答by Joseph Bui

回答by fatih tekin

回答by ?ilvinas

回答by ?ilvinas

回答by erik80

相关推荐

最近更新

标签

如何使用具有数百万行的表在 oracle 中更快地选择随机行

提问by myeu2

回答by Adam Musch

回答by Jeffrey Kemp

回答by Joseph Bui

回答by fatih tekin

回答by ?ilvinas

回答by ?ilvinas

回答by erik80

相关推荐

是否可以使用 OData 从 Oracle 公开数据？

oracle 如何知道在哪一列上应用了序列？

oracle PL/SQL 中 LOOP 中 CLOB 数据类型的串联

oracle 如何索引具有空值的日期列？

相关推荐

最近更新

标签