如何使用具有数百万行的表在 oracle 中更快地选择随机行
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/3150550/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
how to make selecting random rows in oracle faster with table with millions of rows
提问by myeu2
Is there a way to make selecting random rows faster in oracle with a table that has million of rows. I tried to use sample(x) and dbms_random.value and its taking a long time to run.
有没有办法在 oracle 中使用具有数百万行的表更快地选择随机行。我尝试使用 sample(x) 和 dbms_random.value 并且它需要很长时间才能运行。
Thanks!
谢谢!
回答by Adam Musch
Using appropriate values of sample(x)
is the fastest way you can. It's block-random and row-random within blocks, so if you only want one random row:
使用适当的值sample(x)
是最快的方法。它是块内的块随机和行随机,所以如果你只想要一个随机行:
select dbms_rowid.rowid_relative_fno(rowid) as fileno,
dbms_rowid.rowid_block_number(rowid) as blockno,
dbms_rowid.rowid_row_number(rowid) as offset
from (select rowid from [my_big_table] sample (.01))
where rownum = 1
I'm using a subpartitioned table, and I'm getting pretty good randomness even grabbing multiple rows:
我正在使用一个子分区表,即使抓取多行,我也获得了很好的随机性:
select dbms_rowid.rowid_relative_fno(rowid) as fileno,
dbms_rowid.rowid_block_number(rowid) as blockno,
dbms_rowid.rowid_row_number(rowid) as offset
from (select rowid from [my_big_table] sample (.01))
where rownum <= 5
FILENO BLOCKNO OFFSET
---------- ---------- ----------
152 2454936 11
152 2463140 32
152 2335208 2
152 2429207 23
152 2746125 28
I suspect you should probably tune your SAMPLE
clause to use an appropriate sample size for what you're fetching.
我怀疑您可能应该调整您的SAMPLE
条款以使用适当的样本量来获取您要获取的内容。
回答by Jeffrey Kemp
Start with Adam's answer first, but if SAMPLE
just isn't fast enough, even with the ROWNUM optimization, you can use block samples:
首先从 Adam 的答案开始,但如果SAMPLE
不够快,即使使用 ROWNUM 优化,您也可以使用块样本:
....FROM [table] SAMPLE BLOCK (0.01)
This applies the sampling at the block level instead of for each row. This does mean that it can skip large swathes of data from the table so the sample percent will be very rough. It's not unusual for a SAMPLE BLOCK with a low percentage to return zero rows.
这适用于块级别而不是每一行的采样。这确实意味着它可以从表中跳过大量数据,因此样本百分比将非常粗糙。具有低百分比的 SAMPLE BLOCK 返回零行并不罕见。
回答by Joseph Bui
Here's the same question on AskTom:
这是 AskTom 上的相同问题:
http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6075151195522
http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6075151195522
If you know how big your table is, use sample block as described above. If you don't, you can modify the routine below to get however many rows you want.
如果您知道您的表有多大,请使用上述示例块。如果不这样做,您可以修改下面的例程以获取所需的行数。
Copied from: http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6075151195522#56174726207861
复制自:http: //asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID: 6075151195522#56174726207861
create or replace function get_random_rowid
( table_name varchar2
) return urowid
as
sql_v varchar2(100);
urowid_t dbms_sql.urowid_table;
cursor_v integer;
status_v integer;
rows_v integer;
begin
for exp_v in -6..2 loop
exit when (urowid_t.count > 0);
if (exp_v < 2) then
sql_v := 'select rowid from ' || table_name
|| ' sample block (' || power(10, exp_v) || ')';
else
sql_v := 'select rowid from ' || table_name;
end if;
cursor_v := dbms_sql.open_cursor;
dbms_sql.parse(cursor_v, sql_v, dbms_sql.native);
dbms_sql.define_array(cursor_v, 1, urowid_t, 100, 0);
status_v := dbms_sql.execute(cursor_v);
loop
rows_v := dbms_sql.fetch_rows(cursor_v);
dbms_sql.column_value(cursor_v, 1, urowid_t);
exit when rows_v != 100;
end loop;
dbms_sql.close_cursor(cursor_v);
end loop;
if (urowid_t.count > 0) then
return urowid_t(trunc(dbms_random.value(0, urowid_t.count)));
end if;
return null;
exception when others then
if (dbms_sql.is_open(cursor_v)) then
dbms_sql.close_cursor(cursor_v);
end if;
raise;
end;
/
show errors
回答by fatih tekin
Below Solution to this question is not the exact answer but in many scenarios you try to select a row and try to use it for some purpose and then update its status with "used" or "done" so that you do not select it again.
下面这个问题的解决方案不是确切的答案,但在许多情况下,您尝试选择一行并尝试将其用于某些目的,然后使用“已使用”或“完成”更新其状态,以便您不会再次选择它。
Solution:
解决方案:
Below query is useful but that way if your table is large, I just tried and see that you definitely face performance problem with this query.
下面的查询很有用,但是如果您的表很大,我只是尝试了一下,发现您肯定会遇到此查询的性能问题。
SELECT * FROM ( SELECT * FROM table ORDER BY dbms_random.value ) WHERE rownum = 1
SELECT * FROM ( SELECT * FROM table ORDER BY dbms_random.value ) WHERE rownum = 1
So if you set a rownum like below then you can work around the performance problem. By incrementing rownum you can reduce the possiblities. But in this case you will always get rows from the same 1000 rows. If you get a row from 1000 and update its status with "USED", you will almost get different row everytime you query with "ACTIVE"
因此,如果您设置如下所示的 rownum,则可以解决性能问题。通过增加 rownum 可以减少可能性。但在这种情况下,您将始终从相同的 1000 行中获取行。如果您从 1000 中获得一行并使用“USED”更新其状态,则每次使用“ACTIVE”查询时几乎都会得到不同的行
SELECT * FROM
( SELECT * FROM table
where rownum < 1000
and status = 'ACTIVE'
ORDER BY dbms_random.value )
WHERE rownum = 1
update the rows status after selecting it, If you can not update that means another transaction has already used it. Then You should try to get a new row and update its status. By the way, getting the same row by two different transaction possibility is 0.001 since rownum is 1000.
选择后更新行状态,如果您无法更新,则表示另一个事务已经使用了它。然后您应该尝试获取新行并更新其状态。顺便说一下,由于 rownum 是 1000,因此通过两个不同的事务获得同一行的可能性是 0.001。
回答by ?ilvinas
Someone told sample(x) is the fastest way you can. But for me this method works slightly faster than sample(x) method. It should take fraction of the second (0.2 in my case) no matter what is the size of the table. If it takes longer try to use hints (--+ leading(e) use_nl(e t) rowid(t)) can help
有人告诉 sample(x) 是最快的方法。但对我来说,这种方法比 sample(x) 方法稍微快一点。无论表的大小如何,它都应该是秒的一小部分(在我的情况下为 0.2)。如果需要更长的时间尝试使用提示 (--+leading(e) use_nl(et) rowid(t)) 可以提供帮助
SELECT *
FROM My_User.My_Table
WHERE ROWID = (SELECT MAX(t.ROWID) KEEP(DENSE_RANK FIRST ORDER BY dbms_random.value)
FROM (SELECT o.Data_Object_Id,
e.Relative_Fno,
e.Block_Id + TRUNC(Dbms_Random.Value(0, e.Blocks)) AS Block_Id
FROM Dba_Extents e
JOIN Dba_Objects o ON o.Owner = e.Owner AND o.Object_Type = e.Segment_Type AND o.Object_Name = e.Segment_Name
WHERE e.Segment_Name = 'MY_TABLE'
AND(e.Segment_Type, e.Owner, e.Extent_Id) =
(SELECT MAX(e.Segment_Type) AS Segment_Type,
MAX(e.Owner) AS Owner,
MAX(e.Extent_Id) KEEP(DENSE_RANK FIRST ORDER BY Dbms_Random.Value) AS Extent_Id
FROM Dba_Extents e
WHERE e.Segment_Name = 'MY_TABLE'
AND e.Owner = 'MY_USER'
AND e.Segment_Type = 'TABLE')) e
JOIN My_User.My_Table t
ON t.Rowid BETWEEN Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 0)
AND Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 32767))
回答by ?ilvinas
Version with retries when no rows returned:
没有行返回时重试的版本:
WITH gen AS ((SELECT --+ inline leading(e) use_nl(e t) rowid(t)
MAX(t.ROWID) KEEP(DENSE_RANK FIRST ORDER BY dbms_random.value) Row_Id
FROM (SELECT o.Data_Object_Id,
e.Relative_Fno,
e.Block_Id + TRUNC(Dbms_Random.Value(0, e.Blocks)) AS Block_Id
FROM Dba_Extents e
JOIN Dba_Objects o ON o.Owner = e.Owner AND o.Object_Type = e.Segment_Type AND o.Object_Name = e.Segment_Name
WHERE e.Segment_Name = 'MY_TABLE'
AND(e.Segment_Type, e.Owner, e.Extent_Id) =
(SELECT MAX(e.Segment_Type) AS Segment_Type,
MAX(e.Owner) AS Owner,
MAX(e.Extent_Id) KEEP(DENSE_RANK FIRST ORDER BY Dbms_Random.Value) AS Extent_Id
FROM Dba_Extents e
WHERE e.Segment_Name = 'MY_TABLE'
AND e.Owner = 'MY_USER'
AND e.Segment_Type = 'TABLE')) e
JOIN MY_USER.MY_TABLE t ON t.ROWID BETWEEN Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 0)
AND Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 32767))),
Retries(Cnt, Row_Id) AS (SELECT 1, gen.Row_Id
FROM Dual
LEFT JOIN gen ON 1=1
UNION ALL
SELECT Cnt + 1, gen.Row_Id
FROM Retries
LEFT JOIN gen ON 1=1
WHERE Retries.Row_Id IS NULL AND Retries.Cnt < 10)
SELECT *
FROM MY_USER.MY_TABLE
WHERE ROWID = (SELECT Row_Id
FROM Retries
WHERE Row_Id IS NOT NULL)
回答by erik80
Can you use pseudorandom rows?
你可以使用伪随机行吗?
select * from (
select * from ... where... order by ora_hash(rowid)
) where rownum<100