SQL:如何将 Oracle 表中的 100,000 条记录拆分为 5 个块?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/36335406/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
SQL: How would you split a 100,000 records from a Oracle table into 5 chunks?
提问by Shaun Kinnair
I'm trying to figure out away to split the first 100,000 records from a table that has 1 million+ records into 5 (five) 20,000 records chunks to go into a file? Maybe some SQL that will get the min and max rowid or primary id for each 5 chunks of 20,000 records, so I can put the min and max value into a variable and pass it into the SQL and use a BETWEEN in the where clause to the SQL.
我试图找出将具有 100 万多条记录的表中的前 100,000 条记录拆分为 5(五)条 20,000 条记录块以放入一个文件?也许某些 SQL 会为每 5 个 20,000 条记录块获取最小和最大 rowid 或主 ID,因此我可以将最小和最大值放入变量并将其传递到 SQL 并在 where 子句中使用 BETWEEN 到SQL。
Can this be done?
这能做到吗?
I'm on an Oracle 11g database.
我在 Oracle 11g 数据库上。
Thanks in advance.
提前致谢。
回答by Gordon Linoff
If you just want to assign values 1-5 to basically equal sized groups, then use ntile()
:
如果您只想将值 1-5 分配给大小基本相同的组,请使用ntile()
:
select t.*, ntile(5) over (order by NULL) as num
from (select t.*
from t
where rownum <= 100000
) t;
If you want to insert into 5 different tables, then use insert all
:
如果要插入 5 个不同的表,请使用insert all
:
insert all
when num = 1 then into t1
when num = 2 then into t2
when num = 3 then into t3
when num = 4 then into t4
when num = 5 then into t5
select t.*, ntile(5) over (order by NULL) as num
from (select t.*
from t
where rownum <= 100000
) t;
回答by Shaun Kinnair
Thanks so much to Gordon Linoff for giving me a starter to the code.
非常感谢Gordon Linoff 为我提供了代码入门。
just an update on how to get the min and max values for 5 chunks.
只是关于如何获取 5 个块的最小值和最大值的更新。
select num, min(cre_surr_id), max(cre_surr_id)
from
(select p.cre_surr_id, ntile(5) over (order by NULL) as num
from (select p.*
from productions p
where rownum <= 100000
) p )
group by num
回答by Aleksej
You can even try with simple aggregation:
您甚至可以尝试使用简单的聚合:
create table test_chunk(val) as
(
select floor(dbms_random.value(1, level * 10)) from dual
connect by level <= 100
)
select min(val), max(val), floor((num+1)/2)
from (select rownum as num, val from test_chunk)
group by floor((num+1)/2)
回答by TenG
A bit harsh down voting another fair question.
投票另一个公平的问题有点苛刻。
Anyway, NTILE is new to me, so I wouldn't have discovered that were it not for your question.
无论如何,NTILE 对我来说是新的,所以如果不是你的问题,我不会发现它。
My way of doing this , the old school way, would have been to MOD the rownum to get the group number, e.g.
我这样做的方式,老派的方式,本来是修改 rownum 来获取组号,例如
select t.*, mod(rn,5) as num
from (select t.*, rownnum rn
from t
) t;
This solves the SQL part, or rather how to group rows into equal chunks, but that is only half your question. The next half is how to write these to 5 separate files.
这解决了 SQL 部分,或者更确切地说如何将行分组为相等的块,但这只是您问题的一半。下半部分是如何将这些写入 5 个单独的文件。
You can either have 5 separate queries each spooling to a separate file, e.g:
您可以有 5 个单独的查询,每个查询都假脱机到一个单独的文件,例如:
spool f1.dat
select t.*
from (select t.*, rownnum rn
from t
) t
where mod(t.rn,5) = 0;
spool off
spool f2.dat
select t.*
from (select t.*, rownnum rn
from t
) t
where mod(t.rn,5) = 1;
spool off
etc.
等等。
Or, using UTL_FILE. You could try something clever with a single query and have an array of UTL_FILE types where the array index matches the MOD(rn,5) then you wouldn't need logic like "IF rn = 0 THEN UTL_FILE.WRITELN(f0, ...".
或者,使用 UTL_FILE。您可以使用单个查询尝试一些巧妙的方法,并拥有一个 UTL_FILE 类型的数组,其中数组索引与 MOD(rn,5) 匹配,那么您就不需要像“IF rn = 0 THEN UTL_FILE.WRITELN(f0, .. .”。
So, something like (not tested, just in a rough form for guidance, never tried this myself):
所以,像(未经测试,只是粗略的指导形式,我自己从未尝试过):
DECLARE
TYPE fname IS VARRAY(5) OF VARCHAR2(100);
TYPE fh IS VARRAY(5) OF UTL_FILE.FILE_TYPE;
CURSOR c1 IS
select t.*, mod(rn,5) as num
from (select t.*, rownnum rn
from t
) t;
idx INTEGER;
BEGIN
FOR idx IN 1..5 LOOP
fname(idx) := 'data_' || idx || '.dat';
fh(idx) := UTL_FILE.'THE_DIR', fname(idx), 'w');
END LOOP;
FOR r1 IN c1 LOOP
UTL_FILE.PUT_LINE ( fh(r1.num+1), r1.{column value from C1} );
END LOOP;
FOR idx IN 1..5 LOOP
UTL_FILE.FCLOSE (fh(idx));
END LOOP;
END;