Original question: http://stackoverflow.com/questions/38371989/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
How to convert comma separated values to rows in oracle?
Asked by Samiul Al Hossaini
Here is the DDL:
create table tbl1 (
id number,
value varchar2(50)
);
insert into tbl1 values (1, 'AA, UT, BT, SK, SX');
insert into tbl1 values (2, 'AA, UT, SX');
insert into tbl1 values (3, 'UT, SK, SX, ZF');
Note that value here is a comma-separated string.
But we need a result like the following:
ID VALUE
-------------
1 AA
1 UT
1 BT
1 SK
1 SX
2 AA
2 UT
2 SX
3 UT
3 SK
3 SX
3 ZF
How do we write SQL for this?
Answered by vercelli
I agree that this is a really bad design. Try this if you can't change that design:
select distinct id, trim(regexp_substr(value,'[^,]+', 1, level) ) value, level
from tbl1
connect by regexp_substr(value, '[^,]+', 1, level) is not null
order by id, level;
OUTPUT
id value level
1 AA 1
1 UT 2
1 BT 3
1 SK 4
1 SX 5
2 AA 1
2 UT 2
2 SX 3
3 UT 1
3 SK 2
3 SX 3
3 ZF 4
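To see how the LEVEL-driven REGEXP_SUBSTR call picks out one token per row, it can help to run the same idiom against a single literal string from DUAL. This is a minimal sketch, not part of the original answer; the literal and the lvl alias are only illustrative.

```sql
-- Split one literal string: LEVEL counts 1, 2, 3, ... and REGEXP_SUBSTR
-- returns the LEVEL-th run of non-comma characters until none is left.
SELECT LEVEL AS lvl,
       TRIM( REGEXP_SUBSTR( 'AA, UT, BT, SK, SX', '[^,]+', 1, LEVEL ) ) AS value
FROM   DUAL
CONNECT BY REGEXP_SUBSTR( 'AA, UT, BT, SK, SX', '[^,]+', 1, LEVEL ) IS NOT NULL;
```

Because the pattern '[^,]+' matches only non-empty runs, an empty element such as the middle one in 'A,,B' is skipped rather than returned as NULL. If empty elements matter in your data, this pattern is probably not the right choice.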
Credits to this
To remove duplicates in a more elegant and efficient way (credits to @mathguy):
select id, trim(regexp_substr(value,'[^,]+', 1, level) ) value, level
from tbl1
connect by regexp_substr(value, '[^,]+', 1, level) is not null
and PRIOR id = id
and PRIOR SYS_GUID() is not null
order by id, level;
If you want an "ANSIer" approach, go with a recursive CTE:
with t (id,res,val,lev) as (
select id, trim(regexp_substr(value,'[^,]+', 1, 1 )) res, value as val, 1 as lev
from tbl1
where regexp_substr(value, '[^,]+', 1, 1) is not null
union all
select id, trim(regexp_substr(val,'[^,]+', 1, lev+1) ) res, val, lev+1 as lev
from t
where regexp_substr(val, '[^,]+', 1, lev+1) is not null
)
select id, res,lev
from t
order by id, lev;
OUTPUT
id res lev
1 AA 1
1 UT 2
1 BT 3
1 SK 4
1 SX 5
2 AA 1
2 UT 2
2 SX 3
3 UT 1
3 SK 2
3 SX 3
3 ZF 4
Another recursive approach, by MT0, without regular expressions:
WITH t ( id, value, start_pos, end_pos ) AS
( SELECT id, value, 1, INSTR( value, ',' ) FROM tbl1
UNION ALL
SELECT id,
value,
end_pos + 1,
INSTR( value, ',', end_pos + 1 )
FROM t
WHERE end_pos > 0
)
SELECT id,
SUBSTR( value, start_pos, DECODE( end_pos, 0, LENGTH( value ) + 1, end_pos ) - start_pos ) AS value
FROM t
ORDER BY id,
start_pos;
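One thing to watch with the sample data: the separators are a comma followed by a space, so the plain SUBSTR above returns every token after the first with a leading space (for example ' UT'). A minimal variation, assuming that stripping surrounding whitespace is what you want, wraps the result in TRIM. Everything else is unchanged from MT0's query.

```sql
-- Same recursive walk over comma positions, with TRIM applied to each token.
WITH t ( id, value, start_pos, end_pos ) AS (
  SELECT id, value, 1, INSTR( value, ',' )
  FROM   tbl1
  UNION ALL
  SELECT id, value, end_pos + 1, INSTR( value, ',', end_pos + 1 )
  FROM   t
  WHERE  end_pos > 0   -- stop once no further comma was found
)
SELECT id,
       TRIM( SUBSTR( value,
                     start_pos,
                     DECODE( end_pos, 0, LENGTH( value ) + 1, end_pos ) - start_pos ) ) AS value
FROM   t
ORDER BY id, start_pos;
```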
I tested these approaches on a 30,000-row dataset that returned 118,104 rows, and got the following average results:
- My recursive approach: 5 seconds
- MT0 approach: 4 seconds
- Mathguy approach: 16 seconds
- MT0 recursive approach no-regex: 3.45 seconds
@Mathguy has also tested with a bigger dataset:
In all cases the recursive query (I only tested the one with regular substr and instr) does better, by a factor of 2 to 5. Here are the combinations of number of strings x tokens per string, with CTAS execution times for hierarchical vs. recursive (hierarchical first). All times are in seconds.
- 30,000 x 4: 5 / 1.
- 30,000 x 10: 15 / 3.
- 30,000 x 25: 56 / 37.
- 5,000 x 50: 33 / 14.
- 5,000 x 100: 160 / 81.
- 10,000 x 200: 1,924 / 772
Answered by MT0
This will get the values without requiring you to remove duplicates or to use the hack of including SYS_GUID() or DBMS_RANDOM.VALUE() in the CONNECT BY:
SELECT t.id,
v.COLUMN_VALUE AS value
FROM TBL1 t,
TABLE(
CAST(
MULTISET(
SELECT TRIM( REGEXP_SUBSTR( t.value, '[^,]+', 1, LEVEL ) )
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT( t.value, '[^,]+' )
)
AS SYS.ODCIVARCHAR2LIST
)
) v
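The correlated subquery generates one row per token because its CONNECT BY stops at REGEXP_COUNT( t.value, '[^,]+' ). As a quick sanity check (a sketch added here, not part of the original answer; token_count is just an illustrative alias), you can look at that count for each row on its own:

```sql
-- Number of non-empty, comma-separated tokens in each row of the sample table.
SELECT id,
       REGEXP_COUNT( value, '[^,]+' ) AS token_count
FROM   tbl1
ORDER BY id;
```

For the three sample rows this gives 5, 3 and 4 tokens, which add up to the 12 rows in the desired result.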
Update:
Returning the index of the element in the list:
Option 1 - Return a UDT:
CREATE TYPE string_pair IS OBJECT( lvl INT, value VARCHAR2(4000) );
/
CREATE TYPE string_pair_table IS TABLE OF string_pair;
/
SELECT t.id,
v.*
FROM TBL1 t,
TABLE(
CAST(
MULTISET(
SELECT string_pair( level, TRIM( REGEXP_SUBSTR( t.value, '[^,]+', 1, LEVEL ) ) )
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT( t.value, '[^,]+' )
)
AS string_pair_table
)
) v;
Option 2 - Use ROW_NUMBER():
SELECT t.id,
v.COLUMN_VALUE AS value,
ROW_NUMBER() OVER ( PARTITION BY id ORDER BY ROWNUM ) AS lvl
FROM TBL1 t,
TABLE(
CAST(
MULTISET(
SELECT TRIM( REGEXP_SUBSTR( t.value, '[^,]+', 1, LEVEL ) )
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT( t.value, '[^,]+' )
)
AS SYS.ODCIVARCHAR2LIST
)
) v;
Answered by mathguy
Vercelli posted a correct answer. However, with more than one string to split, connect by will generate an exponentially growing number of rows, with many, many duplicates. (Just try the query without distinct.) This will destroy performance on data of non-trivial size.
One common way to overcome this problem is to use a prior condition and an additional check to avoid cycles in the hierarchy. Like so:
select id, trim(regexp_substr(value,'[^,]+', 1, level) ) value, level
from tbl1
connect by regexp_substr(value, '[^,]+', 1, level) is not null
and prior id = id
and prior sys_guid() is not null
order by id, level;
See, for example, this discussion on OTN: https://community.oracle.com/thread/2526535
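A rough way to see the row explosion on the sample table is to count what each form of the hierarchical query generates before any DISTINCT is applied. This is only an illustrative sketch; the column aliases are made up for the comparison.

```sql
-- No PRIOR condition: every row at one level can be a parent of every row
-- that still has a token at the next level, so duplicates pile up.
SELECT COUNT(*) AS rows_without_prior
FROM   tbl1
CONNECT BY REGEXP_SUBSTR( value, '[^,]+', 1, LEVEL ) IS NOT NULL;

-- With PRIOR id = id, each hierarchy stays within a single id. The
-- PRIOR SYS_GUID() check stops Oracle from reporting a cycle.
SELECT COUNT(*) AS rows_with_prior
FROM   tbl1
CONNECT BY REGEXP_SUBSTR( value, '[^,]+', 1, LEVEL ) IS NOT NULL
       AND PRIOR id = id
       AND PRIOR SYS_GUID() IS NOT NULL;
```

With the three sample rows, the second count should be 12 (one row per token), while the first is noticeably larger, and the gap widens quickly as rows and tokens are added.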
Answered by MT0
An alternate method is to define a simple PL/SQL function:
CREATE OR REPLACE FUNCTION split_String(
  i_str    IN  VARCHAR2,
  i_delim  IN  VARCHAR2 DEFAULT ','
) RETURN SYS.ODCIVARCHAR2LIST DETERMINISTIC
AS
  p_result       SYS.ODCIVARCHAR2LIST := SYS.ODCIVARCHAR2LIST();
  p_start        NUMBER(5) := 1;
  p_end          NUMBER(5);
  c_len CONSTANT NUMBER(5) := LENGTH( i_str );
  c_ld  CONSTANT NUMBER(5) := LENGTH( i_delim );
BEGIN
  IF c_len > 0 THEN
    p_end := INSTR( i_str, i_delim, p_start );
    WHILE p_end > 0 LOOP
      p_result.EXTEND;
      p_result( p_result.COUNT ) := SUBSTR( i_str, p_start, p_end - p_start );
      p_start := p_end + c_ld;
      p_end := INSTR( i_str, i_delim, p_start );
    END LOOP;
    IF p_start <= c_len + 1 THEN
      p_result.EXTEND;
      p_result( p_result.COUNT ) := SUBSTR( i_str, p_start, c_len - p_start + 1 );
    END IF;
  END IF;
  RETURN p_result;
END;
/
Then the SQL becomes very simple:
SELECT t.id,
v.column_value AS value
FROM TBL1 t,
TABLE( split_String( t.value ) ) v
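Note that split_String does not trim its results, and the sample data separates values with a comma followed by a space. One possible way to handle that, sketched here on the assumption that the separator is always exactly ', ', is to pass the two-character delimiter as the second argument:

```sql
-- Use ', ' (comma + space) as the delimiter so tokens come back without
-- a leading space; assumes the function above has been created.
SELECT t.id,
       v.COLUMN_VALUE AS value
FROM   tbl1 t,
       TABLE( split_String( t.value, ', ' ) ) v
ORDER BY t.id;
```

If the spacing around commas is inconsistent, keeping the default ',' delimiter and selecting TRIM( v.COLUMN_VALUE ) instead is likely the safer choice.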