SQL: How to convert comma-separated values to rows in Oracle?

Question

Asked by Samiul Al Hossaini

Here is the DDL:

create table tbl1 (
   id number,
   value varchar2(50)
);

insert into tbl1 values (1, 'AA, UT, BT, SK, SX');
insert into tbl1 values (2, 'AA, UT, SX');
insert into tbl1 values (3, 'UT, SK, SX, ZF');

Note that value here is a comma-separated string.

But we need a result like the following:

ID VALUE
-------------
1  AA
1  UT
1  BT
1  SK
1  SX
2  AA
2  UT
2  SX
3  UT
3  SK
3  SX
3  ZF

How do we write SQL for this?

Answer 1

Answered by vercelli

I agree that this is a really bad design. Try this if you can't change that design:

select distinct id, trim(regexp_substr(value,'[^,]+', 1, level) ) value, level
  from tbl1
   connect by regexp_substr(value, '[^,]+', 1, level) is not null
   order by id, level;

OUTPUT

id value level
1   AA  1
1   UT  2
1   BT  3
1   SK  4
1   SX  5
2   AA  1
2   UT  2
2   SX  3
3   UT  1
3   SK  2
3   SX  3
3   ZF  4

Credits to this
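
For readers unfamiliar with the idiom, here is a minimal sketch of what the query above relies on, run against a single hard-coded string instead of tbl1. The fourth argument of regexp_substr selects the n-th match of '[^,]+', and connect by on dual supplies n = 1, 2, 3, ... until no match is left. The literal 'AA, UT, BT' is only an illustration.

```sql
-- Pick the n-th run of non-comma characters, where n is the hierarchy level.
select level as n,
       trim(regexp_substr('AA, UT, BT', '[^,]+', 1, level)) as token
  from dual
connect by regexp_substr('AA, UT, BT', '[^,]+', 1, level) is not null
 order by level;
```

This should return three rows: (1, AA), (2, UT) and (3, BT).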

To remove duplicates in a more elegant and efficient way (credits to @mathguy):

select id, trim(regexp_substr(value,'[^,]+', 1, level) ) value, level
  from tbl1
   connect by regexp_substr(value, '[^,]+', 1, level) is not null
      and PRIOR id =  id 
      and PRIOR SYS_GUID() is not null  
   order by id, level;

If you want an "ANSIer" approach go with a CTE:

with t (id,res,val,lev) as (
           select id, trim(regexp_substr(value,'[^,]+', 1, 1 )) res, value as val, 1 as lev
             from tbl1
            where regexp_substr(value, '[^,]+', 1, 1) is not null
            union all           
            select id, trim(regexp_substr(val,'[^,]+', 1, lev+1) ) res, val, lev+1 as lev
              from t
              where regexp_substr(val, '[^,]+', 1, lev+1) is not null
              )
select id, res,lev
  from t
order by id, lev;

OUTPUT

id  res lev
1   AA  1
1   UT  2
1   BT  3
1   SK  4
1   SX  5
2   AA  1
2   UT  2
2   SX  3
3   UT  1
3   SK  2
3   SX  3
3   ZF  4

Another recursive approach by MT0 but without regex:

WITH t ( id, value, start_pos, end_pos ) AS
  ( SELECT id, value, 1, INSTR( value, ',' ) FROM tbl1
  UNION ALL
  SELECT id,
    value,
    end_pos                    + 1,
    INSTR( value, ',', end_pos + 1 )
  FROM t
  WHERE end_pos > 0
  )
SELECT id,
  TRIM( SUBSTR( value, start_pos, DECODE( end_pos, 0, LENGTH( value ) + 1, end_pos ) - start_pos ) ) AS value
FROM t
ORDER BY id,
  start_pos;
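
To see how start_pos and end_pos walk through a string, the same recursion can be traced on one hard-coded value. This is a sketch for illustration only: the literal 'AA, UT, SX' stands in for a row of tbl1, and TRIM strips the space that follows each comma in the sample data.

```sql
-- Each recursion step starts one character past the previous comma
-- and looks for the next comma; end_pos = 0 marks the last token.
WITH t ( value, start_pos, end_pos ) AS (
  SELECT 'AA, UT, SX', 1, INSTR( 'AA, UT, SX', ',' ) FROM DUAL
  UNION ALL
  SELECT value, end_pos + 1, INSTR( value, ',', end_pos + 1 )
  FROM   t
  WHERE  end_pos > 0
)
SELECT start_pos,
       end_pos,
       TRIM( SUBSTR( value, start_pos,
                     DECODE( end_pos, 0, LENGTH( value ) + 1, end_pos ) - start_pos ) ) AS token
FROM   t
ORDER BY start_pos;
```

The trace should show (1, 3, AA), (4, 7, UT) and (8, 0, SX): the commas sit at positions 3 and 7, and the final step finds no further comma, so DECODE substitutes LENGTH( value ) + 1 as the end of the last token.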

I tested these approaches on a 30,000-row dataset that returns 118,104 rows and got the following average results:

My recursive approach: 5 seconds
MT0 approach: 4 seconds
Mathguy approach: 16 seconds
MT0 recursive approach no-regex: 3.45 seconds

@Mathguy has also tested with a bigger dataset:

In all cases the recursive query (I tested only the one with regular substr and instr) does better, by a factor of 2 to 5. Below are the combinations of number of strings x tokens per string, with CTAS execution times for hierarchical vs. recursive (hierarchical first). All times are in seconds.

30,000 x 4: 5 / 1.
30,000 x 10: 15 / 3.
30,000 x 25: 56 / 37.
5,000 x 50: 33 / 14.
5,000 x 100: 160 / 81.
10,000 x 200: 1,924 / 772

Answer 2

Answered by MT0

This will get the values without requiring you to remove duplicates or to resort to the hack of including SYS_GUID() or DBMS_RANDOM.VALUE() in the CONNECT BY:

SELECT t.id,
       v.COLUMN_VALUE AS value
FROM   TBL1 t,
       TABLE(
         CAST(
           MULTISET(
             SELECT TRIM( REGEXP_SUBSTR( t.value, '[^,]+', 1, LEVEL ) )
             FROM   DUAL
             CONNECT BY LEVEL <= REGEXP_COUNT( t.value, '[^,]+' )
           )
           AS SYS.ODCIVARCHAR2LIST
         )
       ) v

Update:

Returning the index of the element in the list:

Option 1 - Return a UDT:

CREATE TYPE string_pair IS OBJECT( lvl INT, value VARCHAR2(4000) );
/

CREATE TYPE string_pair_table IS TABLE OF string_pair;
/

SELECT t.id,
       v.*
FROM   TBL1 t,
       TABLE(
         CAST(
           MULTISET(
             SELECT string_pair( level, TRIM( REGEXP_SUBSTR( t.value, '[^,]+', 1, LEVEL ) ) )
             FROM   DUAL
             CONNECT BY LEVEL <= REGEXP_COUNT( t.value, '[^,]+' )
           )
           AS string_pair_table
         )
       ) v;

Option 2 - Use ROW_NUMBER():

SELECT t.id,
       v.COLUMN_VALUE AS value,
       ROW_NUMBER() OVER ( PARTITION BY id ORDER BY ROWNUM ) AS lvl
FROM   TBL1 t,
       TABLE(
         CAST(
           MULTISET(
             SELECT TRIM( REGEXP_SUBSTR( t.value, '[^,]+', 1, LEVEL ) )
             FROM   DUAL
             CONNECT BY LEVEL <= REGEXP_COUNT( t.value, '[^,]+' )
           )
           AS SYS.ODCIVARCHAR2LIST
         )
       ) v;
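
The queries above all join each row of TBL1 to a row generator that is correlated on t.value. As a sketch only, and assuming Oracle 12c or later (the answer does not state a version), the same correlation can be written with CROSS APPLY, which also exposes LEVEL as the element index without a user-defined type. The aliases v, value and lvl are illustrative.

```sql
-- One generated row per token, correlated to the outer row through t.value.
SELECT t.id,
       v.value,
       v.lvl
FROM   tbl1 t
CROSS APPLY (
         SELECT TRIM( REGEXP_SUBSTR( t.value, '[^,]+', 1, LEVEL ) ) AS value,
                LEVEL AS lvl
         FROM   DUAL
         CONNECT BY LEVEL <= REGEXP_COUNT( t.value, '[^,]+' )
       ) v
ORDER BY t.id, v.lvl;
```

Because the CONNECT BY runs against DUAL once per outer row, no PRIOR condition or duplicate removal is involved.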

Answer 3

Answered by mathguy

Vercelli posted a correct answer. However, with more than one string to split, connect by will generate an exponentially growing number of rows, with many, many duplicates. (Just try the query without distinct.) This will destroy performance on data of non-trivial size.
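
A quick way to see the blow-up on the three sample rows is to count what the unrestricted hierarchy produces before distinct collapses it. This is a sketch; the column aliases are illustrative.

```sql
-- Without a PRIOR condition, every row at level n becomes a parent of
-- every row in tbl1 that still has a token at level n + 1.
select count(*)                           as rows_generated,
       count(distinct id || ':' || level) as rows_wanted
  from tbl1
connect by regexp_substr(value, '[^,]+', 1, level) is not null;
```

With the sample data the levels should contribute 3 + 9 + 27 + 54 + 54 = 147 generated rows, against the 12 rows actually wanted: all three strings have at least three tokens, two have a fourth, and one has a fifth.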

One common way to overcome this problem is to use a prior condition and an additional check to avoid cycles in the hierarchy. Like so:

select id, trim(regexp_substr(value,'[^,]+', 1, level) ) value, level
  from tbl1
   connect by regexp_substr(value, '[^,]+', 1, level) is not null
          and prior id = id
          and prior sys_guid() is not null
   order by id, level;

See, for example, this discussion on OTN: https://community.oracle.com/thread/2526535

Answer 4

Answered by MT0

An alternate method is to define a simple PL/SQL function:

CREATE OR REPLACE FUNCTION split_String(
  i_str    IN  VARCHAR2,
  i_delim  IN  VARCHAR2 DEFAULT ','
) RETURN SYS.ODCIVARCHAR2LIST DETERMINISTIC
AS
  p_result       SYS.ODCIVARCHAR2LIST := SYS.ODCIVARCHAR2LIST();
  p_start        NUMBER(5) := 1;
  p_end          NUMBER(5);
  c_len CONSTANT NUMBER(5) := LENGTH( i_str );
  c_ld  CONSTANT NUMBER(5) := LENGTH( i_delim );
BEGIN
  IF c_len > 0 THEN
    p_end := INSTR( i_str, i_delim, p_start );
    WHILE p_end > 0 LOOP
      p_result.EXTEND;
      p_result( p_result.COUNT ) := SUBSTR( i_str, p_start, p_end - p_start );
      p_start := p_end + c_ld;
      p_end := INSTR( i_str, i_delim, p_start );
    END LOOP;
    IF p_start <= c_len + 1 THEN
      p_result.EXTEND;
      p_result( p_result.COUNT ) := SUBSTR( i_str, p_start, c_len - p_start + 1 );
    END IF;
  END IF;
  RETURN p_result;
END;
/

Then the SQL becomes very simple:

SELECT t.id,
       TRIM( v.column_value ) AS value
FROM   TBL1 t,
       TABLE( split_String( t.value ) ) v
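
The function returns each token exactly as it appears between delimiters, so with the sample data every token after the first keeps its leading space unless it is trimmed. Since the function accepts a multi-character delimiter, one option (a sketch, assuming the data always uses a comma followed by exactly one space) is to split on ', ' directly:

```sql
-- Standalone call: split a literal on the two-character delimiter ', '.
SELECT column_value AS token
FROM   TABLE( split_String( 'AA, UT, SX', ', ' ) );

-- The same delimiter applied to the table from the question.
SELECT t.id,
       v.column_value AS value
FROM   tbl1 t,
       TABLE( split_String( t.value, ', ' ) ) v
ORDER BY t.id;
```

The first query should return the three tokens AA, UT and SX with no surrounding whitespace. If the spacing in the data is inconsistent, splitting on ',' and wrapping column_value in TRIM is the safer choice.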
