SQL Oracle中将字符串拆分为多行

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/14328621/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 13:08:43  来源:igfitidea点击:

Splitting string into multiple rows in Oracle

sqlstringoracleplsqltokenize

提问by marshalllaw

I know this has been answered to some degree with PHP and MYSQL, but I was wondering if someone could teach me the simplest approach to splitting a string (comma delimited) into multiple rows in Oracle 10g (preferably) and 11g.

我知道 PHP 和 MYSQL 在某种程度上已经解决了这个问题,但我想知道是否有人可以教我在 Oracle 10g(最好)和 11g 中将字符串(逗号分隔)拆分为多行的最简单方法。

The table is as follows:

表格如下:

Name | Project | Error 
108    test      Err1, Err2, Err3
109    test2     Err1

I want to create the following:

我想创建以下内容:

Name | Project | Error
108    Test      Err1
108    Test      Err2 
108    Test      Err3 
109    Test2     Err1

I've seen a few potential solutions around stack, however they only accounted for a single column (being the comma delimited string). Any help would be greatly appreciated.

我已经看到了一些围绕堆栈的潜在解决方案,但是它们只占一列(以逗号分隔的字符串)。任何帮助将不胜感激。

回答by Nefreo

This may be an improved way (also with regexp and connect by):

这可能是一种改进的方式(也使用正则表达式和连接方式):

with temp as
(
    select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error  from dual
    union all
    select 109, 'test2', 'Err1' from dual
)
select distinct
  t.name, t.project,
  trim(regexp_substr(t.error, '[^,]+', 1, levels.column_value))  as error
from 
  temp t,
  table(cast(multiset(select level from dual connect by  level <= length (regexp_replace(t.error, '[^,]+'))  + 1) as sys.OdciNumberList)) levels
order by name

EDIT: Here is a simple (as in, "not in depth") explanation of the query.

编辑:这是查询的简单(如“不深入”)解释。

  1. length (regexp_replace(t.error, '[^,]+')) + 1uses regexp_replaceto erase anything that is not the delimiter (comma in this case) and length +1to get how many elements (errors) are there.
  2. The select level from dual connect by level <= (...)uses a hierarchical queryto create a column with an increasing number of matches found, from 1 to the total number of errors.

    Preview:

    select level, length (regexp_replace('Err1, Err2, Err3', '[^,]+'))  + 1 as max 
    from dual connect by level <= length (regexp_replace('Err1, Err2, Err3', '[^,]+'))  + 1
    
  3. table(cast(multiset(.....) as sys.OdciNumberList))does some casting of oracle types.
    • The cast(multiset(.....)) as sys.OdciNumberListtransforms multiple collections (one collection for each row in the original data set) into a single collection of numbers, OdciNumberList.
    • The table()function transforms a collection into a resultset.
  4. FROMwithout a join creates a cross joinbetween your dataset and the multiset. As a result, a row in the data set with 4 matches will repeat 4 times (with an increasing number in the column named "column_value").

    Preview:

    select * from 
    temp t,
    table(cast(multiset(select level from dual connect by  level <= length (regexp_replace(t.error, '[^,]+'))  + 1) as sys.OdciNumberList)) levels
    
  5. trim(regexp_substr(t.error, '[^,]+', 1, levels.column_value))uses the column_valueas the nth_appearance/ocurrenceparameter for regexp_substr.
  6. You can add some other columns from your data set (t.name, t.projectas an example) for easy visualization.
  1. length (regexp_replace(t.error, '[^,]+')) + 1用于regexp_replace擦除不是分隔符(在这种情况下为逗号)的任何内容并length +1获取有多少元素(错误)。
  2. select level from dual connect by level <= (...)使用分层查询以创建一个列有越来越多的比赛中,从1到错误的总数。

    预览:

    select level, length (regexp_replace('Err1, Err2, Err3', '[^,]+'))  + 1 as max 
    from dual connect by level <= length (regexp_replace('Err1, Err2, Err3', '[^,]+'))  + 1
    
  3. table(cast(multiset(.....) as sys.OdciNumberList))做一些预言机类型的铸造。
    • cast(multiset(.....)) as sys.OdciNumberList变换多个集合(一个收集在原始数据集中的每一行)转换成数字的单个集合,OdciNumberList。
    • table()函数将集合转换为结果集。
  4. FROM没有连接会在您的数据集和多重集之间创建交叉连接。因此,数据集中有 4 个匹配项的行将重复 4 次(名为“column_value”的列中的数字增加)。

    预览:

    select * from 
    temp t,
    table(cast(multiset(select level from dual connect by  level <= length (regexp_replace(t.error, '[^,]+'))  + 1) as sys.OdciNumberList)) levels
    
  5. trim(regexp_substr(t.error, '[^,]+', 1, levels.column_value))column_value用作 的nth_appearance/ocurrence参数regexp_substr
  6. 您可以从您的数据集中添加一些其他列(t.name, t.project例如)以方便可视化。

Some references to Oracle docs:

对 Oracle 文档的一些参考:

回答by Andrey Khmelev

regular expressions is a wonderful thing :)

正则表达式是一件很棒的事情:)

with temp as  (
       select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error  from dual
       union all
       select 109, 'test2', 'Err1' from dual
     )

SELECT distinct Name, Project, trim(regexp_substr(str, '[^,]+', 1, level)) str
  FROM (SELECT Name, Project, Error str FROM temp) t
CONNECT BY instr(str, ',', 1, level - 1) > 0
order by Name

回答by Lalit Kumar B

There is a huge difference between the below two:

以下两者之间存在巨大差异:

  • splitting a single delimited string
  • splitting delimited strings for multiple rows in a table.
  • 拆分单个分隔字符串
  • 拆分表中多行的分隔字符串。

If you do not restrict the rows, then the CONNECT BYclause would produce multiple rowsand will not give the desired output.

如果您不限制行,则CONNECT BY子句将生成多行并且不会提供所需的输出。

Apart from Regular Expressions, a few other alternatives are using:

除了正则表达式之外,还有其他一些替代方案正在使用:

  • XMLTable
  • MODELclause
  • XML表
  • 模型子句

Setup

设置

SQL> CREATE TABLE t (
  2    ID          NUMBER GENERATED ALWAYS AS IDENTITY,
  3    text        VARCHAR2(100)
  4  );

Table created.

SQL>
SQL> INSERT INTO t (text) VALUES ('word1, word2, word3');

1 row created.

SQL> INSERT INTO t (text) VALUES ('word4, word5, word6');

1 row created.

SQL> INSERT INTO t (text) VALUES ('word7, word8, word9');

1 row created.

SQL> COMMIT;

Commit complete.

SQL>
SQL> SELECT * FROM t;

        ID TEXT
---------- ----------------------------------------------
         1 word1, word2, word3
         2 word4, word5, word6
         3 word7, word8, word9

SQL>

Using XMLTABLE:

使用XMLTABLE

SQL> SELECT id,
  2         trim(COLUMN_VALUE) text
  3  FROM t,
  4    xmltable(('"'
  5    || REPLACE(text, ',', '","')
  6    || '"'))
  7  /

        ID TEXT
---------- ------------------------
         1 word1
         1 word2
         1 word3
         2 word4
         2 word5
         2 word6
         3 word7
         3 word8
         3 word9

9 rows selected.

SQL>

Using MODELclause:

使用MODEL子句:

SQL> WITH
  2  model_param AS
  3     (
  4            SELECT id,
  5                      text AS orig_str ,
  6                   ','
  7                          || text
  8                          || ','                                 AS mod_str ,
  9                   1                                             AS start_pos ,
 10                   Length(text)                                   AS end_pos ,
 11                   (Length(text) - Length(Replace(text, ','))) + 1 AS element_count ,
 12                   0                                             AS element_no ,
 13                   ROWNUM                                        AS rn
 14            FROM   t )
 15     SELECT   id,
 16              trim(Substr(mod_str, start_pos, end_pos-start_pos)) text
 17     FROM     (
 18                     SELECT *
 19                     FROM   model_param MODEL PARTITION BY (id, rn, orig_str, mod_str)
 20                     DIMENSION BY (element_no)
 21                     MEASURES (start_pos, end_pos, element_count)
 22                     RULES ITERATE (2000)
 23                     UNTIL (ITERATION_NUMBER+1 = element_count[0])
 24                     ( start_pos[ITERATION_NUMBER+1] = instr(cv(mod_str), ',', 1, cv(element_no)) + 1,
 25                     end_pos[iteration_number+1] = instr(cv(mod_str), ',', 1, cv(element_no) + 1) )
 26                 )
 27     WHERE    element_no != 0
 28     ORDER BY mod_str ,
 29           element_no
 30  /

        ID TEXT
---------- --------------------------------------------------
         1 word1
         1 word2
         1 word3
         2 word4
         2 word5
         2 word6
         3 word7
         3 word8
         3 word9

9 rows selected.

SQL>

回答by Art

A couple of more examples of the same:

再举几个相同的例子:

SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab
  FROM dual
CONNECT BY LEVEL <= regexp_count('Err1, Err2, Err3', ',')+1
/

SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab
  FROM dual
CONNECT BY LEVEL <= length('Err1, Err2, Err3') - length(REPLACE('Err1, Err2, Err3', ',', ''))+1
/

Also, may use DBMS_UTILITY.comma_to_table & table_to_comma: http://www.oracle-base.com/articles/9i/useful-procedures-and-functions-9i.php#DBMS_UTILITY.comma_to_table

另外,可以使用 DBMS_UTILITY.comma_to_table & table_to_comma:http: //www.oracle-base.com/articles/9i/useful-procedures-and-functions-9i.php#DBMS_UTILITY.comma_to_table

回答by Daniela Petruzalek

I would like to propose a different approach using a PIPELINED table function. It's somewhat similar to the technique of the XMLTABLE, except that you are providing your own custom function to split the character string:

我想提出一种使用 PIPELINED 表函数的不同方法。它有点类似于 XMLTABLE 的技术,除了您提供自己的自定义函数来拆分字符串:

-- Create a collection type to hold the results
CREATE OR REPLACE TYPE typ_str2tbl_nst AS TABLE OF VARCHAR2(30);
/

-- Split the string according to the specified delimiter
CREATE OR REPLACE FUNCTION str2tbl (
  p_string    VARCHAR2,
  p_delimiter CHAR DEFAULT ',' 
)
RETURN typ_str2tbl_nst PIPELINED
AS
  l_tmp VARCHAR2(32000) := p_string || p_delimiter;
  l_pos NUMBER;
BEGIN
  LOOP
    l_pos := INSTR( l_tmp, p_delimiter );
    EXIT WHEN NVL( l_pos, 0 ) = 0;
    PIPE ROW ( RTRIM( LTRIM( SUBSTR( l_tmp, 1, l_pos-1) ) ) );
    l_tmp := SUBSTR( l_tmp, l_pos+1 );
  END LOOP;
END str2tbl;
/

-- The problem solution
SELECT name, 
       project, 
       TRIM(COLUMN_VALUE) error
  FROM t, TABLE(str2tbl(error));

Results:

结果:

      NAME PROJECT    ERROR
---------- ---------- --------------------
       108 test       Err1
       108 test       Err2
       108 test       Err3
       109 test2      Err1

The problem with this type of approach is that often the optimizer won't know the cardinality of the table function and it will have to make a guess. This could be potentialy harmful to your execution plans, so this solution can be extended to provide execution statistics for the optimizer.

这种方法的问题是优化器通常不知道表函数的基数,它必须进行猜测。这可能对您的执行计划有害,因此可以扩展此解决方案以为优化器提供执行统计信息。

You can see this optimizer estimate by running an EXPLAIN PLAN on the query above:

您可以通过对上面的查询运行 EXPLAIN PLAN 来查看此优化器估计值:

Execution Plan
----------------------------------------------------------
Plan hash value: 2402555806

----------------------------------------------------------------------------------------------
| Id  | Operation                          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |         | 16336 |   366K|    59   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                      |         | 16336 |   366K|    59   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL                | T       |     2 |    42 |     3   (0)| 00:00:01 |
|   3 |   COLLECTION ITERATOR PICKLER FETCH| STR2TBL |  8168 | 16336 |    28   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Even though the collection has only 3 values, the optimizer estimated 8168 rows for it (default value). This may seem irrelevant at first, but it may be enough for the optimizer to decide for a sub-optimal plan.

即使集合只有 3 个值,优化器估计它有 8168 行(默认值)。乍一看这似乎无关紧要,但对于优化器来说,决定一个次优计划可能就足够了。

The solution is to use the optimizer extensions to provide statistics for the collection:

解决方案是使用优化器扩展为集合提供统计信息:

-- Create the optimizer interface to the str2tbl function
CREATE OR REPLACE TYPE typ_str2tbl_stats AS OBJECT (
  dummy NUMBER,

  STATIC FUNCTION ODCIGetInterfaces ( p_interfaces OUT SYS.ODCIObjectList )
  RETURN NUMBER,

  STATIC FUNCTION ODCIStatsTableFunction ( p_function  IN  SYS.ODCIFuncInfo,
                                           p_stats     OUT SYS.ODCITabFuncStats,
                                           p_args      IN  SYS.ODCIArgDescList,
                                           p_string    IN  VARCHAR2,
                                           p_delimiter IN  CHAR DEFAULT ',' )
  RETURN NUMBER
);
/

-- Optimizer interface implementation
CREATE OR REPLACE TYPE BODY typ_str2tbl_stats
AS
  STATIC FUNCTION ODCIGetInterfaces ( p_interfaces OUT SYS.ODCIObjectList )
  RETURN NUMBER
  AS
  BEGIN
    p_interfaces := SYS.ODCIObjectList ( SYS.ODCIObject ('SYS', 'ODCISTATS2') );
    RETURN ODCIConst.SUCCESS;
  END ODCIGetInterfaces;

  -- This function is responsible for returning the cardinality estimate
  STATIC FUNCTION ODCIStatsTableFunction ( p_function  IN  SYS.ODCIFuncInfo,
                                           p_stats     OUT SYS.ODCITabFuncStats,
                                           p_args      IN  SYS.ODCIArgDescList,
                                           p_string    IN  VARCHAR2,
                                           p_delimiter IN  CHAR DEFAULT ',' )
  RETURN NUMBER
  AS
  BEGIN
    -- I'm using basically half the string lenght as an estimator for its cardinality
    p_stats := SYS.ODCITabFuncStats( CEIL( LENGTH( p_string ) / 2 ) );
    RETURN ODCIConst.SUCCESS;
  END ODCIStatsTableFunction;

END;
/

-- Associate our optimizer extension with the PIPELINED function   
ASSOCIATE STATISTICS WITH FUNCTIONS str2tbl USING typ_str2tbl_stats;

Testing the resulting execution plan:

测试生成的执行计划:

Execution Plan
----------------------------------------------------------
Plan hash value: 2402555806

----------------------------------------------------------------------------------------------
| Id  | Operation                          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |         |     1 |    23 |    59   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                      |         |     1 |    23 |    59   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL                | T       |     2 |    42 |     3   (0)| 00:00:01 |
|   3 |   COLLECTION ITERATOR PICKLER FETCH| STR2TBL |     1 |     2 |    28   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

As you can see the cardinality on the plan above is not the 8196 guessed value anymore. It's still not correct because we are passing a column instead of a string literal to the function.

如您所见,上述计划中的基数不再是 8196 猜测值。它仍然不正确,因为我们将一列而不是字符串文字传递给函数。

Some tweaking to the function code would be necessary to give a closer estimate in this particular case, but I think the overall concept is pretty much explained here.

在这种特殊情况下,有必要对函数代码进行一些调整以进行更接近的估计,但我认为这里对整体概念进行了充分解释。

The str2tbl function used in this answer was originally developed by Tom Kyte: https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:110612348061

此答案中使用的 str2tbl 函数最初由 Tom Kyte 开发:https://asktom.oracle.com/pls/asktom/f?p =100:11:0::::P11_QUESTION_ID:110612348061

The concept of associating statistics with object types can be further explored by reading this article: http://www.oracle-developer.net/display.php?id=427

可以通过阅读这篇文章进一步探讨将统计信息与对象类型相关联的概念:http: //www.oracle-developer.net/display.php?id=427

The technique described here works in 10g+.

此处描述的技术适用于 10g+。

回答by Lukasz Szozda

Starting from Oracle 12c you could use JSON_TABLEand JSON_ARRAY:

从 Oracle 12c 开始,您可以使用JSON_TABLEJSON_ARRAY

CREATE TABLE tab(Name, Project, Error) AS
SELECT 108,'test' ,'Err1, Err2, Err3' FROM dual UNION 
SELECT 109,'test2','Err1'             FROM dual;

And query:

并查询:

SELECT *
FROM tab t
OUTER APPLY (SELECT TRIM(p) AS p
            FROM JSON_TABLE(REPLACE(JSON_ARRAY(t.Error), ',', '","'),
           '$[*]' COLUMNS (p VARCHAR2(4000) PATH '$'))) s;

Output:

输出:

┌──────┬─────────┬──────────────────┬──────┐
│ Name │ Project │      Error       │  P   │
├──────┼─────────┼──────────────────┼──────┤
│  108 │ test    │ Err1, Err2, Err3 │ Err1 │
│  108 │ test    │ Err1, Err2, Err3 │ Err2 │
│  108 │ test    │ Err1, Err2, Err3 │ Err3 │
│  109 │ test2   │ Err1             │ Err1 │
└──────┴─────────┴──────────────────┴──────┘

db<>fiddle demo

db<>小提琴演示

回答by durette

REGEXP_COUNT wasn't added until Oracle 11i. Here's an Oracle 10g solution, adopted from Art's solution.

直到 Oracle 11i 才添加 REGEXP_COUNT。这是从 Art 的解决方案中采用的 Oracle 10g 解决方案。

SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab
  FROM dual
CONNECT BY LEVEL <=
  LENGTH('Err1, Err2, Err3')
    - LENGTH(REPLACE('Err1, Err2, Err3', ',', ''))
    + 1;

回答by Thomas Tschernich

I'd like to add another method. This one uses recursive querys, something I haven't seen in the other answers. It is supported by Oracle since 11gR2.

我想添加另一种方法。这个使用递归查询,这是我在其他答案中没有看到的。Oracle 从 11gR2 开始支持它。

with cte0 as (
    select phone_number x
    from hr.employees
), cte1(xstr,xrest,xremoved) as (
        select x, x, null
        from cte0
    union all        
        select xstr,
            case when instr(xrest,'.') = 0 then null else substr(xrest,instr(xrest,'.')+1) end,
            case when instr(xrest,'.') = 0 then xrest else substr(xrest,1,instr(xrest,'.') - 1) end
        from cte1
        where xrest is not null
)
select xstr, xremoved from cte1  
where xremoved is not null
order by xstr

It is quite flexible with the splitting character. Simply change it in the INSTRcalls.

它具有拆分字符非常灵活。只需在INSTR调用中更改它。

回答by Ilya Kharlamov

Without using connect byor regexp:

不使用connect byregexp

    with mytable as (
      select 108 name, 'test' project, 'Err1,Err2,Err3' error from dual
      union all
      select 109, 'test2', 'Err1' from dual
    )
    ,x as (
      select name
      ,project
      ,','||error||',' error
      from mytable
    )
    ,iter as (SELECT rownum AS pos
        FROM all_objects
    )
    select x.name,x.project
    ,SUBSTR(x.error
      ,INSTR(x.error, ',', 1, iter.pos) + 1
      ,INSTR(x.error, ',', 1, iter.pos + 1)-INSTR(x.error, ',', 1, iter.pos)-1
    ) error
    from x, iter
    where iter.pos < = (LENGTH(x.error) - LENGTH(REPLACE(x.error, ','))) - 1;

回答by silentsurfer

Here is an alternative implementation using XMLTABLE that allows for casting to different data types:

这是使用 XMLTABLE 的替代实现,它允许转换为不同的数据类型:

select 
  xmltab.txt
from xmltable(
  'for $text in tokenize("a,b,c", ",") return $text'
  columns 
    txt varchar2(4000) path '.'
) xmltab
;

... or if your delimited strings are stored in one or more rows of a table:

...或者如果您的分隔字符串存储在表的一行或多行中:

select 
  xmltab.txt
from (
  select 'a;b;c' inpt from dual union all
  select 'd;e;f' from dual
) base
inner join xmltable(
  'for $text in tokenize($input, ";") return $text'
  passing base.inpt as "input"
  columns 
    txt varchar2(4000) path '.'
) xmltab
  on 1=1
;