Oracle - need to extract text between given strings

Warning: this is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/28674778/

Time: 2020-09-19 02:45:06  Source: igfitidea

Tags: sql, regex, oracle, plsql, substring

Question by user1209216

Example: I need to extract everything between "Begin begin" and "End end". I tried this:

with phrases as (
  select 'stackoverflow is awesome. Begin beginHello, World!End end It has everything!' as phrase
    from dual
         )
select regexp_replace(phrase
     , '([[:print:]]+Begin begin)([[:print:]]+)(End end[[:print:]]+)', '\2')
  from phrases
       ;

Result: Hello, World!

However, it fails if my text contains newline characters. Any tips on how to fix this so that text containing newlines can be extracted as well?

[edit] How it fails:

with phrases as (
  select 'stackoverflow is awesome. Begin beginHello, 
  World!End end It has everything!' as phrase
    from dual
         )
select regexp_replace(phrase
     , '([[:print:]]+Begin begin)([[:print:]]+)(End end[[:print:]]+)', '\2')
  from phrases
       ;

Result:

stackoverflow is awesome. Begin beginHello, World!End end It has everything!

Should be:

Hello,
World!

[edit]

Another issue. Consider this sample:

WITH phrases AS (
  SELECT 'stackoverflow is awesome. Begin beginHello,
 World!End end It has everything!End endTESTESTESTES' AS phrase
    FROM dual
)
SELECT REGEXP_REPLACE(phrase, '.+Begin begin(.+)End end.+', '\1', 1, 1, 'n')
  FROM phrases;

Result:

Hello,
World!End end It has everything!

So it matches the last occurrence of the end string, and this is not what I want. The substring should be extracted up to the first occurrence of my label, so the result should be:

Hello,
World!

Everything after first occurence of label string should be ignored. Any ideas?
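
Why the last "End end" wins can be reproduced outside the database. The following is a minimal Python sketch, assuming Python's re module is an acceptable stand-in for Oracle's regex engine for this pattern (re.DOTALL plays the role of the 'n' match parameter): the greedy (.+) grabs as much as it can and backtracks only as far as the last "End end" that is still followed by at least one character.

```python
import re

# Third sample from the question ("\n" is the embedded line break).
phrase = ("stackoverflow is awesome. Begin beginHello,\n"
          " World!End end It has everything!End endTESTESTESTES")

# Same pattern and replacement as the question; re.DOTALL ~ Oracle's 'n'.
result = re.sub(r".+Begin begin(.+)End end.+", r"\1", phrase,
                count=1, flags=re.DOTALL)

# The greedy group runs on to the LAST "End end" that still has at least
# one character after it, so the first "End end" ends up inside the result.
print(repr(result))  # 'Hello,\n World!End end It has everything!'
```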

Answer by David Faber

I'm not that familiar with the POSIX [[:print:]] character class, but I got your query working using the wildcard ".". You need to specify the "n" match parameter in REGEXP_REPLACE() so that "." can match the newline character:

WITH phrases AS (
  SELECT 'stackoverflow is awesome. Begin beginHello,
 World!End end It has everything!' AS phrase
    FROM dual
)
SELECT REGEXP_REPLACE(phrase, '.+Begin begin(.+)End end.+', '\1', 1, 1, 'n')
  FROM phrases;

I used the \1 backreference, as I didn't see the need to capture the other groups from the regular expression. It might also be a good idea to use the * quantifier (instead of +) in case there is nothing preceding or following the delimiters. If you want to capture all of the groups, you can use the following:

WITH phrases AS (
  SELECT 'stackoverflow is awesome. Begin beginHello,
 World!End end It has everything!' AS phrase
    FROM dual
)
SELECT REGEXP_REPLACE(phrase, '(.+Begin begin)(.+)(End end.+)', '\2', 1, 1, 'n')
  FROM phrases;
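
To illustrate the * versus + suggestion, here is a minimal Python sketch (Python's re module standing in for Oracle's engine, re.DOTALL for the 'n' flag) using a hypothetical input in which nothing precedes "Begin begin" and nothing follows "End end":

```python
import re

# Hypothetical input: no text before "Begin begin" or after "End end".
bare = "Begin beginHello,\n World!End end"

plus_result = re.sub(r".+Begin begin(.+)End end.+", r"\1", bare,
                     count=1, flags=re.DOTALL)
star_result = re.sub(r".*Begin begin(.*)End end.*", r"\1", bare,
                     count=1, flags=re.DOTALL)

# ".+" needs at least one character around the delimiters, so there is no
# match and the input comes back unchanged; ".*" accepts empty context.
print(plus_result == bare)  # True
print(repr(star_result))    # 'Hello,\n World!'
```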

UPDATE - FYI, I tested with [[:print:]] and it doesn't work. This is not surprising, since [[:print:]] is supposed to match printable characters: it doesn't match anything with an ASCII value below 32 (the space character), and that includes the newline (ASCII 10). You need to use ".".
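
A rough Python illustration of the same point, assuming the ASCII range [\x20-\x7E] as a stand-in for [[:print:]] (Oracle's class depends on the database character set, so this is only an approximation for plain ASCII text):

```python
import re

# ASCII stand-in for POSIX [[:print:]]: space (32) through "~" (126).
PRINT = r"[\x20-\x7E]"

space_ok = re.fullmatch(PRINT, " ") is not None     # True: space is 32
newline_ok = re.fullmatch(PRINT, "\n") is not None  # False: newline is 10

phrase = ("stackoverflow is awesome. Begin beginHello,\n"
          " World!End end It has everything!")
pattern = f"({PRINT}+Begin begin)({PRINT}+)(End end{PRINT}+)"

# Group 2 cannot cross the newline, so there is no match at all and
# re.sub returns the phrase unchanged, just like the failing query.
result = re.sub(pattern, r"\2", phrase)
print(space_ok, newline_ok, result == phrase)  # True False True
```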

UPDATE #2 - per the update to the question: I don't think a regex will work the way you want it to. Adding the lazy quantifier to (.+) has no effect, and Oracle regular expressions don't have lookahead. There are a couple of things you might do; one is to use INSTR() and SUBSTR():

WITH phrases AS (
  SELECT 'stackoverflow is awesome. Begin beginHello,
 World!End end It has everything!End endTESTTESTTEST' AS phrase
    FROM dual
)
SELECT SUBSTR(phrase, str_start, str_end - str_start) FROM (
    SELECT INSTR(phrase, 'Begin begin') + LENGTH('Begin begin') AS str_start
         , INSTR(phrase, 'End end') AS str_end, phrase
      FROM phrases
);
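
The same INSTR()/SUBSTR() idea, sketched in Python with str.find() for testing outside Oracle. Keep in mind that find() is 0-based and returns -1 when the tag is missing, whereas INSTR() is 1-based and returns 0. As an extra safeguard the sketch starts searching for the end tag after the start tag (Oracle's INSTR() accepts a starting position as its third argument, so the SQL could do the same); the function name extract_between is hypothetical.

```python
from typing import Optional


def extract_between(text: str, start_tag: str, end_tag: str) -> Optional[str]:
    """Return the text between the first start_tag and the first end_tag
    after it, or None if either tag is missing."""
    start = text.find(start_tag)
    if start == -1:
        return None
    start += len(start_tag)  # like INSTR(...) + LENGTH('Begin begin')
    # Look for the end tag only after the start tag.
    end = text.find(end_tag, start)
    if end == -1:
        return None
    return text[start:end]  # like SUBSTR(phrase, str_start, str_end - str_start)


phrase = ("stackoverflow is awesome. Begin beginHello,\n"
          " World!End end It has everything!End endTESTTESTTEST")
print(repr(extract_between(phrase, "Begin begin", "End end")))  # 'Hello,\n World!'
```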

Another is to combine INSTR() and SUBSTR() with a regular expression:

WITH phrases AS (
  SELECT 'stackoverflow is awesome. Begin beginHello,
 World!End end It has everything!End endTESTTESTTEST' AS phrase
    FROM dual
)
SELECT REGEXP_REPLACE(SUBSTR(phrase, 1, INSTR(phrase, 'End end') + LENGTH('End end')), '.+Begin begin(.+)End end.+', '\1', 1, 1, 'n')
  FROM phrases;
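
A Python sketch of the two steps (truncate after the first "End end", then apply the greedy pattern), assuming Python's re with re.DOTALL as a stand-in for REGEXP_REPLACE with 'n'. The extra character mirrors the SQL: the pattern ends in ".+", so at least one character must remain after "End end".

```python
import re

END = "End end"
phrase = ("stackoverflow is awesome. Begin beginHello,\n"
          " World!End end It has everything!End endTESTTESTTEST")

# Step 1, like SUBSTR(phrase, 1, INSTR(phrase, 'End end') + LENGTH('End end')):
# keep everything up to the first END plus one more character.
cut = phrase[: phrase.find(END) + len(END) + 1]

# Step 2: the greedy pattern can now only reach the first END.
result = re.sub(r".+Begin begin(.+)End end.+", r"\1", cut,
                count=1, flags=re.DOTALL)
print(repr(result))  # 'Hello,\n World!'
```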

Answer by Stephan

Try this regex:

([[:print:]]+Begin begin)(.+?)(End end[[:print:]]+)

Sample usage:

SELECT regexp_replace(
         phrase ,
         '([[:print:]]+Begin begin)(.+?)(End end[[:print:]]+)',
         '\2',
         1,  -- Start at the beginning of the phrase
         0,  -- Replace ALL occurrences
         'n' -- Let the dot metacharacter match the newline character
)
FROM
  (SELECT 'stackoverflow is awesome. Begin beginHello, '
    || chr(10)
    || ' World!End end It has everything!' AS phrase
  FROM DUAL
  )

The dot metacharacter (.) matches any character in the database character set except the newline character. To make the dot match newlines as well, the match_parameter passed to regexp_replace must contain the n switch.
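
The effect of the n switch can be reproduced with Python's re.DOTALL flag, which similarly lets "." match a newline. A minimal sketch using Stephan's sample text; note that it uses re.search() with a lazy group rather than REGEXP_REPLACE, and the lazy quantifier's behavior shown here is Python's, not necessarily Oracle's:

```python
import re

phrase = ("stackoverflow is awesome. Begin beginHello, \n"
          " World!End end It has everything!")
pattern = r"Begin begin(.+?)End end"

# Without a flag, "." does not match "\n", so there is no match.
plain = re.search(pattern, phrase)
print(plain)  # None

# re.DOTALL is Python's counterpart of Oracle's 'n' match parameter.
match = re.search(pattern, phrase, flags=re.DOTALL)
print(repr(match.group(1)))  # 'Hello, \n World!'
```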

Answer by Stephan

In order to get your second option to work, you need to add [[:space:][:print:]]* as follows:

with phrases as (
  select 'stackoverflow is awesome. Begin beginHello, 
  World!End end It has everything!' as phrase
    from dual
         )
select regexp_replace(phrase
     , '([[:print:]]+Begin begin)([[:print:]]+[[:space:][:print:]]*)(End end[[:print:]]+)', '\2')
  from phrases
       ;    

But it will still break if you have more \n characters; for instance, it won't work for:

with phrases as (
  select 'stackoverflow is awesome. Begin beginHello, 
  World!End end 
  It has everything!' as phrase
    from dual
         )
select regexp_replace(phrase
     , '([[:print:]]+Begin begin)([[:print:]]+[[:space:][:print:]]*)(End end[[:print:]]+)', '\2')
  from phrases
       ;       

Then you need to add [[:space:][:print:]]* to the third group as well:

with phrases as (
  select 'stackoverflow is awesome. Begin beginHello, 
  World!End end 
  It has everything!' as phrase
    from dual
         )
select regexp_replace(phrase
     , '([[:print:]]+Begin begin)([[:print:]]+[[:space:][:print:]]*)(End end[[:print:]]+[[:space:][:print:]]*)', '\2')
  from phrases
       ;   

The problem with regex is that you have to anticipate all the variations and create a rule that matches all of them. If something falls outside that scope, you'll have to revisit the regex and add the new case.

You can find extra info here.

Answer by Dan

/*
 Description.........: This is a function similar to the one that was available from PRIME Computers
                       back in the late 80s/90s. This function will parse out a segment of a string
                       based on a supplied delimiter. The delimiters can be anything.
 Usage:
      field(i_string           => 'This.is.a.cool.function'
           ,i_delimiter        => '.'
           ,i_occurance        => 2
           ,i_return_instances => 2)

      Return value = is.a
*/
CREATE OR REPLACE FUNCTION field(i_string           VARCHAR2
                                ,i_delimiter        VARCHAR2
                                ,i_occurance        NUMBER DEFAULT 1
                                ,i_return_instances NUMBER DEFAULT 1) RETURN VARCHAR2 IS
  --
  v_delimiter      VARCHAR2(32767);  -- large enough for multi-character delimiters
  n_delim_len      PLS_INTEGER := length(i_delimiter);
  n_end_pos        NUMBER;
  n_start_pos      NUMBER := 1;
  n_delimiter_pos  NUMBER;
  n_seek_pos       NUMBER := 1;
  n_tbl_index      PLS_INTEGER := 0;
  n_return_counter NUMBER := 0;
  v_return_string  VARCHAR2(32767);
  TYPE tbl_type IS TABLE OF VARCHAR2(4000) INDEX BY PLS_INTEGER;
  tbl tbl_type;
  e_no_delimiters EXCEPTION;
  -- Append one delimiter so the last segment is also terminated
  v_string VARCHAR2(32767) := i_string || i_delimiter;
BEGIN
  BEGIN
    LOOP
      ----------------------------------------
      -- Search for the delimiter in the
      -- string
      ----------------------------------------
      n_delimiter_pos := instr(v_string, i_delimiter, n_seek_pos);
      --
      IF n_delimiter_pos = length(v_string) - n_delim_len + 1 AND n_tbl_index = 0 THEN
        ------------------------------------------
        -- The first match is the appended
        -- delimiter, so the delimiter you are
        -- looking for is not in this string.
        ------------------------------------------
        RAISE e_no_delimiters;
      END IF;
      --
      EXIT WHEN n_delimiter_pos = 0;
      n_start_pos := n_seek_pos;
      n_end_pos   := n_delimiter_pos - n_seek_pos;   -- length of this segment
      n_seek_pos  := n_delimiter_pos + n_delim_len;  -- skip the whole delimiter
      --
      n_tbl_index := n_tbl_index + 1;
      -----------------------------------------------
      -- Store the segments of the string in a tbl
      -----------------------------------------------
      tbl(n_tbl_index) := substr(i_string, n_start_pos, n_end_pos);
    END LOOP;
    ----------------------------------------------
    -- Prepare the results for the return voyage:
    -- join i_return_instances segments, starting
    -- at segment i_occurance
    ----------------------------------------------
    v_delimiter := NULL;
    FOR a IN tbl.first .. tbl.last LOOP
      IF a >= i_occurance AND n_return_counter < i_return_instances THEN
        v_return_string  := v_return_string || v_delimiter || tbl(a);
        v_delimiter      := i_delimiter;
        n_return_counter := n_return_counter + 1;
      END IF;
    END LOOP;
    --
  EXCEPTION
    WHEN e_no_delimiters THEN
      v_return_string := i_string;
  END;
  RETURN TRIM(v_return_string);
END field;
/
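
For comparison, a Python sketch of what field() computes: split on the delimiter, then return i_return_instances segments starting at segment i_occurance, re-joined with the delimiter. It mirrors the PL/SQL for ordinary inputs only (for example, Oracle treats empty strings as NULL, which this sketch does not model), and its parameter names are hypothetical. Because str.split() accepts multi-character delimiters, it can also be chained to answer the original question:

```python
from typing import Optional


def field(string: Optional[str], delimiter: str,
          occurrence: int = 1, return_instances: int = 1) -> Optional[str]:
    """Return `return_instances` segments of `string`, starting at segment
    `occurrence` (1-based), re-joined with `delimiter` and trimmed of
    surrounding spaces like TRIM()."""
    if string is None:
        return None
    if delimiter not in string:
        # The PL/SQL version returns the (trimmed) input in this case.
        return string.strip(" ")
    segments = string.split(delimiter)
    picked = segments[occurrence - 1:occurrence - 1 + return_instances]
    if not picked:
        return None  # nothing selected: the PL/SQL version returns NULL
    return delimiter.join(picked).strip(" ")


print(field("This.is.a.cool.function", ".", 2, 2))  # is.a

# Chained for the question: everything after the first "Begin begin",
# then everything before the first "End end".
phrase = ("stackoverflow is awesome. Begin beginHello,\n"
          " World!End end It has everything!End endTESTTESTTEST")
print(repr(field(field(phrase, "Begin begin", 2), "End end")))  # 'Hello,\n World!'
```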