Look for text in one field contained in another field Oracle SQL

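Disclaimer: this page is a translation of a popular StackOverflow question and is licensed under CC BY-SA 4.0. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/24539720/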

Date: 2020-09-19 02:24:29  Source: igfitidea

Tags: sql, oracle, text, sql-like

Asked by GrubBumble

I apologize if this question has already been asked; I'm having trouble putting it into words.

I've been asked to filter out rows in a query where text from one field is contained in another field. An example would probably explain it better:

    Row  Column_1         Column_2
    1    Low Static       Static
    2    Static           Static
    3    Static           Clear
    4    (NULL)           Static
    5    Very Low Freq    Freq

The query should return only rows 3 and 4, because in rows 1, 2, and 5 the text in one column is contained in the other. Right now, I have the following condition:

    WHERE
    ((Column_2 NOT LIKE '%' || Column_1 || '%')
    OR (Column_1 NOT LIKE '%' || Column_2 || '%' OR Column_1 IS NULL))

However, it returns rows 1, 3, 4, and 5 when I only want rows 3 and 4. This is just example data; my actual dataset contains many different text strings in columns 1 and 2, so I can't write specific CASE statements to exclude particular instances where the columns are similar.

Maybe this just isn't possible, since I can't define a string as something enclosed between two spaces while also accounting for cases where there are no spaces?

Thanks

Accepted answer by Gordon Linoff

For your expression, I think you want AND rather than OR:

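-- In Oracle, '%' || NULL || '%' evaluates to '%%', which matches any non-NULL
-- value, so the NULL case is tested explicitly in both conditions.
WHERE ((Column_2 NOT LIKE '%' || Column_1 || '%' OR Column_1 IS NULL) AND
       (Column_1 NOT LIKE '%' || Column_2 || '%' OR Column_1 IS NULL)
      )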

You need both conditions to be true. You might find the logic easier to follow as:

WHERE NOT (Column_2 LIKE '%' || Column_1 || '%' OR
           Column_1 LIKE '%' || Column_2 || '%'
          )
   OR Column_1 IS NULL
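
You can check how OR and AND differ, and how Oracle's NULL handling affects the result, with a small Python model of the WHERE clauses run against the sample rows. This is a sketch under stated assumptions, not Oracle itself:

- None stands for both NULL and UNKNOWN.
- The helper names are hypothetical.
- `like_contains` models `haystack LIKE '%' || needle || '%'`, assuming the data contains no `%` or `_` wildcards.

The model encodes one documented Oracle rule: `||` ignores a NULL operand, so `'%' || NULL || '%'` is `'%%'`, which matches every non-NULL value.

```python
from typing import Optional

# Sample rows from the question; None models the NULL Column_1 in row 4.
ROWS = [
    (1, "Low Static", "Static"),
    (2, "Static", "Static"),
    (3, "Static", "Clear"),
    (4, None, "Static"),
    (5, "Very Low Freq", "Freq"),
]

Tri = Optional[bool]  # True, False, or None (= SQL UNKNOWN)


def sql_not(a: Tri) -> Tri:
    return None if a is None else not a


def sql_and(a: Tri, b: Tri) -> Tri:
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def sql_or(a: Tri, b: Tri) -> Tri:
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def like_contains(haystack: Optional[str], needle: Optional[str]) -> Tri:
    """Model of Oracle's: haystack LIKE '%' || needle || '%'.
    A NULL needle gives the pattern '%%', matching any non-NULL haystack;
    a NULL haystack gives UNKNOWN. Assumes no % or _ inside needle."""
    if haystack is None:
        return None
    return (needle or "") in haystack


def question_or(c1, c2) -> Tri:
    # (c2 NOT LIKE '%'||c1||'%') OR (c1 NOT LIKE '%'||c2||'%' OR c1 IS NULL)
    return sql_or(sql_not(like_contains(c2, c1)),
                  sql_or(sql_not(like_contains(c1, c2)), c1 is None))


def answer_and(c1, c2) -> Tri:
    # Same two tests joined with AND instead of OR
    return sql_and(sql_not(like_contains(c2, c1)),
                   sql_or(sql_not(like_contains(c1, c2)), c1 is None))


def answer_and_null_safe(c1, c2) -> Tri:
    # AND version with "OR c1 IS NULL" added to the first test as well
    return sql_and(sql_or(sql_not(like_contains(c2, c1)), c1 is None),
                   sql_or(sql_not(like_contains(c1, c2)), c1 is None))


def where(pred):
    # WHERE keeps a row only when the predicate is TRUE, not FALSE or UNKNOWN.
    return [rid for rid, c1, c2 in ROWS if pred(c1, c2) is True]


if __name__ == "__main__":
    print(where(question_or))           # [1, 3, 4, 5]
    print(where(answer_and))            # [3]
    print(where(answer_and_null_safe))  # [3, 4]
```

Under this model:

- The question's OR condition reproduces the reported rows 1, 3, 4, and 5.
- The plain AND version keeps only row 3, because row 4's first test is FALSE rather than UNKNOWN.
- Adding `OR Column_1 IS NULL` to the first test as well yields rows 3 and 4.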

Answer by codenheim

The approach you are taking will do full table scans, so it won't scale as the table grows. If you want a more efficient solution that can use an index (without resorting to Oracle's large-text indexing), use a function-based index to precompute whether one column is a substring of the other.

With INSTR() you can find whether one column is a substring of the other. It returns the 1-based position of the match, which serves as a score; 0 means no match.
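
As a rough illustration of the score, the Python sketch below mimics INSTR(string, substring) for non-NULL, non-empty arguments. It returns the 1-based position of the first occurrence, or 0 if there is none. The `instr`, `nvl`, and `scores` helpers are hypothetical analogues written for this example. The `' '` default mirrors the NVL() calls in the index.

```python
from typing import Optional


def instr(string: str, substring: str) -> int:
    """Rough analogue of Oracle INSTR(string, substring) for non-NULL,
    non-empty arguments: 1-based position of the first match, 0 if absent."""
    return string.find(substring) + 1


def nvl(value: Optional[str], default: str) -> str:
    """Analogue of Oracle NVL: substitute a default for NULL (None)."""
    return default if value is None else value


def scores(c1: Optional[str], c2: Optional[str]) -> tuple:
    # The two expressions from the index, in the same order.
    return (instr(nvl(c1, " "), nvl(c2, " ")),
            instr(nvl(c2, " "), nvl(c1, " ")))


print(scores("Low Static", "Static"))  # (5, 0)
print(scores("Static", "Clear"))       # (0, 0)
print(scores(None, "Static"))          # (0, 0)
print(scores(None, "Low Static"))      # (0, 4)
```

The last call shows a side effect of the `' '` placeholder. When one column is NULL and the other contains a space, the score is non-zero, so such a row counts as a match.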

create index ix_t_score on t (instr(nvl(column_1,' '), nvl(column_2, ' ')),
                              instr(nvl(column_2,' '), nvl(column_1, ' ')));

Now write the queries so that their predicates use exactly the indexed expressions, which allows Oracle to use the index.

-- Find rows that don't have common strings
select * from t
  where instr(nvl(column_1, ' '), nvl(column_2, ' ')) = 0 and
        instr(nvl(column_2, ' '), nvl(column_1, ' ')) = 0;

-- Find rows that do
select * from t
  where instr(nvl(column_1, ' '), nvl(column_2, ' ')) > 0 or
        instr(nvl(column_2, ' '), nvl(column_1, ' ')) > 0;


set autotrace on


Execution Plan
----------------------------------------------------------
Plan hash value: 4100696360

---------------------------------------------------------------------------------
| Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |             |     1 |    22 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE   |             |     1 |    22 |            |          |
|*  2 |   INDEX RANGE SCAN| IX_T_SCORE  |     1 |    22 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(INSTR(NVL("COLUMN_1",' '),NVL("COLUMN_2",' '))=0 AND
              INSTR(NVL("COLUMN_2",' '),NVL("COLUMN_1",' '))=0)
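
To experiment without an Oracle instance, you can try the same idea with SQLite from Python. This swaps SQLite in for Oracle: SQLite supports indexes on expressions (version 3.9.0 and later) and has instr(), while coalesce() stands in for NVL(). The table name `t` comes from the answer. The `id` column and the rows are assumptions based on the question's sample data. SQLite's EXPLAIN QUERY PLAN output also looks quite different from Oracle's autotrace.

```python
import sqlite3

ROWS = [
    (1, "Low Static", "Static"),
    (2, "Static", "Static"),
    (3, "Static", "Clear"),
    (4, None, "Static"),  # NULL Column_1
    (5, "Very Low Freq", "Freq"),
]

# The two score expressions; coalesce() plays the role of Oracle's NVL().
S1 = "instr(coalesce(column_1, ' '), coalesce(column_2, ' '))"
S2 = "instr(coalesce(column_2, ' '), coalesce(column_1, ' '))"


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, column_1 TEXT, column_2 TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
    # Expression index, analogous to the Oracle function-based index ix_t_score.
    conn.execute(f"CREATE INDEX ix_t_score ON t ({S1}, {S2})")
    return conn


def no_common(conn: sqlite3.Connection) -> list:
    # Predicates repeat the indexed expressions verbatim.
    sql = f"SELECT id FROM t WHERE {S1} = 0 AND {S2} = 0 ORDER BY id"
    return [r[0] for r in conn.execute(sql)]


def common(conn: sqlite3.Connection) -> list:
    sql = f"SELECT id FROM t WHERE {S1} > 0 OR {S2} > 0 ORDER BY id"
    return [r[0] for r in conn.execute(sql)]


if __name__ == "__main__":
    conn = build()
    print(no_common(conn))  # [3, 4]
    print(common(conn))     # [1, 2, 5]
    # Show SQLite's plan for a count(*) variant; the format differs from Oracle's.
    plan_sql = f"EXPLAIN QUERY PLAN SELECT count(*) FROM t WHERE {S1} = 0 AND {S2} = 0"
    for row in conn.execute(plan_sql):
        print(row)
```

As in Oracle, the WHERE expressions have to match the indexed expressions before the planner will consider the index. Printing the plan is how you check whether it actually uses it.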

You can simplify this by creating a deterministic stored function that returns a score; the SQL then becomes much simpler than the above. NVL() is used to handle columns that contain NULLs.
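
You can sketch the deterministic-function variant the same way, again with SQLite standing in for Oracle. In Oracle it would be a PL/SQL function declared DETERMINISTIC. Python's `create_function(..., deterministic=True)` requires Python 3.8+ and SQLite 3.8.3+. SQLite only accepts an application-defined function in an index expression when it is registered as deterministic. The name `common_score`, the index name `ix_t_common`, and the 0/1 return convention are hypothetical.

```python
import sqlite3
from typing import Optional

ROWS = [
    (1, "Low Static", "Static"),
    (2, "Static", "Static"),
    (3, "Static", "Clear"),
    (4, None, "Static"),
    (5, "Very Low Freq", "Freq"),
]


def common_score(c1: Optional[str], c2: Optional[str]) -> int:
    """Hypothetical score: 1 if either value contains the other, else 0.
    NULLs become ' ', mirroring the NVL(..., ' ') calls in the answer."""
    a = " " if c1 is None else c1
    b = " " if c2 is None else c2
    return 1 if (b in a or a in b) else 0


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # deterministic=True is what lets SQLite accept the function in an index.
    conn.create_function("common_score", 2, common_score, deterministic=True)
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, column_1 TEXT, column_2 TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
    conn.execute("CREATE INDEX ix_t_common ON t (common_score(column_1, column_2))")
    return conn


if __name__ == "__main__":
    conn = build()
    rows = conn.execute(
        "SELECT id FROM t WHERE common_score(column_1, column_2) = 0 ORDER BY id"
    ).fetchall()
    print([r[0] for r in rows])  # [3, 4]
```

The index depends on the Python function. Any other connection that modifies `t` would therefore need to register `common_score` as well.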
