SQL: Find rows where a column contains all of the given words

Question

Asked by veljkoz

I have a column EntityName, and I want users to be able to search names by entering words separated by spaces. The space is implicitly treated as an 'AND' operator, meaning that the returned rows must contain all of the specified words, though not necessarily in the given order.

For example, if we have rows like these:

abba nina pretty balerina
acdc you shook me all night long
sth you are me
dream theater it's all about you

When the user enters "me you" or "you me" (the results must be equivalent), the result should be rows 2 and 3.

I know I can do something like:

WHERE Col1 LIKE '%' + word1 + '%'
  AND Col1 LIKE '%' + word2 + '%'

but I wanted to know whether there is a more efficient solution.

CONTAINS would require a full-text index, which (for various reasons) is not an option.

Maybe SQL Server 2008 has some built-in, semi-hidden solution for these cases?
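The baseline from the question works for any number of words if you generate one LIKE predicate per word and combine them with AND. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server. That substitution is an assumption made only so the example runs anywhere. SQLite concatenates with || rather than +, and its LIKE is case-insensitive for ASCII by default, while in SQL Server case sensitivity depends on the collation. The table name Entity, the Id column and the helper names are hypothetical. Each word is passed as a bound parameter, and % and _ are escaped so user input is matched literally.

```python
import sqlite3

# Sample rows from the question (Entity/Id are hypothetical names).
ROWS = [
    (1, "abba nina pretty balerina"),
    (2, "acdc you shook me all night long"),
    (3, "sth you are me"),
    (4, "dream theater it's all about you"),
]


def escape_like(word: str) -> str:
    """Escape LIKE wildcards so the word is matched literally."""
    return word.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def search_all_words(conn: sqlite3.Connection, query: str) -> list[int]:
    """Return ids of rows whose EntityName contains every space-separated word."""
    words = query.split()
    if not words:
        return []  # design choice of this sketch: empty search returns nothing
    # One "LIKE '%word%'" predicate per word, ANDed together, so order is irrelevant.
    where = " AND ".join(["EntityName LIKE ? ESCAPE '\\'"] * len(words))
    params = ["%" + escape_like(w) + "%" for w in words]
    sql = f"SELECT Id FROM Entity WHERE {where} ORDER BY Id"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Entity (Id INTEGER PRIMARY KEY, EntityName TEXT NOT NULL)")
conn.executemany("INSERT INTO Entity VALUES (?, ?)", ROWS)

print(search_all_words(conn, "me you"))  # [2, 3]
print(search_all_words(conn, "you me"))  # [2, 3]
```

Because the predicates are ANDed, "me you" and "you me" return the same rows. Every predicate still needs a leading wildcard, which is why this approach scans the whole table.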

Answer 1

Accepted answer by RedFilter

The only thing I can think of is to write a CLR function that does the LIKE comparisons. This should be many times faster.

Update: Now that I think about it, it makes sense that CLR would not help. Two other ideas:

1 - Try indexing Col1 and do this:

WHERE (Col1 LIKE word1 + '%' or Col1 LIKE '%' + word1 + '%')
  AND (Col1 LIKE word2 + '%' or Col1 LIKE '%' + word2 + '%')

Depending on the most common searches (starts with vs. substring), this may offer an improvement.

2 - Add your own full-text indexing table in which each word is a row. Then you can index it properly.
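Here is a minimal sketch of idea 2, again using sqlite3 as a stand-in for SQL Server. The EntityWord table and its columns are hypothetical. Each distinct (word, entity) pair is one row, and the primary key on (Word, EntityId) doubles as an index that lets the lookup seek on Word instead of scanning every name. An entity qualifies when it has as many matching words as the search has distinct words.

```python
import sqlite3

# Sample rows from the question.
ROWS = [
    (1, "abba nina pretty balerina"),
    (2, "acdc you shook me all night long"),
    (3, "sth you are me"),
    (4, "dream theater it's all about you"),
]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Entity (Id INTEGER PRIMARY KEY, EntityName TEXT NOT NULL);
    -- Hypothetical word table: one row per distinct (word, entity) pair.
    -- The primary key doubles as an index whose leading column is Word.
    CREATE TABLE EntityWord (
        Word     TEXT    NOT NULL,
        EntityId INTEGER NOT NULL REFERENCES Entity (Id),
        PRIMARY KEY (Word, EntityId)
    );
    """
)
conn.executemany("INSERT INTO Entity VALUES (?, ?)", ROWS)
for entity_id, name in ROWS:
    # set() removes repeated words so the primary key is never violated
    conn.executemany(
        "INSERT INTO EntityWord (Word, EntityId) VALUES (?, ?)",
        [(word, entity_id) for word in set(name.lower().split())],
    )


def search_word_index(conn: sqlite3.Connection, query: str) -> list[int]:
    """Return ids of entities that contain every distinct word of the query."""
    words = sorted(set(query.lower().split()))
    if not words:
        return []
    placeholders = ", ".join(["?"] * len(words))
    sql = f"""
        SELECT EntityId
        FROM EntityWord
        WHERE Word IN ({placeholders})
        GROUP BY EntityId
        HAVING COUNT(*) = ?  -- one row per matched word, so all must be present
        ORDER BY EntityId
    """
    return [row[0] for row in conn.execute(sql, [*words, len(words)])]


print(search_word_index(conn, "me you"))  # [2, 3]
print(search_word_index(conn, "the"))     # []
```

This changes the semantics: a word table matches whole words, so "the" no longer matches "theater" as LIKE '%the%' would. Prefix search would mean querying the Word column with LIKE 'word%', which SQL Server can generally satisfy with an index seek.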

Answer 2

Answered by Eduardo Cuomo

Function

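 CREATE FUNCTION [dbo].[fnSplit] ( @sep CHAR(1), @str VARCHAR(512) )
 RETURNS TABLE AS
 RETURN (
     -- Recursive CTE: each step finds the next separator after the previous one
     -- (default recursion limit is 100 levels, see MAXRECURSION)
     WITH Pieces(pn, start, stop) AS (
         SELECT 1, 1, CHARINDEX(@sep, @str)
         UNION ALL
         SELECT pn + 1, stop + 1, CHARINDEX(@sep, @str, stop + 1)
         FROM Pieces
         WHERE stop > 0
     )

     -- Cut out the text between consecutive separators; the last piece runs to the end
     SELECT
         pn AS Id,
         SUBSTRING(@str, start, CASE WHEN stop > 0 THEN stop - start ELSE 512 END) AS Data
     FROM
         Pieces
 )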
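The following rough SQLite port shows what the recursive CTE produces, run from Python's sqlite3 module. SQLite is a stand-in chosen so the example runs; the SQL Server function itself is not exercised. SQLite has no three-argument CHARINDEX, so this version does not track start/stop positions. It repeatedly peels the first piece off the remaining string instead. Like the original, it assumes a single-character separator and yields an empty piece for consecutive separators. The helper name fn_split is hypothetical.

```python
import sqlite3

# Rough SQLite port of the recursive splitting CTE.
SPLIT_SQL = """
WITH RECURSIVE Pieces(pn, piece, rest) AS (
    SELECT 0, NULL, :s || :sep          -- seed row; trailing separator ends the last piece
    UNION ALL
    SELECT pn + 1,
           substr(rest, 1, instr(rest, :sep) - 1),  -- text before the next separator
           substr(rest, instr(rest, :sep) + 1)      -- everything after it
    FROM Pieces
    WHERE rest <> ''
)
SELECT pn AS Id, piece AS Data
FROM Pieces
WHERE pn > 0
ORDER BY pn
"""


def fn_split(conn: sqlite3.Connection, sep: str, s: str) -> list[tuple[int, str]]:
    """Split s on a single-character separator, numbering the pieces from 1."""
    return conn.execute(SPLIT_SQL, {"sep": sep, "s": s}).fetchall()


conn = sqlite3.connect(":memory:")
print(fn_split(conn, " ", "word1 word2 word3"))
# [(1, 'word1'), (2, 'word2'), (3, 'word3')]
```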

Query

 DECLARE @FilterTable TABLE (Data VARCHAR(512))

 INSERT INTO @FilterTable (Data)
 SELECT DISTINCT S.Data
 FROM fnSplit(' ', 'word1 word2 word3') S -- Contains words

 SELECT DISTINCT
      T.*
 FROM
      MyTable T
      INNER JOIN @FilterTable F1 ON T.Col1 LIKE '%' + F1.Data + '%'
      LEFT JOIN @FilterTable F2 ON T.Col1 NOT LIKE '%' + F2.Data + '%'
 WHERE
      F2.Data IS NULL

Source: SQL SELECT WHERE field contains words
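You can try the anti-join above with a small sqlite3 sketch. SQLite stands in for SQL Server here, so + becomes || and the table variable becomes a temporary table. A row survives the LEFT JOIN ... IS NULL filter only when no filter word fails to match it, which means the row contains every word. The sample data and the helper name are hypothetical.

```python
import sqlite3

# Sample rows from the question.
ROWS = [
    (1, "abba nina pretty balerina"),
    (2, "acdc you shook me all night long"),
    (3, "sth you are me"),
    (4, "dream theater it's all about you"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Id INTEGER PRIMARY KEY, Col1 TEXT NOT NULL)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?)", ROWS)
# Temporary table standing in for the @FilterTable table variable.
conn.execute("CREATE TEMP TABLE FilterTable (Data TEXT NOT NULL)")

ANTI_JOIN_SQL = """
SELECT DISTINCT T.Id
FROM MyTable AS T
     -- keep rows that match at least one word ...
     INNER JOIN FilterTable AS F1 ON T.Col1 LIKE '%' || F1.Data || '%'
     -- ... and look for a word that the row does NOT contain
     LEFT JOIN FilterTable AS F2 ON T.Col1 NOT LIKE '%' || F2.Data || '%'
WHERE F2.Data IS NULL  -- no missing word found, so the row contains all words
ORDER BY T.Id
"""


def search_anti_join(conn: sqlite3.Connection, query: str) -> list[int]:
    conn.execute("DELETE FROM FilterTable")
    conn.executemany(
        "INSERT INTO FilterTable (Data) VALUES (?)",
        [(word,) for word in set(query.split())],  # like SELECT DISTINCT S.Data
    )
    return [row[0] for row in conn.execute(ANTI_JOIN_SQL)]


print(search_anti_join(conn, "me you"))  # [2, 3]
```

An empty filter table returns no rows, because the INNER JOIN has nothing to join to. As in the original, % or _ inside a search word act as wildcards.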

Answer 3

Answered by MK.

http://msdn.microsoft.com/en-us/magazine/cc163473.aspx

Answer 4

Answered by Martin Smith

You're going to end up with a full table scan anyway.

Apparently, the collation can make a big difference. Kalen Delaney says in the book "Microsoft SQL Server 2008 Internals":

Collation can make a huge difference when SQL Server has to look at almost all characters in the strings. For instance, look at the following:
SELECT COUNT(*) FROM tbl WHERE longcol LIKE '%abc%'
This may execute 10 times faster or more with a binary collation than a nonbinary Windows collation. And with varchar data, this executes up to seven or eight times faster with a SQL collation than with a Windows collation.

Answer 5

Answered by A-K

WITH Tokens AS (SELECT 'you' AS Token UNION ALL SELECT 'me')
SELECT ...
FROM YourTable AS t
-- keep a row only if the number of tokens it contains equals the number of tokens
WHERE (SELECT COUNT(*) FROM Tokens WHERE t.Col1 LIKE '%' + Tokens.Token + '%')
    = (SELECT COUNT(*) FROM Tokens);
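Here is the same counting idea as a runnable sqlite3 sketch. SQLite stands in for SQL Server, and the table and helper names are hypothetical. Tokens is built from a VALUES list of bound parameters instead of hard-coded literals. A row is kept when the number of tokens it contains equals the total number of tokens.

```python
import sqlite3

# Sample rows from the question.
ROWS = [
    (1, "abba nina pretty balerina"),
    (2, "acdc you shook me all night long"),
    (3, "sth you are me"),
    (4, "dream theater it's all about you"),
]

COUNT_SQL_TEMPLATE = """
WITH Tokens(Token) AS (VALUES {values})
SELECT t.Id
FROM MyTable AS t
WHERE (SELECT COUNT(*) FROM Tokens WHERE t.Col1 LIKE '%' || Tokens.Token || '%')
    = (SELECT COUNT(*) FROM Tokens)
ORDER BY t.Id
"""


def search_by_count(conn: sqlite3.Connection, query: str) -> list[int]:
    tokens = query.split()
    if not tokens:
        # With no tokens both counts would be 0 and every row would match;
        # VALUES also needs at least one row.
        return []
    values = ", ".join(["(?)"] * len(tokens))  # one placeholder row per token
    sql = COUNT_SQL_TEMPLATE.format(values=values)
    return [row[0] for row in conn.execute(sql, tokens)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Id INTEGER PRIMARY KEY, Col1 TEXT NOT NULL)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?)", ROWS)

print(search_by_count(conn, "you me"))  # [2, 3]
```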

Answer 6

Answered by JBelfort

Ideally, this should be done with full-text search, as mentioned above. But if you don't have full-text search configured for your database, here is a performance-intensive solution for a prioritized string search.

-- table to search in
drop table if exists dbo.myTable;
go
CREATE TABLE dbo.myTable
    (
    myTableId int NOT NULL IDENTITY (1, 1),
    code varchar(200) NOT NULL, 
    description varchar(200) NOT NULL -- this column contains the values we are going to search in 
    )  ON [PRIMARY]
GO

-- function to split space separated search string into individual words
drop function if exists [dbo].[fnSplit];
go
CREATE FUNCTION [dbo].[fnSplit] (@StringInput nvarchar(max),
@Delimiter nvarchar(1))
RETURNS @OutputTable TABLE (
  id nvarchar(1000)
)
AS
BEGIN
  -- sized to match the output column so long words are not truncated
  DECLARE @String nvarchar(1000);

  WHILE LEN(@StringInput) > 0
  BEGIN
    -- take everything up to the first delimiter (or the whole remaining string)
    SET @String = LEFT(@StringInput,
        ISNULL(NULLIF(CHARINDEX(@Delimiter, @StringInput) - 1, -1), LEN(@StringInput)));
    -- drop that word and its delimiter from the remaining input
    SET @StringInput = SUBSTRING(@StringInput,
        ISNULL(NULLIF(CHARINDEX(@Delimiter, @StringInput), 0), LEN(@StringInput)) + 1,
        LEN(@StringInput));

    INSERT INTO @OutputTable (id)
      VALUES (@String);
  END;

  RETURN;
END;
GO

-- this is the search script, which can optionally be converted to a stored procedure/function


declare @search varchar(max) = 'infection upper acute genito'; -- enter your search string here
-- the searched string above should give rows containing the following
-- infection in upper side with acute genitointestinal tract
-- acute infection in upper teeth
-- acute genitointestinal pain

if (len(trim(@search)) = 0) -- if search string is empty, just return records ordered alphabetically
begin
 select 1 as Priority ,myTableid, code, Description from myTable order by Description 
 return;
end

declare @splitTable Table(
wordRank int Identity(1,1), -- individual words are assigned a priority order (in order of occurrence/position)
word varchar(200)
)
declare @nonWordTable Table( -- table to trim out auxiliary verbs, prepositions etc. from the search
id varchar(200)
)

insert into @nonWordTable values
('of'),
('with'),
('at'),
('in'),
('for'),
('on'),
('by'),
('like'),
('up'),
('off'),
('near'),
('is'),
('are'),
(','),
(':'),
(';')

insert into @splitTable
select id from dbo.fnSplit(@search,' '); -- returns one row per space-separated word in the search string; for this example the output is:
--  id
-------------
-- infection
-- upper
-- acute
-- genito

delete s from @splitTable s join @nonWordTable n  on s.word = n.id; -- trimming out non-words here
declare @countOfSearchStrings int = (select count(word) from @splitTable);  -- count of space separated words for search
declare @highestPriority int = POWER(@countOfSearchStrings,3);

with plainMatches as
(
select myTableid, @highestPriority as Priority from myTable where Description like @search  -- exact matches have highest priority
union                                      
select myTableid, @highestPriority-1 as Priority from myTable where Description like  @search + '%'  -- then with something at the end
union                                      
select myTableid, @highestPriority-2 as Priority from myTable where Description like '%' + @search -- then with something at the beginning
union                                      
select myTableid, @highestPriority-3 as Priority from myTable where Description like '%' + @search + '%' -- then if the word falls somewhere in between
),
splitWordMatches as( -- give each searched word a rank based on its position in the searched string
                     -- and calculate its char index in the field to search
select myTable.myTableid, (@countOfSearchStrings - s.wordRank) as Priority, s.word,
wordIndex = CHARINDEX(s.word, myTable.Description)  from myTable join @splitTable s on myTable.Description like '%'+ s.word + '%'
-- and not exists(select myTableid from plainMatches p where p.myTableId = myTable.myTableId) -- no need to look at rows already found in plainMatches, as they are ranked highest
                                                                              -- this check takes a long time, though, so it is commented out; leaving it out has no impact on the result
),
matchingRowsWithAllWords as (
 select myTableid, count(myTableid) as myTableCount from splitWordMatches group by(myTableid) having count(myTableid) = @countOfSearchStrings
)
, -- trim off the CTE here if you don't care about the ordering of words to be considered for priority
wordIndexRatings as( -- reverse the char indexes retrieved above so that words occurring earlier get a higher weight
                     -- and then normalize them to sequential values
select s.myTableid, Priority, word, ROW_NUMBER() over (partition by s.myTableid order by wordindex desc) as comparativeWordIndex 
from splitWordMatches s join matchingRowsWithAllWords m on s.myTableId = m.myTableId
)
,
wordIndexSequenceRatings as ( -- need to do this to ensure that if the same set of words from search string is found in two rows,
                              -- their sequence in the field value is taken into account for higher priority
    select w.myTableid, w.word, (w.Priority + w.comparativeWordIndex + coalesce(sequncedPriority ,0)) as Priority
    from wordIndexRatings w left join 
    (
     select w1.myTableid, w1.priority, w1.word, w1.comparativeWordIndex, count(w1.myTableid) as sequncedPriority
     from wordIndexRatings w1 join wordIndexRatings w2 on w1.myTableId = w2.myTableId and w1.Priority > w2.Priority and w1.comparativeWordIndex>w2.comparativeWordIndex
     group by w1.myTableid, w1.priority,w1.word, w1.comparativeWordIndex
    ) 
    sequencedPriority on w.myTableId = sequencedPriority.myTableId and w.Priority = sequencedPriority.Priority
),
prioritizedSplitWordMatches as ( -- this calculates the cumulative priority for a field value
select  w1.myTableId, sum(w1.Priority) as OverallPriority from wordIndexSequenceRatings w1 join wordIndexSequenceRatings w2 on w1.myTableId =  w2.myTableId 
where w1.word <> w2.word group by w1.myTableid 
),
completeSet as (
select myTableid, priority from plainMatches -- get plain matches which should be highest ranked
union
select myTableid, OverallPriority as priority from prioritizedSplitWordMatches -- get ranked split word matches (which are ordered based on word rank in search string and sequence)
),
maximizedCompleteSet as( -- set the priority of a field value = maximum priority for that field value
select myTableid, max(priority) as Priority  from completeSet group by myTableId
)
select priority, myTable.myTableid , code, Description from maximizedCompleteSet m join myTable  on m.myTableId = myTable.myTableId 
order by Priority desc, Description -- order by priority desc to get highest rated items on top
--offset 0 rows fetch next 50 rows only -- optional paging
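The first stage of the script is the plainMatches CTE, combined with taking the maximum priority per row. Together they amount to a four-tier ranking of the whole search string. Below is a simplified sqlite3 sketch of just that stage. The Items table, the sample descriptions and the fixed priorities 4 down to 1 are assumptions; the script itself derives its values from POWER(@countOfSearchStrings, 3). Testing the tiers in a CASE expression from strongest to weakest gives the same result as the UNION followed by MAX, because the first tier that matches is also the highest one.

```python
import sqlite3

# Hypothetical sample descriptions; only rows containing the phrase are ranked.
ROWS = [
    (1, "acute infection"),             # exact match
    (2, "acute infection of the ear"),  # search string at the start
    (3, "chronic acute infection"),     # search string at the end
    (4, "mild acute infection today"),  # search string in the middle
    (5, "infection, acute"),            # words present, phrase absent
]

TIER_SQL = """
SELECT Id,
       CASE
           WHEN Description LIKE :s THEN 4          -- exact match
           WHEN Description LIKE :s || '%' THEN 3   -- something at the end
           WHEN Description LIKE '%' || :s THEN 2   -- something at the beginning
           ELSE 1                                   -- somewhere in between
       END AS Priority
FROM Items
WHERE Description LIKE '%' || :s || '%'
ORDER BY Priority DESC, Description
"""


def rank_plain_matches(conn: sqlite3.Connection, search: str) -> list[tuple[int, int]]:
    """Return (Id, Priority) pairs for rows containing the whole search string."""
    return conn.execute(TIER_SQL, {"s": search}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (Id INTEGER PRIMARY KEY, Description TEXT NOT NULL)")
conn.executemany("INSERT INTO Items VALUES (?, ?)", ROWS)

print(rank_plain_matches(conn, "acute infection"))
# [(1, 4), (2, 3), (3, 2), (4, 1)]
```

Row 5 contains both words but not the phrase, so this stage does not rank it. Rows like that are handled by the script's later split-word stages.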
