SQL:查找列包含所有给定单词的行
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/3804887/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
SQL: Find rows where Column contains all of the given words
提问by veljkoz
I have some column EntityName, and I want to have users to be able to search names by entering words separated by space. The space is implicitly considered as an 'AND' operator, meaning that the returned rows must have all of the words specified, and not necessarily in the given order.
我有一些列 EntityName,我希望用户能够通过输入由空格分隔的单词来搜索名称。空格被隐式视为“AND”运算符,这意味着返回的行必须具有指定的所有单词,而不必按给定的顺序排列。
For example, if we have rows like these:
例如,如果我们有这样的行:
- abba nina pretty balerina
- acdc you shook me all night long
- sth you are me
- dream theater it's all about you
- 阿巴尼娜漂亮的芭蕾舞女演员
- acdc 你整夜都在摇晃我
- ……你是我
- 梦想剧院,一切都与你有关
when the user enters: me you
, or you me
(the results must be equivalent), the result has rows 2 and 3.
当用户输入:me you
,或you me
(结果必须相等)时,结果有第 2 行和第 3 行。
I know I can go like:
我知道我可以这样:
WHERE Col1 LIKE '%' + word1 + '%'
AND Col1 LIKE '%' + word2 + '%'
but I wanted to know if there's some more optimal solution.
但我想知道是否有一些更优化的解决方案。
The CONTAINS
would require a full text index, which (for various reasons) is not an option.
这CONTAINS
将需要全文索引,这(出于各种原因)不是一种选择。
Maybe Sql2008 has some built-in, semi-hidden solution for these cases?
也许 Sql2008 对这些情况有一些内置的、半隐藏的解决方案?
采纳答案by RedFilter
The only thing I can think of is to write a CLR
function that does the LIKE
comparisons. This should be many times faster.
我唯一能想到的就是编写一个进行比较的CLR
函数LIKE
。这应该快很多倍。
Update:Now that I think about it, it makes sense CLR would not help. Two other ideas:
更新:现在我想想,CLR 无济于事是有道理的。另外两个想法:
1 - Try indexing Col1 and do this:
1 - 尝试索引 Col1 并执行以下操作:
WHERE (Col1 LIKE word1 + '%' or Col1 LIKE '%' + word1 + '%')
AND (Col1 LIKE word2 + '%' or Col1 LIKE '%' + word2 + '%')
Depending on the most common searches (starts with vs. substring), this may offer an improvement.
根据最常见的搜索(以与子字符串开头),这可能会提供改进。
2 - Add your own full text indexing table where each word is a row in the table. Then you can index properly.
2 - 添加您自己的全文索引表,其中每个单词都是表中的一行。然后你就可以正确索引了。
回答by Eduardo Cuomo
Function
功能
CREATE FUNCTION [dbo].[fnSplit] ( @sep CHAR(1), @str VARCHAR(512) )
RETURNS TABLE AS
RETURN (
WITH Pieces(pn, start, stop) AS (
SELECT 1, 1, CHARINDEX(@sep, @str)
UNION ALL
SELECT pn + 1, stop + 1, CHARINDEX(@sep, @str, stop + 1)
FROM Pieces
WHERE stop > 0
)
SELECT
pn AS Id,
SUBSTRING(@str, start, CASE WHEN stop > 0 THEN stop - start ELSE 512 END) AS Data
FROM
Pieces
)
Query
询问
DECLARE @FilterTable TABLE (Data VARCHAR(512))
INSERT INTO @FilterTable (Data)
SELECT DISTINCT S.Data
FROM fnSplit(' ', 'word1 word2 word3') S -- Contains words
SELECT DISTINCT
T.*
FROM
MyTable T
INNER JOIN @FilterTable F1 ON T.Col1 LIKE '%' + F1.Data + '%'
LEFT JOIN @FilterTable F2 ON T.Col1 NOT LIKE '%' + F2.Data + '%'
WHERE
F2.Data IS NULL
回答by MK.
回答by Martin Smith
You're going to end up with a full table scan anyway.
无论如何,您最终都会进行全表扫描。
The collation can make a big difference apparently. Kalen Delaney in the book "Microsoft SQL Server 2008 Internals" says:
整理显然可以产生很大的不同。Kalen Delaney 在“ Microsoft SQL Server 2008 Internals”一书中说:
Collation can make a huge difference when SQL Server has to look at almost all characters in the strings. For instance, look at the following:
SELECT COUNT(*) FROM tbl WHERE longcol LIKE '%abc%'
This may execute 10 times faster or more with a binary collation than a nonbinary Windows collation. And with
varchar
data, this executes up to seven or eight times faster with a SQL collation than with a Windows collation.
当 SQL Server 必须查看字符串中的几乎所有字符时,排序规则会产生巨大的差异。例如,请看以下内容:
SELECT COUNT(*) FROM tbl WHERE longcol LIKE '%abc%'
与非二进制 Windows 排序规则相比,二进制排序规则的执行速度可能快 10 倍或更多。对于
varchar
数据,使用 SQL 归类的执行速度比使用 Windows 归类快 7 或 8 倍。
回答by A-K
WITH Tokens AS(SELECT 'you' AS Token UNION ALL SELECT 'me')
SELECT ...
FROM YourTable AS t
WHERE (SELECT COUNT(*) FROM Tokens WHERE y.Col1 LIKE '%'+Tokens.Token+'%')
=
(SELECT COUNT(*) FROM Tokens) ;
回答by JBelfort
This should ideally be done with the help of Full text search as mentioned above. BUT, If you don't have full text configured for your DB, here is a performance intensive solution for doing a prioritized string search.
理想情况下,这应该在如上所述的全文搜索的帮助下完成。但是,如果你没有为你的数据库配置全文,这里是一个执行优先字符串搜索的性能密集型解决方案。
-- table to search in
drop table if exists dbo.myTable;
go
CREATE TABLE dbo.myTable
(
myTableId int NOT NULL IDENTITY (1, 1),
code varchar(200) NOT NULL,
description varchar(200) NOT NULL -- this column contains the values we are going to search in
) ON [PRIMARY]
GO
-- function to split space separated search string into individual words
drop function if exists [dbo].[fnSplit];
go
CREATE FUNCTION [dbo].[fnSplit] (@StringInput nvarchar(max),
@Delimiter nvarchar(1))
RETURNS @OutputTable TABLE (
id nvarchar(1000)
)
AS
BEGIN
DECLARE @String nvarchar(100);
WHILE LEN(@StringInput) > 0
BEGIN
SET @String = LEFT(@StringInput, ISNULL(NULLIF(CHARINDEX(@Delimiter, @StringInput) - 1, -1),
LEN(@StringInput)));
SET @StringInput = SUBSTRING(@StringInput, ISNULL(NULLIF(CHARINDEX
(
@Delimiter, @StringInput
),
0
), LEN
(
@StringInput)
)
+ 1, LEN(@StringInput));
INSERT INTO @OutputTable (id)
VALUES (@String);
END;
RETURN;
END;
GO
-- this is the search script which can be optionally converted to a stored procedure /function
declare @search varchar(max) = 'infection upper acute genito'; -- enter your search string here
-- the searched string above should give rows containing the following
-- infection in upper side with acute genitointestinal tract
-- acute infection in upper teeth
-- acute genitointestinal pain
if (len(trim(@search)) = 0) -- if search string is empty, just return records ordered alphabetically
begin
select 1 as Priority ,myTableid, code, Description from myTable order by Description
return;
end
declare @splitTable Table(
wordRank int Identity(1,1), -- individual words are assinged priority order (in order of occurence/position)
word varchar(200)
)
declare @nonWordTable Table( -- table to trim out auxiliary verbs, prepositions etc. from the search
id varchar(200)
)
insert into @nonWordTable values
('of'),
('with'),
('at'),
('in'),
('for'),
('on'),
('by'),
('like'),
('up'),
('off'),
('near'),
('is'),
('are'),
(','),
(':'),
(';')
insert into @splitTable
select id from dbo.fnSplit(@search,' '); -- this function gives you a table with rows containing all the space separated words of the search like in this e.g., the output will be -
-- id
-------------
-- infection
-- upper
-- acute
-- genito
delete s from @splitTable s join @nonWordTable n on s.word = n.id; -- trimming out non-words here
declare @countOfSearchStrings int = (select count(word) from @splitTable); -- count of space separated words for search
declare @highestPriority int = POWER(@countOfSearchStrings,3);
with plainMatches as
(
select myTableid, @highestPriority as Priority from myTable where Description like @search -- exact matches have highest priority
union
select myTableid, @highestPriority-1 as Priority from myTable where Description like @search + '%' -- then with something at the end
union
select myTableid, @highestPriority-2 as Priority from myTable where Description like '%' + @search -- then with something at the beginning
union
select myTableid, @highestPriority-3 as Priority from myTable where Description like '%' + @search + '%' -- then if the word falls somewhere in between
),
splitWordMatches as( -- give each searched word a rank based on its position in the searched string
-- and calculate its char index in the field to search
select myTable.myTableid, (@countOfSearchStrings - s.wordRank) as Priority, s.word,
wordIndex = CHARINDEX(s.word, myTable.Description) from myTable join @splitTable s on myTable.Description like '%'+ s.word + '%'
-- and not exists(select myTableid from plainMatches p where p.myTableId = myTable.myTableId) -- need not look into myTables that have already been found in plainmatches as they are highest ranked
-- this one takes a long time though, so commenting it, will have no impact on the result
),
matchingRowsWithAllWords as (
select myTableid, count(myTableid) as myTableCount from splitWordMatches group by(myTableid) having count(myTableid) = @countOfSearchStrings
)
, -- trim off the CTE here if you don't care about the ordering of words to be considered for priority
wordIndexRatings as( -- reverse the char indexes retrived above so that words occuring earlier have higher weightage
-- and then normalize them to sequential values
select s.myTableid, Priority, word, ROW_NUMBER() over (partition by s.myTableid order by wordindex desc) as comparativeWordIndex
from splitWordMatches s join matchingRowsWithAllWords m on s.myTableId = m.myTableId
)
,
wordIndexSequenceRatings as ( -- need to do this to ensure that if the same set of words from search string is found in two rows,
-- their sequence in the field value is taken into account for higher priority
select w.myTableid, w.word, (w.Priority + w.comparativeWordIndex + coalesce(sequncedPriority ,0)) as Priority
from wordIndexRatings w left join
(
select w1.myTableid, w1.priority, w1.word, w1.comparativeWordIndex, count(w1.myTableid) as sequncedPriority
from wordIndexRatings w1 join wordIndexRatings w2 on w1.myTableId = w2.myTableId and w1.Priority > w2.Priority and w1.comparativeWordIndex>w2.comparativeWordIndex
group by w1.myTableid, w1.priority,w1.word, w1.comparativeWordIndex
)
sequencedPriority on w.myTableId = sequencedPriority.myTableId and w.Priority = sequencedPriority.Priority
),
prioritizedSplitWordMatches as ( -- this calculates the cumulative priority for a field value
select w1.myTableId, sum(w1.Priority) as OverallPriority from wordIndexSequenceRatings w1 join wordIndexSequenceRatings w2 on w1.myTableId = w2.myTableId
where w1.word <> w2.word group by w1.myTableid
),
completeSet as (
select myTableid, priority from plainMatches -- get plain matches which should be highest ranked
union
select myTableid, OverallPriority as priority from prioritizedSplitWordMatches -- get ranked split word matches (which are ordered based on word rank in search string and sequence)
),
maximizedCompleteSet as( -- set the priority of a field value = maximum priority for that field value
select myTableid, max(priority) as Priority from completeSet group by myTableId
)
select priority, myTable.myTableid , code, Description from maximizedCompleteSet m join myTable on m.myTableId = myTable.myTableId
order by Priority desc, Description -- order by priority desc to get highest rated items on top
--offset 0 rows fetch next 50 rows only -- optional paging