Using REPLACE in a WHERE clause to check spelling permutations - MS SQL

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/6252882/

Time: 2020-09-01 10:51:33  Source: igfitidea

sql sql-server replace where-clause

Asked by Steph Rose

I have a table like:

| id | lastname | firstname |
|  1 | doe      | john      |
|  2 | oman     | donald    |
|  3 | o'neill  | james     |
|  4 | onackers | sharon    |

Essentially, users are going to be searching by the first letters of the last name.

I want to be able to return results that contain and don't contain punctuation from the database. For instance, when a user searches for: on

I want to return both: o'neill, onackers

I want someone to be able to search "o, on, oneill, o neill, etc" to get o'neill.

So the best way to do this seems to be to take the lastname column value and search two permutations of it in the WHERE clause, joined with an OR: one where any special character is replaced with the _ wildcard, and one where all non-alpha characters (including spaces) are removed.

I figure I can use the underscore in the SQL replace to keep the one space available.

I'm having a little trouble with the WHERE clause. I'd prefer to do this with a simple REPLACE rather than creating a regex function if possible. If that's a no-go though, I understand:

@last_name (this is the nvarchar input)

SELECT id, lastname, firstname
FROM people
WHERE ((REPLACE(people.lastname, '[^A-Za-z]', '_') like @last_name + '%')
OR (REPLACE(people.lastname,'[^A-Za-z ]', '') like @last_name + '%'))
ORDER BY lastname

I'm pretty sure the replace part has to be on the other side of the LIKE. I'm messing up the structure but need some help.

I am using MSSQL Server 2005.

Thank you so much in advance.

UPDATE

It seems like I have two options:

  1. Create a regular expression function using CLR (excuse me if I'm saying this wrong, I'm new to it)
  2. Create extra columns on the table or create a new "fuzzyTable" with the cleaned up last names.

The database gets updated once a night. I have actually already begun the new table approach, as it was what I was originally going to do. However, I'm beginning to think it's smarter to add the "fuzzy" columns to the main table and then on the nightly update to add the adjusted lastnames to the new / updated rows.

Stack Overflow: Which approach is better? User-defined REGEX function I can use in the SQL, and thus avoid extra columns? Or adding the extra column or two to the table? Or a new table?

Accepted answer by takrl

Depending on how complex your scenario can get, this will be lots of work, and slow too. But there's a more flexible approach. Consider something like this, referred to as initialTable:

| id | lastname | firstname |
|  1 | o'malley | josé      |
|  2 | omállèy  | dònáld    |
|  3 | o'neill  | jámès     |
|  4 | onackers | sharon    |

Maybe a bit much, but it illustrates the general problem. I had to implement a "fuzzy" search for our intranet website over character data that looked very similar; there are many accents in French or Spanish names and street addresses, for example.

What I did was define a function that performed all replacements for a given string, for example (pseudocode):

function string replacestuff(string input)
{
  input = replace(input, "è", "e");
  input = replace(input, "é", "e");
  input = replace(input, "ò", "o");
  input = replace(input, "ó", "o");
  input = replace(input, "'", "");
  ...
  return input;
}
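In T-SQL, the pseudocode above could be sketched as a scalar user-defined function. This is only an illustration: the schema-qualified name dbo.replacestuff and the nvarchar(100) length are assumptions, and the sketch covers just the five replacements listed, so the elided ones still have to be added for real data.

```sql
-- Hypothetical T-SQL version of the replacestuff pseudocode.
-- Covers only the replacements spelled out above; extend the list as needed.
CREATE FUNCTION dbo.replacestuff (@input nvarchar(100))
RETURNS nvarchar(100)
WITH SCHEMABINDING
AS
BEGIN
    SET @input = REPLACE(@input, N'è', N'e');
    SET @input = REPLACE(@input, N'é', N'e');
    SET @input = REPLACE(@input, N'ò', N'o');
    SET @input = REPLACE(@input, N'ó', N'o');
    SET @input = REPLACE(@input, N'''', N'');   -- drop apostrophes
    RETURN @input;
END;
GO

-- Usage: both values come back without accents or apostrophes.
SELECT dbo.replacestuff(N'o''malley') AS lastname_fuzzy,   -- omalley
       dbo.replacestuff(N'josé')      AS firstname_fuzzy;  -- jose
```

A scalar function has to be called with its schema name (dbo. here), which is why the sketch does not use the bare name from the pseudocode.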

Using this conversion function, create a second table fuzzyTable that has the following content:

| id | lastname | firstname |
|  1 | omalley  | jose      |
|  2 | omalley  | donald    |
|  3 | oneill   | james     |
|  4 | onackers | sharon    |
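One way to build and refresh such a copy is sketched below. It assumes the conversion function exists as a scalar function named dbo.replacestuff and that initialTable has the three columns shown; the column types are guesses, not taken from the original schema.

```sql
-- Hypothetical layout for the fuzzy copy; the types are assumptions.
CREATE TABLE fuzzyTable (
    id        int           NOT NULL PRIMARY KEY,
    lastname  nvarchar(100) NULL,
    firstname nvarchar(100) NULL
);
GO

-- Refresh step, e.g. run once after each nightly load:
-- empty the copy and rebuild it from the source table.
TRUNCATE TABLE fuzzyTable;

INSERT INTO fuzzyTable (id, lastname, firstname)
SELECT id,
       dbo.replacestuff(lastname),
       dbo.replacestuff(firstname)
FROM   initialTable;
```

Rebuilding the whole copy keeps the sketch simple; updating only new or changed rows would be an alternative if the table is large.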

Now, assume you'll get an input string for your search of josè. This can't be found in either table. What you'll have to do is this:

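declare @input varchar(50)
declare @input_mod varchar(50)
set @input = 'josè'
-- assumes replacestuff is a scalar function in the dbo schema;
-- scalar functions must be called with their schema name
set @input_mod = dbo.replacestuff(@input)

-- UNION already removes duplicate ids, so no GROUP BY is needed
SELECT id FROM initialTable WHERE firstname like @input OR firstname like @input_mod
UNION
SELECT id FROM fuzzyTable WHERE firstname like @input OR firstname like @input_mod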

(Of course, you'd have to add % to make LIKE work.) The key here is to modify your input search string using the replacement function; this way you'll get a match when searching for sè against a content of sé, because both come down to se when processed by the replacement function.

You could even do a two-level search; first check only the unmodified string against the proper table and then with the statement shown above do a fuzzy search if the user says so.

This is a very flexible approach and can handle all sorts of stuff, like finding German letters ä, ö, ü, ß by using two-letter expressions ae, oe, ue, ss. The disadvantage is that you'll have to keep duplicates of some data, and change those duplicates within fuzzyTable as the initialTable (or the replacement function) changes. In our current use case, the intranet database gets updated once a night, so it's not a problem.

EDIT

You need to be aware that with this approach you'll get false positives in some cases. For example, we're using this for an employee search, and if you've got a Dutch name spelled Hoek, you'd also find this name when searching for Hök, because in German the replacement for ö would be oe. This could be solved using country-aware replacement functions, but we never took the concept that far. Depending on your input data this is more or less academic; for our use case I can't remember anyone complaining.

The main reason why we came up with this approach in the first place was that some of the data we had to work with was riddled with spelling errors, i.e., in French many vowels were accented the wrong way around, but we still needed to deliver a result.

Answer by MatBailie

I believe the trouble you're having is that SQL Server's REPLACE function doesn't accept [^A-Za-z] to mean "non-alpha characters". Instead, it's actually looking for that exact string to replace.

http://msdn.microsoft.com/en-us/library/ms186862%28v=sql.90%29.aspx
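A few standalone statements illustrate the point. The sample values are only for demonstration; the behaviour shown is that REPLACE treats its second argument as literal text, not as a pattern.

```sql
-- REPLACE searches for the literal text '[^A-Za-z]', which does not occur
-- in the input, so the value comes back unchanged.
SELECT REPLACE('o''neill', '[^A-Za-z]', '_') AS pattern_attempt;    -- o'neill

-- Each character to remove has to be named explicitly, one REPLACE per character.
SELECT REPLACE(REPLACE('o''neill', '''', ''), ' ', '') AS stripped; -- oneill

-- The bracket text is only replaced when it literally appears in the input.
SELECT REPLACE('a[^A-Za-z]b', '[^A-Za-z]', '_') AS literal_match;   -- a_b
```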

In terms of using Regular Expressions, I've only done that by using the CLR, which seems to be getting much too involved for this particular problem.

My advice would be to hold the searchable fields, in the two different formats, in the table itself. And then use a simple LIKE search.

WHERE last_name LIKE @last_name OR last_name_stripped LIKE @last_name

last_name_stripped could then be a computed column (maybe using a function to strip all non_alpha characters), or handled by your client at Insert time.
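A rough sketch of the computed-column variant follows. The function name dbo.StripNonAlpha, the index name, the nvarchar(100) length, and the assumption that the table is dbo.people with a last_name column are all illustrative, not part of the original answer.

```sql
-- Hypothetical helper: removes every character that does not match [A-Za-z].
-- Note: depending on the collation, the range A-Z may also match accented
-- letters, in which case those are kept.
CREATE FUNCTION dbo.StripNonAlpha (@s nvarchar(100))
RETURNS nvarchar(100)
WITH SCHEMABINDING   -- needed so the function can back a persisted column
AS
BEGIN
    DECLARE @pos int;
    SET @pos = PATINDEX(N'%[^A-Za-z]%', @s);
    WHILE @pos > 0
    BEGIN
        SET @s   = STUFF(@s, @pos, 1, N'');        -- delete the offending character
        SET @pos = PATINDEX(N'%[^A-Za-z]%', @s);   -- look for the next one
    END;
    RETURN @s;
END;
GO

-- Store the stripped value alongside the original and index it.
ALTER TABLE dbo.people
    ADD last_name_stripped AS dbo.StripNonAlpha(last_name) PERSISTED;
GO

CREATE INDEX IX_people_last_name_stripped
    ON dbo.people (last_name_stripped);
```

If only a handful of characters matter (apostrophes and spaces, say), nested REPLACE calls in the column definition would avoid the function altogether.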

Answer by ypercube

Using:

WHERE ( REPLACE(people.lastname, '[^A-Za-z]', '') LIKE @last_name + '%' )

or

WHERE ( ComplexFunction( field ) LIKE whatever )

will most likely have the result that your query will not use the index (if there is one) of field people.lastname and thus scan the whole table every time you run the query.

I see two ways to avoid this:

One, add another field lastnameStripped to the table, where the result of ComplexFunction(lastname) is stored, and add an index on this field. Then you can search with either:

WHERE ( lastnameStripped LIKE REPLACE(@last_name, '[^A-Za-z]', '') + '%' )

or

WHERE ( lastnameStripped LIKE @last_name + '%' )

and both will use the index of lastnameStripped.
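Since REPLACE does not understand the bracket pattern, the input clean-up in the first variant would in practice have to name each character. A minimal sketch, assuming the stripped column ignores apostrophes and spaces and that the table is people as in the question:

```sql
DECLARE @last_name nvarchar(50);
DECLARE @search    nvarchar(50);

SET @last_name = N'o''n';   -- sample user input

-- Apply to the input the same clean-up that was used to fill lastnameStripped.
SET @search = REPLACE(REPLACE(@last_name, N'''', N''), N' ', N'');

-- The function call is on the variable, not on the column,
-- so the column side of the LIKE stays a plain indexed column.
SELECT id, lastname, firstname
FROM   people
WHERE  lastnameStripped LIKE @search + N'%'
ORDER  BY lastname;
```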

Two, create an indexed view with the ComplexFunction( lastname ) as a field.
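A sketch of what such an indexed view might look like is given below. Nested REPLACE calls stand in for ComplexFunction, and the view name, index names, dbo schema, nvarchar(100) length, and the uniqueness of id are all assumptions.

```sql
-- Hypothetical indexed view; nested REPLACE stands in for ComplexFunction.
CREATE VIEW dbo.peopleStripped
WITH SCHEMABINDING              -- required before the view can be indexed
AS
SELECT id,
       CAST(REPLACE(REPLACE(lastname, '''', ''), ' ', '') AS nvarchar(100))
           AS lastnameStripped
FROM   dbo.people;              -- two-part name is required with SCHEMABINDING
GO

-- The first index on a view must be unique and clustered (assumes id is unique).
CREATE UNIQUE CLUSTERED INDEX IX_peopleStripped_id
    ON dbo.peopleStripped (id);

-- Additional index to support prefix searches on the stripped name.
CREATE NONCLUSTERED INDEX IX_peopleStripped_name
    ON dbo.peopleStripped (lastnameStripped);
GO

-- NOEXPAND asks the optimizer to use the view's own indexes;
-- on some editions they are not considered without it.
SELECT id
FROM   dbo.peopleStripped WITH (NOEXPAND)
WHERE  lastnameStripped LIKE N'on%';
```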

Answer by Philip Kelley

If you need to do relatively complex lookups on a column on a large table, it could be more efficient to create a second column that contains the data formatted for efficient searches (with the immediate caveat that "like" searches are rarely efficient). So where you have column LastName, add a new column like LastNameLookup, and populate that column with the data formatted appropriately for your search criteria. If the formatting rules are relatively simple, you could implement this as a computed column; if performance is important, make it a persisted computed column.

Also to mention, SQL does not support regular expressions (though there is a limited form tied in to the LIKE clause in SQL 2008).
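For illustration, the limited pattern syntax of LIKE includes the _ wildcard and bracketed character sets; to my knowledge these are also available in SQL Server 2005. The table variable below simply mirrors the sample data from the question.

```sql
-- Sample data copied from the question, held in a table variable.
DECLARE @people TABLE (id int, lastname nvarchar(50));
INSERT INTO @people (id, lastname) VALUES (1, N'doe');
INSERT INTO @people (id, lastname) VALUES (2, N'oman');
INSERT INTO @people (id, lastname) VALUES (3, N'o''neill');
INSERT INTO @people (id, lastname) VALUES (4, N'onackers');

-- [^A-Za-z] matches one character outside the range:
-- names containing at least one non-letter.
SELECT lastname FROM @people WHERE lastname LIKE N'%[^A-Za-z]%';   -- o'neill

-- _ matches exactly one arbitrary character: "o", any character, "n", then anything.
SELECT lastname FROM @people WHERE lastname LIKE N'o_n%';          -- o'neill

-- Same idea, but the middle character must be a non-letter.
SELECT lastname FROM @people WHERE lastname LIKE N'o[^A-Za-z]n%';  -- o'neill
```

These patterns can find or test for characters, but LIKE cannot remove them, so they do not replace the stripped-column approach.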

Answer by Patriotec

Hmm... using a classic ASP example. I'm guessing this is from a form. For this example I'm calling your textbox field 'namesearch'. So on the page where you read request.form("namesearch"), just assign strSearch = request.form("namesearch"). Then, before you run it into the SQL query, do something like this:

strSearch = request.form("namesearch") 'to get textbox info from form

strSearch = replace(strSearch," ", "") 'to remove spaces
strSearch = replace(strSearch,"'", "") 'to remove apostrophes

For the SQL

SELECT id, lastname, firstname FROM people WHERE people.lastname like '%"& strSearch &"%' ORDER BY lastname

Tested and works using VBScript and SQL 2005 Server

Answer by M.suleman Khan

In my case, I had a table that stored phone numbers with dashes, and I wanted to look up a record by the phone number a user entered (but users enter phone numbers without dashes).

So I did something like this:

select * from rpcusttest
WHERE ( REPLACE(RPCustTest.CustomerID, '-', '') LIKE '7183877333' + '%' )

Now, even though the user enters a number without dashes, the search will also match records stored with dashes.