How to Replace Multiple Characters in SQL?
Original question: http://stackoverflow.com/questions/1580017/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by kiev
This is based on a similar question, How to Replace Multiple Characters in Access SQL?
I am asking this because SQL Server 2005 seems to limit the REPLACE() function to 19 replacements inside a WHERE clause.
I have the following task: I need to match on a column and, to improve the chances of a match, strip multiple unneeded characters from both sides using the REPLACE() function:
DECLARE @es NVarChar(1) SET @es = ''
DECLARE @p0 NVarChar(1) SET @p0 = '!'
DECLARE @p1 NVarChar(1) SET @p1 = '@'
---etc...
SELECT *
FROM t1,t2
WHERE REPLACE(REPLACE(t1.stringkey,@p0, @es), @p1, @es)
= REPLACE(REPLACE(t2.stringkey,@p0, @es), @p1, @es)
---etc
If there are more than 19 REPLACE() calls in that WHERE clause, it doesn't work. So the solution I came up with is to create a SQL function, called trimChars in this example (excuse the variables starting at @p22):
CREATE FUNCTION [trimChars] (
@string varchar(max)
)
RETURNS varchar(max)
AS
BEGIN
DECLARE @es NVarChar(1) SET @es = ''
DECLARE @p22 NVarChar(1) SET @p22 = '^'
DECLARE @p23 NVarChar(1) SET @p23 = '&'
DECLARE @p24 NVarChar(1) SET @p24 = '*'
DECLARE @p25 NVarChar(1) SET @p25 = '('
DECLARE @p26 NVarChar(1) SET @p26 = '_'
DECLARE @p27 NVarChar(1) SET @p27 = ')'
DECLARE @p28 NVarChar(1) SET @p28 = '`'
DECLARE @p29 NVarChar(1) SET @p29 = '~'
DECLARE @p30 NVarChar(1) SET @p30 = '{'
DECLARE @p31 NVarChar(1) SET @p31 = '}'
DECLARE @p32 NVarChar(1) SET @p32 = ' '
DECLARE @p33 NVarChar(1) SET @p33 = '['
DECLARE @p34 NVarChar(1) SET @p34 = '?'
DECLARE @p35 NVarChar(1) SET @p35 = ']'
DECLARE @p36 NVarChar(1) SET @p36 = '\'
DECLARE @p37 NVarChar(1) SET @p37 = '|'
DECLARE @p38 NVarChar(1) SET @p38 = '<'
DECLARE @p39 NVarChar(1) SET @p39 = '>'
DECLARE @p40 NVarChar(1) SET @p40 = '@'
DECLARE @p41 NVarChar(1) SET @p41 = '-'
return REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
@string, @p22, @es), @p23, @es), @p24, @es), @p25, @es), @p26, @es), @p27, @es), @p28, @es), @p29, @es), @p30, @es), @p31, @es), @p32, @es), @p33, @es), @p34, @es), @p35, @es), @p36, @es), @p37, @es), @p38, @es), @p39, @es), @p40, @es), @p41, @es)
END
This can then be used in addition to the other REPLACE calls:
SELECT *
FROM t1,t2
WHERE dbo.trimChars(REPLACE(REPLACE(t1.stringkey,@p0, @es), @p1, @es))
= dbo.trimChars(REPLACE(REPLACE(t2.stringkey,@p0, @es), @p1, @es))
I created a few more functions to do similar replacing, nested like so: trimChars(trimMoreChars(...))
SELECT *
FROM t1,t2
WHERE dbo.trimChars(dbo.trimMoreChars(REPLACE(REPLACE(t1.stringkey,@p0, @es), @p1, @es)))
= dbo.trimChars(dbo.trimMoreChars(REPLACE(REPLACE(t2.stringkey,@p0, @es), @p1, @es)))
Can someone give me a better solution to this problem in terms of performance and maybe a cleaner implementation?
Accepted answer by Cade Roux
I would seriously consider making a CLR UDF instead and using regular expressions (both the string and the pattern can be passed in as parameters) to do a complete search and replace for a range of characters. It should easily outperform this SQL UDF.
Answer by Juliet
One useful trick in SQL is the ability to use @var = function(...) in a SELECT to assign a value. If your record set has multiple records, the variable is assigned once per record, and each assignment builds on the side effects of the previous one:
declare @badStrings table (item varchar(50))
INSERT INTO @badStrings(item)
SELECT '>' UNION ALL
SELECT '<' UNION ALL
SELECT '(' UNION ALL
SELECT ')' UNION ALL
SELECT '!' UNION ALL
SELECT '?' UNION ALL
SELECT '@'
declare @testString varchar(100), @newString varchar(100)
set @teststring = 'Juliet ro><0zs my s0x()rz!!?!one!@!@!@!'
set @newString = @testString
SELECT @newString = Replace(@newString, item, '') FROM @badStrings
select @newString -- returns 'Juliet ro0zs my s0xrzone'
Answer by Duanne
I really like @Juliet's solution! I would just use a CTE to get all the invalid characters:
DECLARE @badStrings VARCHAR(100)
DECLARE @teststring VARCHAR(100)
SET @badStrings = '><()!?@'
SET @teststring = 'Juliet ro><0zs my s0x()rz!!?!one!@!@!@!'
;WITH CTE AS
(
SELECT SUBSTRING(@badStrings, 1, 1) AS [String], 1 AS [Start], 1 AS [Counter]
UNION ALL
SELECT SUBSTRING(@badStrings, [Start] + 1, 1) AS [String], [Start] + 1, [Counter] + 1
FROM CTE
WHERE [Counter] < LEN(@badStrings)
)
SELECT @teststring = REPLACE(@teststring, CTE.[String], '') FROM CTE
SELECT @teststring
Juliet ro0zs my s0xrzone
Answer by Guillermo Gutiérrez
I suggest creating a scalar user-defined function. Here is an example (sorry in advance for the Spanish variable names):
CREATE FUNCTION [dbo].[Udf_ReplaceChars] (
@cadena VARCHAR(500), -- String to manipulate
@caracteresElim VARCHAR(100), -- String of characters to be replaced
@caracteresReem VARCHAR(100) -- String of characters for replacement
)
RETURNS VARCHAR(500)
AS
BEGIN
DECLARE @cadenaFinal VARCHAR(500), @longCad INT, @pos INT, @caracter CHAR(1), @posCarER INT;
SELECT
@cadenaFinal = '',
@longCad = LEN(@cadena),
@pos = 1;
IF LEN(@caracteresElim)<>LEN(@caracteresReem)
BEGIN
RETURN NULL;
END
WHILE @pos <= @longCad
BEGIN
SELECT
@caracter = SUBSTRING(@cadena,@pos,1),
@pos = @pos + 1,
@posCarER = CHARINDEX(@caracter,@caracteresElim);
IF @posCarER <= 0
BEGIN
SET @cadenaFinal = @cadenaFinal + @caracter;
END
ELSE
BEGIN
SET @cadenaFinal = @cadenaFinal + SUBSTRING(@caracteresReem,@posCarER,1)
END
END
RETURN @cadenaFinal;
END
Here is an example using this function:
SELECT dbo.Udf_ReplaceChars('This is a test.','sat','Z47');
And the result is: 7hiZ iZ 4 7eZ7.
As you can see, each character of the @caracteresElim parameter is replaced by the character in the same position from the @caracteresReem parameter.
Answer by Ethan1701
While this question was asked about SQL Server 2005, it's worth noting that as of SQL Server 2017, this can be done with the new TRANSLATE function.
https://docs.microsoft.com/en-us/sql/t-sql/functions/translate-transact-sql
I hope this information helps people who get to this page in the future.
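As a rough sketch of how TRANSLATE could serve the original "strip unwanted characters" task: TRANSLATE maps characters one-to-one and requires both character lists to have the same length, so it cannot delete anything by itself. One workaround is to map every unwanted character to a single placeholder and remove that placeholder with one REPLACE. The placeholder '~' is an assumption here; it must not occur in the real data. The sample string is borrowed from Juliet's answer.

```sql
-- Requires SQL Server 2017 or later.
DECLARE @bad        VARCHAR(20)  = '><()!?@';
DECLARE @testString VARCHAR(100) = 'Juliet ro><0zs my s0x()rz!!?!one!@!@!@!';

-- Step 1: TRANSLATE turns every character listed in @bad into '~'
--         (REPLICATE builds a placeholder list of matching length).
-- Step 2: a single REPLACE strips the placeholder.
SELECT REPLACE(
           TRANSLATE(@testString, @bad, REPLICATE('~', LEN(@bad))),
           '~',
           ''
       ) AS cleaned;
-- cleaned: Juliet ro0zs my s0xrzone
```

This keeps the expression flat no matter how many characters are listed, which sidesteps the deep REPLACE nesting from the question.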
Answer by Chloe Williams
I had a one-off data migration issue where the source system could not correctly output some unusual/technical characters, plus the ubiquitous extra commas in CSVs.
We decided that the source extract should replace each such character with something recognisable to both the source system and the SQL Server loading the data, but which would not otherwise appear in the data.
It did mean, however, that these replacement characters would appear in various columns across various tables and I would have to replace them. Nesting multiple REPLACE functions made the import code look scary and prone to errors in the placement and number of brackets, so I wrote the following function. I know it can process a column in a table of 3,000 rows in less than a second, though I'm not sure how well it will scale to multi-million-row tables.
create function [dbo].[udf_ReplaceMultipleChars]
(
@OriginalString nvarchar(4000)
, @ReplaceTheseChars nvarchar(100)
, @LengthOfReplacement int = 1
)
returns nvarchar(4000)
begin
declare @RevisedString nvarchar(4000) = N'';
declare @lengthofinput int =
(
select len(@OriginalString)
);
with AllNumbers
as (select 1 as Number
union all
select Number + 1
from AllNumbers
where Number < @lengthofinput)
select @RevisedString += case
when (charindex(substring(@OriginalString, Number, 1), @ReplaceTheseChars, 1) - 1) % 2
= 0 then
substring(
@ReplaceTheseChars
, charindex(
substring(@OriginalString, Number, 1)
, @ReplaceTheseChars
, 1
) + 1
, @LengthOfReplacement
)
else
substring(@OriginalString, Number, 1)
end
from AllNumbers
option (maxrecursion 4000);
return (@RevisedString);
end;
It works by submitting the string to be evaluated (@OriginalString) along with a string of paired characters (@ReplaceTheseChars), where the first character is to be replaced by the second, the third by the fourth, the fifth by the sixth and so on.
Here is the string of chars that I needed to replace and their replacements... [']"~,{?}°$±|?|?a???^?#?
i.e. an opening square bracket denotes an apostrophe, a closing one a double quote. You can see that there were vulgar fractions as well as degree and diameter symbols in there.
There is a @LengthOfReplacement parameter, defaulting to 1, included as a starting point in case anyone needs to replace with longer strings. I played around with that in my project, but single-char replacement was the main function.
The condition of the case statement is important. It ensures that a character is only replaced if it is found in your @ReplaceTheseChars variable and that it is found in an odd-numbered position (subtracting 1 from the charindex result ensures that anything NOT found returns a negative modulo value). For example, if it finds a tilde (~) in position 5 it will replace it with a comma, but if on a subsequent run it finds the comma in position 6 it will not replace it with a curly bracket ({).
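To make the odd-position test concrete, here is a small standalone sketch of the same arithmetic. The pair string N'~,{(' and the three probe characters are made up for illustration; only CHARINDEX and the modulo operator are used, as in the function above.

```sql
-- Hypothetical pair string: '~' is replaced by ',' and '{' by '('.
DECLARE @pairs NVARCHAR(10) = N'~,{(';

SELECT c.ch,
       CHARINDEX(c.ch, @pairs)           AS pos,
       (CHARINDEX(c.ch, @pairs) - 1) % 2 AS parity
FROM (
    SELECT N'~' AS ch   -- pos 1, parity  0: odd position, gets replaced
    UNION ALL
    SELECT N','         -- pos 2, parity  1: even position, left alone
    UNION ALL
    SELECT N'x'         -- pos 0, parity -1: not in the list, left alone
) AS c;
```

Only a parity of exactly 0 triggers a replacement. In T-SQL the result of % takes the sign of the dividend, so a character that is not found (position 0) yields -1 rather than 1 or 0.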
This is best demonstrated with an example...
declare @ProductDescription nvarchar(20) = N'abc~def[|][123';
select @ProductDescription
= dbo.udf_ReplaceMultipleChars(
@ProductDescription
/* NB the doubling up of the apostrophe is necessary in the string but resolves to a single apostrophe when passed to the function */
,'['']"~,{?}°$±|?|?a???^?#?'
, default
);
select @ProductDescription
, dbo.udf_ReplaceMultipleChars(
@ProductDescription
,'['']"~,{?}°$±|?|?a???^?#?'
/* if you didn't know how to type those peculiar chars in then you can build a string like this... '[' + nchar(0x0027) + ']"~,{' + nchar(0x00D8) + '}' + nchar(0x00B0) etc */
,
default
);
This returns the value after both the first and the second pass through the function, as follows... abc,def'?"'123 abc,def'?"'123
A table update would just be
update a
set a.Col1 = dbo.udf_ReplaceMultipleChars(a.Col1,'~,]"',1)
from TestTable a
Finally (I hear you say!), although I've not had access to the TRANSLATE function, I believe that this function can process the example shown in the documentation quite easily. The TRANSLATE function demo is
SELECT TRANSLATE('2*[3+4]/{7-2}', '[]{}', '()()');
which returns 2*(3+4)/(7-2), although I understand it might not work on 2*[3+4]/[7-2] !!
My function would approach this as follows, listing each char to be replaced followed by its replacement: [ --> (, { --> ( etc.
select dbo.udf_ReplaceMultipleChars('2*[3+4]/{7-2}', '[({(])})', 1);
which will also work for
select dbo.udf_ReplaceMultipleChars('2*[3+4]/[7-2]', '[({(])})', 1);
I hope someone finds this useful, and if you get to test its performance against larger tables, do let us know how it goes!
Answer by Adil
declare @testVal varchar(20)
set @testVal = '?t/es?ti/n*g 1*2?3*'
select @testVal = REPLACE(@testVal, item, '') from (select '?' item union select '*' union select '/') list
select @testVal;
Answer by HimalayanNinja
Here are the steps
- Create a CLR function
See the following code:
using System;
using System.Data.SqlTypes;
using System.Text.RegularExpressions;

public partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlString Replace2(SqlString inputtext, SqlString filter,SqlString replacewith)
{
string str = inputtext.ToString();
try
{
string pattern = (string)filter;
string replacement = (string)replacewith;
Regex rgx = new Regex(pattern);
string result = rgx.Replace(str, replacement);
return (SqlString)result;
}
catch (Exception s)
{
return (SqlString)s.Message;
}
}
}
- Deploy your CLR function
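The answer does not show the deployment itself, so the following is only a sketch of one way to do it by hand in T-SQL. The assembly name ReplaceFunctions and the DLL path are hypothetical, and the parameter sizes are an assumption (NVARCHAR(4000) is a conservative mapping for SqlString). On SQL Server 2017 and later, the 'clr strict security' setting may additionally require the assembly to be signed or trusted before it loads.

```sql
-- Enable CLR integration on the instance (off by default).
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO

-- Hypothetical assembly name and file path: point this at your compiled DLL.
CREATE ASSEMBLY ReplaceFunctions
FROM 'C:\clr\ReplaceFunctions.dll'
WITH PERMISSION_SET = SAFE;
GO

-- Bind a T-SQL function to the static method Replace2
-- in the class UserDefinedFunctions shown above.
CREATE FUNCTION dbo.Replace2
(
    @inputtext   NVARCHAR(4000),
    @filter      NVARCHAR(4000),
    @replacewith NVARCHAR(4000)
)
RETURNS NVARCHAR(4000)
AS EXTERNAL NAME ReplaceFunctions.UserDefinedFunctions.Replace2;
GO
```

Deploying from Visual Studio generates equivalent statements automatically, so this manual route is mainly useful when only the compiled DLL is available.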
- Now test it
See the following code:
create table dbo.test(dummydata varchar(255))
Go
INSERT INTO dbo.test values('P@ssw1rd'),('This 12is @test')
Go
Update dbo.test
set dummydata=dbo.Replace2(dummydata,'[0-9@]','')
select * from dbo.test
The dummydata column now holds: Psswrd, This is test. Booom!
Answer by ahains
One option is to use a numbers/tally table to drive an iterative process via a pseudo-set-based query.
The general idea of char replacement can be demonstrated with a simple character map table approach:
create table charMap (srcChar char(1), replaceChar char(1))
insert charMap values ('a', 'z')
insert charMap values ('b', 'y')
create table testChar(srcChar char(1))
insert testChar values ('1')
insert testChar values ('a')
insert testChar values ('2')
insert testChar values ('b')
select
coalesce(charMap.replaceChar, testChar.srcChar) as charData
from testChar left join charMap on testChar.srcChar = charMap.srcChar
Then you can bring in the tally table approach to do the lookup on each character position in the string.
create table tally (i int)
declare @i int
set @i = 1
while @i <= 256 begin
insert tally values (@i)
set @i = @i + 1
end
create table testData (testString char(10))
insert testData values ('123a456')
insert testData values ('123ab456')
insert testData values ('123b456')
select
i,
SUBSTRING(testString, i, 1) as srcChar,
coalesce(charMap.replaceChar, SUBSTRING(testString, i, 1)) as charData
from testData cross join tally
left join charMap on SUBSTRING(testString, i, 1) = charMap.srcChar
where i <= LEN(testString)
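The query above returns one row per character, so the replaced strings still have to be stitched back together. The answer leaves that step out; one possible sketch, assuming the tally, charMap and testData tables created above, uses the FOR XML PATH('') concatenation idiom (available since SQL Server 2005). This is a swapped-in technique, not part of the original answer.

```sql
SELECT
    d.testString,
    (
        -- One row per character position, mapped through charMap when a
        -- mapping exists; the unnamed column is concatenated by FOR XML PATH('').
        SELECT COALESCE(m.replaceChar, SUBSTRING(d.testString, t.i, 1))
        FROM tally AS t
        LEFT JOIN charMap AS m
            ON SUBSTRING(d.testString, t.i, 1) = m.srcChar
        WHERE t.i <= LEN(d.testString)
        ORDER BY t.i
        FOR XML PATH(''), TYPE
    ).value('.', 'varchar(256)') AS replacedString
FROM testData AS d;
-- With the sample rows, '123a456' should come back as '123z456'.
```

The TYPE directive plus .value() keeps characters such as < and & from being returned as XML entities.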
Answer by Cade Roux
I don't know why Charles Bretana deleted his answer, so I'm adding it back in as a CW answer: a persisted computed column is a REALLY good way to handle cases where you need cleansed or transformed data almost all the time, but need to preserve the original garbage. His suggestion is relevant and appropriate REGARDLESS of how you decide to cleanse your data.
Specifically, in my current project, I have a persisted computed column which trims all the leading zeros (luckily this is relatively easy to handle in straight T-SQL) from some particular numeric identifiers stored inconsistently with leading zeros. It is stored as a persisted computed column in the tables which need it and is indexed, because the conformed identifier is often used in joins.
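A minimal sketch of what such a column might look like. The table dbo.Accounts, its columns and the index name are hypothetical, and the zero-trimming expression is just one common way to do it in plain T-SQL, not necessarily the one used in the project described above.

```sql
CREATE TABLE dbo.Accounts
(
    AccountId     INT IDENTITY(1,1) PRIMARY KEY,
    RawIdentifier VARCHAR(20) NOT NULL,   -- original value, kept as-is
    -- Find the first character that is not '0' and keep everything from there.
    -- Appending '.' guarantees PATINDEX finds a match even for all-zero values,
    -- which then collapse to an empty string.
    CleanIdentifier AS
        SUBSTRING(RawIdentifier,
                  PATINDEX('%[^0]%', RawIdentifier + '.'),
                  LEN(RawIdentifier)) PERSISTED
);

-- The expression is deterministic, so the persisted column can be indexed
-- and used directly in joins.
CREATE INDEX IX_Accounts_CleanIdentifier
    ON dbo.Accounts (CleanIdentifier);
```

The same pattern would apply to the question's stringkey columns: compute the stripped key once at write time, index it, and join on the computed column instead of calling the replace logic on every row at query time.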