How can I find Unicode/non-ASCII characters in an NTEXT field in a SQL Server 2005 table?
Original question: http://stackoverflow.com/questions/686967/
This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by TheSoftwareJedi
I have a table with a couple thousand rows. The description and summary fields are NTEXT, and sometimes have non-ASCII characters in them. How can I locate all of the rows with non-ASCII characters?
Answered by CC1960
I have sometimes used this "cast" statement to find "strange" characters:
select *
from <Table>
where <Field> != cast(<Field> as varchar(1000))
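On an NTEXT column the comparison above cannot be applied directly, because SQL Server does not allow the = and != operators on ntext values. A minimal sketch of the same round-trip idea, assuming a hypothetical temp table #Docs with an ntext description column, and casting to nvarchar(max) (available from SQL Server 2005) before comparing. Note that the varchar round trip keeps any character that exists in the code page of the collation in use, so this flags characters outside that code page rather than strictly non-ASCII ones:

```sql
-- Hypothetical sample data; #Docs and its columns are illustrative names only.
CREATE TABLE #Docs (id int NOT NULL, description ntext NULL);

INSERT INTO #Docs (id, description)
SELECT 1, N'plain ASCII text'
UNION ALL
SELECT 2, N'text with ' + NCHAR(1046);   -- U+0416, a Cyrillic capital letter

-- Cast to nvarchar(max) first, since ntext cannot be compared directly.
-- Rows whose text changes in the nvarchar -> varchar -> nvarchar round trip are returned.
SELECT id
FROM #Docs
WHERE CAST(description AS nvarchar(max))
      <> CAST(CAST(description AS varchar(max)) AS nvarchar(max));

DROP TABLE #Docs;
```

Which rows come back depends on the default collation of the database, because that collation decides which characters survive the conversion to varchar.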
Answered by Andomar
First build a string with all the characters you're not interested in (the example uses the 0x20 - 0x7F range, i.e. 7-bit ASCII without the control characters). Each character is prefixed with |, for use in the escape clause later.
-- Start with tab, line feed, carriage return
declare @str varchar(1024)
set @str = '|' + char(9) + '|' + char(10) + '|' + char(13)
-- Add all normal ASCII characters (32 -> 127)
declare @i int
set @i = 32
while @i <= 127
begin
    -- Uses | to escape, could be any character
    set @str = @str + '|' + char(@i)
    set @i = @i + 1
end
The next snippet searches for any character that is not in the list. The % matches 0 or more characters. The [] matches one of the characters inside the [], for example [abc] would match either a, b or c. The ^ negates the list, for example [^abc] would match anything that's not a, b, or c.
select *
from yourtable
where yourfield like '%[^' + @str + ']%' escape '|'
The escape character is required because otherwise searching for characters like ], % or _ would mess up the LIKE expression.
Hope this is useful, and thanks to JohnFX's comment on the other answer.
Answered by petejamd
Here ya go:
SELECT *
FROM Objects
WHERE ObjectKey LIKE '%[^0-9a-zA-Z !"#$%&''()*+,\-./:;<=>?@\[\^_`{|}~\]\\]%' ESCAPE '\'
Answered by RBarryYoung
Technically, I believe that an NCHAR(1) is a valid ASCII character if and only if UNICODE(@NChar) < 256 and ASCII(@NChar) = UNICODE(@NChar), though that may not be exactly what you intended. Therefore this would be a correct solution:
;With cteNumbers as
(
    Select ROW_NUMBER() Over(Order By c1.object_id) as N
    From sys.system_columns c1, sys.system_columns c2
)
Select Distinct RowID
From YourTable t
Join cteNumbers n ON n.N <= Len(CAST(TXT As NVarchar(MAX)))
Where UNICODE(Substring(TXT, n.N, 1)) > 255
   OR UNICODE(Substring(TXT, n.N, 1)) <> ASCII(Substring(TXT, n.N, 1))
This should also be very fast.
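To see which characters trigger a match, it can help to list the offending positions for a single value. A small sketch, assuming a hypothetical variable @txt and using a stricter 7-bit threshold (code points above 127) than the 255 used above:

```sql
-- @txt is an illustrative value: "caf", e-acute (U+00E9), a space, fullwidth comma (U+FF0C).
DECLARE @txt nvarchar(100);
SET @txt = N'caf' + NCHAR(233) + N' ' + NCHAR(65292);

-- Inline numbers list built from a system catalog view, as in the query above.
WITH Numbers AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY c.object_id) AS N
    FROM sys.system_columns AS c
)
SELECT n.N                               AS position,
       SUBSTRING(@txt, n.N, 1)           AS ch,
       UNICODE(SUBSTRING(@txt, n.N, 1))  AS code_point
FROM Numbers AS n
WHERE n.N <= LEN(@txt)
  AND UNICODE(SUBSTRING(@txt, n.N, 1)) > 127   -- anything outside 7-bit ASCII
ORDER BY n.N;
```

For the sample value this lists position 4 (code point 233) and position 6 (code point 65292).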
Answered by Chris Shaffer
It's probably not the best solution, but maybe a query like:
SELECT *
FROM yourTable
WHERE yourTable.yourColumn LIKE '%[^0-9a-zA-Z]%'
Replace the "0-9a-zA-Z" expression with something that captures the full ASCII set (or a subset that your data contains).
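One way to express the full printable ASCII set, offered as a sketch rather than as part of the answer above, is a range from space to tilde evaluated under a binary collation, so that the range follows code point order instead of dictionary order. The temp table #Summaries and its columns are hypothetical, and tab, line feed and carriage return are added to the allowed set on the assumption that they are acceptable in the data:

```sql
-- Hypothetical sample data; #Summaries and its columns are illustrative names only.
CREATE TABLE #Summaries (id int NOT NULL, summary ntext NULL);

INSERT INTO #Summaries (id, summary)
SELECT 1, N'plain text' + NCHAR(13) + NCHAR(10) + NCHAR(9) + N'second line'
UNION ALL
SELECT 2, N'caf' + NCHAR(233);   -- U+00E9, outside 7-bit ASCII

-- [^ -~] means "not between space (32) and tilde (126)".
-- The binary collation makes the range compare by code point.
SELECT id
FROM #Summaries
WHERE CAST(summary AS nvarchar(max)) COLLATE Latin1_General_BIN
      LIKE N'%[^ -~' + NCHAR(9) + NCHAR(10) + NCHAR(13) + N']%';

DROP TABLE #Summaries;
```

With this pattern the first sample row is expected to pass as clean and the second to be flagged, since only the second contains a character outside the allowed set.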
Answered by neuracnu
I started with @CC1960's solution but found an interesting use case that caused it to fail. It seems that SQL Server will equate certain Unicode characters to their non-Unicode approximations. For example, SQL Server considers the Unicode character "fullwidth comma" (http://www.fileformat.info/info/unicode/char/ff0c/index.htm) the same as a standard ASCII comma when compared in a WHERE clause.
To get around this, have SQL Server compare the strings as binary. But remember, nvarchar and varchar binaries don't match up (16-bit vs 8-bit), so you need to convert your varchar back up to nvarchar again before doing the binary comparison:
select *
from my_table
where CONVERT(binary(5000),my_table.my_column) != CONVERT(binary(5000),CONVERT(nvarchar(1000),CONVERT(varchar(1000),my_table.my_column)))
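An alternative sketch of the same idea, not taken from the answer above: forcing a binary collation on the comparison avoids the equivalence problem without converting to a fixed-length binary value. The variable @s is illustrative, and the result of the first comparison depends on the default collation in use, so no output is shown:

```sql
-- @s is an illustrative value containing U+FF0C FULLWIDTH COMMA.
DECLARE @s nvarchar(20);
SET @s = N'a' + NCHAR(65292) + N'b';

SELECT
    -- Compared under the default collation, which may treat the
    -- fullwidth comma and the ASCII comma as equal.
    CASE WHEN @s = CAST(CAST(@s AS varchar(20)) AS nvarchar(20))
         THEN 'equal' ELSE 'different' END AS default_collation_compare,
    -- Compared under a binary collation, which compares code points.
    CASE WHEN @s = CAST(CAST(@s AS varchar(20)) AS nvarchar(20)) COLLATE Latin1_General_BIN
         THEN 'equal' ELSE 'different' END AS binary_collation_compare;
```

The same COLLATE clause can be appended to the right-hand side of a WHERE comparison on a column cast to nvarchar(max).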
Answered by Mohan Billakanti
If you are looking for a specific Unicode character, you might use something like the query below.
select Fieldname from
(
    select Fieldname,
           REPLACE(Fieldname COLLATE Latin1_General_BIN,
                   NCHAR(65533) COLLATE Latin1_General_BIN,
                   'CustomText123') replacedcol
    from table
) results where results.replacedcol like '%CustomText123%'
Answered by Tom H
My previous answer was confusing UNICODE/non-UNICODE data. Here is a solution that should work for all situations, although I'm still running into some anomalies. It seems that certain non-ASCII Unicode superscript characters are being treated as equal to the corresponding digit characters. You might be able to play around with collations to get around that.
Hopefully you already have a numbers table in your database (they can be very useful), but just in case, I've included the code to partially fill one as well.
You also might need to play around with the numeric range, since Unicode characters can go beyond 255.
CREATE TABLE dbo.Numbers
(
    number INT NOT NULL,
    CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (number)
)
GO
DECLARE @i INT
SET @i = 0
WHILE @i < 1000
BEGIN
    INSERT INTO dbo.Numbers (number) VALUES (@i)
    SET @i = @i + 1
END
GO
SELECT *,
    T.ID, N.number, N'%' + NCHAR(N.number) + N'%'
FROM
    dbo.Numbers N
INNER JOIN dbo.My_Table T ON
    T.description LIKE N'%' + NCHAR(N.number) + N'%' OR
    T.summary LIKE N'%' + NCHAR(N.number) + N'%'
    and t.id = 1
WHERE
    N.number BETWEEN 127 AND 255
ORDER BY
    T.id, N.number
GO
Answered by Paul Harrington
-- This is a very, very inefficient way of doing it but should be OK for
-- small tables. It uses an auxiliary table of numbers as per Itzik Ben-Gan and simply
-- looks for characters with bit 7 set.
SELECT *
FROM yourTable as t
WHERE EXISTS ( SELECT *
               FROM msdb..Nums as NaturalNumbers
               WHERE NaturalNumbers.n <= LEN(t.string_column)
                 AND ASCII(SUBSTRING(t.string_column, NaturalNumbers.n, 1)) > 127)