How to find Unicode/non-ASCII characters in an NTEXT field in a SQL Server 2005 table?

Question

Asked by TheSoftwareJedi

I have a table with a couple thousand rows. The description and summary fields are NTEXT, and sometimes contain non-ASCII characters. How can I locate all of the rows with non-ASCII characters?

Answer 1

Answered by CC1960

I sometimes use this "cast" statement to find "strange" characters:

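select 
    *
from 
    <Table>
where 
    -- NTEXT cannot be compared directly, so cast both sides to nvarchar(max)
    cast(<Field> as nvarchar(max)) != cast(cast(<Field> as varchar(max)) as nvarchar(max))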

Answer 2

Answered by Andomar

First build a string with all the characters you're not interested in (the example uses the 0x20 - 0x7F range, or 7 bits without the control characters). Each character is prefixed with |, for use in the escape clause later.

-- Start with tab, line feed, carriage return
declare @str varchar(1024)
set @str = '|' + char(9) + '|' + char(10) + '|' + char(13)

-- Add all normal ASCII characters (32 -> 127)
declare @i int
set @i = 32
while @i <= 127
    begin
    -- Uses | to escape, could be any character
    set @str = @str + '|' + char(@i)
    set @i = @i + 1
    end

The next snippet searches for any character that is not in the list. The % matches 0 or more characters. The [] matches one of the characters inside the [], for example [abc] would match either a, b or c. The ^ negates the list, for example [^abc] would match anything that's not a, b, or c.

select *
from yourtable
where yourfield like '%[^' + @str + ']%' escape '|'

The escape character is required because otherwise searching for characters like ], % or _ would mess up the LIKE expression.
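
A minimal sketch of how the ESCAPE clause behaves, using made-up literal strings instead of a table. The column aliases are illustrative only:

```sql
-- Illustrative only: the literal strings are made-up sample values.
SELECT
    -- '|%' is a literal percent sign, so this matches strings that end in '%'
    CASE WHEN N'50%' LIKE N'%|%' ESCAPE N'|' THEN 'match' ELSE 'no match' END AS ends_with_percent,
    -- no literal percent sign at the end, so '|%' has nothing to match
    CASE WHEN N'50' LIKE N'%|%' ESCAPE N'|' THEN 'match' ELSE 'no match' END AS no_percent,
    -- inside [], '|]' is a literal right bracket instead of the end of the set
    CASE WHEN N'a]b' LIKE N'%[|]]%' ESCAPE N'|' THEN 'match' ELSE 'no match' END AS has_bracket
```

Without the escape character, the first `]` in the list would close the set early and the rest of the list would be read as ordinary pattern text.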

Hope this is useful, and thanks to JohnFX's comment on the other answer.

Answer 3

Answered by petejamd

Here ya go:

SELECT *
FROM Objects
WHERE 
    ObjectKey LIKE '%[^0-9a-zA-Z !"#$%&''()*+,\-./:;<=>?@\[\^_`{|}~\]\\]%' ESCAPE '\'

Answer 4

Answered by RBarryYoung

Technically, I believe that an NCHAR(1) is a valid ASCII character if and only if UNICODE(@NChar) < 256 and ASCII(@NChar) = UNICODE(@NChar), though that may not be exactly what you intended. Therefore this would be a correct solution:

;With cteNumbers as
(
    Select ROW_NUMBER() Over(Order By c1.object_id) as N
    From sys.system_columns c1, sys.system_columns c2
)
Select Distinct RowID
From YourTable t
    Join cteNumbers n ON n.N <= Len(CAST(TXT As NVarchar(MAX)))
Where UNICODE(Substring(TXT, n.N, 1)) > 255
    OR UNICODE(Substring(TXT, n.N, 1)) <> ASCII(Substring(TXT, n.N, 1))

This should also be very fast.
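
To see which characters trigger a match, the same numbers-CTE idea can list each offending position and code point. This is a sketch on a made-up variable rather than a table. The 127 threshold (strict 7-bit ASCII) and the names @txt, pos, ch, and code_point are assumptions for illustration, not part of the answer above:

```sql
-- Hypothetical sample value: 'caf' + e-acute + space + euro sign + '5'
DECLARE @txt nvarchar(100)
SET @txt = N'caf' + NCHAR(233) + N' ' + NCHAR(8364) + N'5'

;WITH cteNumbers AS
(
    -- 100 sequential integers are enough for the sample string
    SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY c1.object_id) AS N
    FROM sys.all_columns c1
)
SELECT n.N AS pos,
       SUBSTRING(@txt, n.N, 1) AS ch,
       UNICODE(SUBSTRING(@txt, n.N, 1)) AS code_point
FROM cteNumbers n
WHERE n.N <= LEN(@txt)
  AND UNICODE(SUBSTRING(@txt, n.N, 1)) > 127   -- assumed threshold: outside 7-bit ASCII
ORDER BY n.N
```

For this sample string the query should report positions 4 and 6, with code points 233 and 8364.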

Answer 5

Answered by Chris Shaffer

It's probably not the best solution, but maybe a query like:

SELECT *
FROM yourTable
WHERE yourTable.yourColumn LIKE '%[^0-9a-zA-Z]%'

Replace the "0-9a-zA-Z" expression with something that captures the full ASCII set (or a subset that your data contains).
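
One way to capture the printable ASCII set without listing every character is a range from space to tilde, evaluated under a binary collation so that the range follows code-point order rather than dictionary order. This is a minimal sketch, assuming a hypothetical temp table #demo. Latin1_General_BIN is an assumed collation choice, and tabs and line breaks fall outside the range, so they would be flagged too:

```sql
-- Hypothetical sample data; #demo and its columns are illustrative names.
CREATE TABLE #demo (id int, description ntext)
INSERT INTO #demo (id, description) VALUES (1, N'plain ASCII text')
INSERT INTO #demo (id, description) VALUES (2, N'caf' + NCHAR(233))               -- e-acute
INSERT INTO #demo (id, description) VALUES (3, N'full' + NCHAR(65292) + N'width') -- fullwidth comma

SELECT id
FROM #demo
-- [^ -~] means "any character outside the range space (0x20) to tilde (0x7E)"
WHERE CAST(description AS nvarchar(max)) COLLATE Latin1_General_BIN LIKE N'%[^ -~]%'

DROP TABLE #demo
```

With this sample data the query should return rows 2 and 3 and skip row 1.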

Answer 6

Answered by neuracnu

I started with @CC1960's solution but found an interesting use case that caused it to fail. It seems that SQL Server will equate certain Unicode characters to their non-Unicode approximations. For example, SQL Server considers the Unicode character "fullwidth comma" (http://www.fileformat.info/info/unicode/char/ff0c/index.htm) the same as a standard ASCII comma when compared in a WHERE clause.

To get around this, have SQL Server compare the strings as binary. But remember, nvarchar and varchar binaries don't match up (16-bit vs 8-bit), so you need to convert your varchar back up to nvarchar again before doing the binary comparison:

select *
from my_table
where CONVERT(binary(5000),my_table.my_column) != CONVERT(binary(5000),CONVERT(nvarchar(1000),CONVERT(varchar(1000),my_table.my_column)))
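
The fullwidth comma behavior can be checked in isolation. The sketch below compares the two characters once under the session's default collation and once under a binary collation. The variable names are made up, Latin1_General_BIN is an assumed choice, and the first result depends on whether the default collation is width-sensitive:

```sql
DECLARE @fullwidth nvarchar(10), @ascii nvarchar(10)
SET @fullwidth = NCHAR(65292)  -- U+FF0C FULLWIDTH COMMA
SET @ascii = N','              -- U+002C COMMA

SELECT
    -- result depends on the default collation (width-insensitive collations treat them as equal)
    CASE WHEN @fullwidth = @ascii THEN 'equal' ELSE 'different' END AS default_collation,
    -- a binary collation compares code points, so these two never compare equal
    CASE WHEN @fullwidth = @ascii COLLATE Latin1_General_BIN THEN 'equal' ELSE 'different' END AS binary_collation
```

Adding a binary COLLATE clause to the comparison may be a simpler alternative to converting both sides to binary, though that is a variation on the query above, not part of it.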

Answer 7

Answered by Mohan Billakanti

If you are looking for a specific Unicode character, you might use something like the following.

   select  Fieldname from 
     (
      select Fieldname,
             REPLACE(Fieldname COLLATE Latin1_General_BIN,
             NCHAR(65533) COLLATE Latin1_General_BIN,
             'CustomText123') replacedcol
      from table
     ) results where results.replacedcol like '%CustomText123%'

Answer 8

Answered by Tom H

My previous answer confused Unicode and non-Unicode data. Here is a solution that should work for all situations, although I'm still running into some anomalies. It seems that certain non-ASCII Unicode superscript characters are being confused with the actual digit characters. You might be able to play around with collations to get around that.

Hopefully you already have a numbers table in your database (they can be very useful), but just in case, I've included the code to partially fill one as well.

You also might need to play around with the numeric range, since Unicode characters can go beyond 255.

CREATE TABLE dbo.Numbers
(
    number  INT NOT NULL,
    CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (number)
)
GO
DECLARE @i INT

SET @i = 0

WHILE @i < 1000
BEGIN
    INSERT INTO dbo.Numbers (number) VALUES (@i)

    SET @i = @i + 1
END
GO

SELECT *,
    T.ID, N.number, N'%' + NCHAR(N.number) + N'%'
FROM
    dbo.Numbers N
INNER JOIN dbo.My_Table T ON
    T.description LIKE N'%' + NCHAR(N.number) + N'%' OR
    T.summary LIKE N'%' + NCHAR(N.number) + N'%'
and t.id = 1
WHERE
    N.number BETWEEN 127 AND 255
ORDER BY
    T.id, N.number
GO

Answer 9

Answered by Paul Harrington

This is a very, very inefficient way of doing it, but it should be OK for small tables. It uses an auxiliary table of numbers as per Itzik Ben-Gan and simply looks for characters with bit 7 set.

SELECT  *
FROM    yourTable as t
WHERE   EXISTS ( SELECT *
                 FROM   msdb..Nums as NaturalNumbers
                 WHERE  NaturalNumbers.n <= LEN(t.string_column) -- <= so the last character is checked too
                        AND ASCII(SUBSTRING(t.string_column, NaturalNumbers.n, 1)) > 127)
