Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/401771/

Date: 2020-08-31 12:33:50

How can I find non-ASCII characters in MySQL?

Tags: mysql, character-encoding

Asked by Ed Mays

I'm working with a MySQL database that has some data imported from Excel. The data contains non-ASCII characters (em dashes, etc.) as well as hidden carriage returns or line feeds. Is there a way to find these records using MySQL?

Accepted answer by Chad Birch

It depends exactly what you're defining as "ASCII", but I would suggest trying a variant of a query like this:

SELECT * FROM tableName WHERE columnToCheck REGEXP '[^A-Za-z0-9]';

That query will return all rows where columnToCheck contains any non-alphanumeric characters. If you have other characters that are acceptable, add them to the character class in the regular expression. For example, if periods, commas, and hyphens are OK, change the query to:

SELECT * FROM tableName WHERE columnToCheck REGEXP '[^A-Za-z0-9.,-]';

The most relevant page of the MySQL documentation is probably 12.5.2 Regular Expressions.

Answer by O. Jones

MySQL provides comprehensive character set management that can help with this kind of problem.

SELECT whatever
  FROM tableName 
 WHERE columnToCheck <> CONVERT(columnToCheck USING ASCII)

The CONVERT(col USING charset) function turns the unconvertible characters into replacement characters. The converted and unconverted text will then be unequal.

See this for more discussion: https://dev.mysql.com/doc/refman/8.0/en/charset-repertoire.html

You can use any character set name you wish in place of ASCII. For example, if you want to find out which characters won't render correctly in code page 1257 (Lithuanian, Latvian, Estonian), use CONVERT(columnToCheck USING cp1257).
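
As a rough sketch of this approach, it can help to display the converted value next to the original so the replaced characters stand out. The table name imported_rows and the columns id and note below are hypothetical, chosen only for illustration; the comparison also assumes the column uses a Unicode character set such as utf8mb4.

```sql
-- Hypothetical table and column names; adjust to your schema.
-- Lists each affected row together with its ASCII conversion,
-- so the replacement characters are easy to spot.
SELECT id,
       note,
       CONVERT(note USING ascii) AS note_as_ascii
  FROM imported_rows
 WHERE note <> CONVERT(note USING ascii);
```

Comparing the two columns side by side is only a diagnostic aid; the WHERE clause is the same test as in the query above.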

Answer by zende

You can define ASCII as all characters that have a decimal value of 0 - 127 (0x00 - 0x7F) and find columns with non-ASCII characters using the following query:

SELECT * FROM TABLE WHERE NOT HEX(COLUMN) REGEXP '^([0-7][0-9A-F])*$';

This was the most comprehensive query I could come up with.
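
To see why the pattern works, it may help to look at what HEX() produces. The sketch below assumes the text is stored or sent as UTF-8 (for example utf8mb4), where every byte of a non-ASCII character is 0x80 or higher, so at least one hex pair starts with a digit from 8 to F. The literals are illustrative only.

```sql
-- Assumes a UTF-8 connection character set such as utf8mb4.
SELECT HEX('A');   -- a single byte below 0x80, so the pattern matches
SELECT HEX('é');   -- bytes of 0x80 or above, so the pattern does not match

-- The same test applied to a literal instead of a column;
-- a result of 1 means at least one non-ASCII byte is present.
SELECT NOT (HEX('café') REGEXP '^([0-7][0-9A-F])*$') AS has_non_ascii;
```

Because the regular expression only ever sees hex digits, it does not depend on how the regex engine handles multi-byte characters.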

Answer by Chad Birch

This is probably what you're looking for:

select * from TABLE where COLUMN regexp '[^ -~]';

It should return all rows where COLUMN contains non-ASCII characters (or non-printable ASCII characters such as newline).
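
Since the question specifically mentions hidden carriage returns and line feeds, a narrower check for just those two characters might look like the sketch below. The names your_table and your_column are placeholders, not names from the question; CHAR(13) and CHAR(10) are the carriage return and line feed characters.

```sql
-- Placeholder table and column names; adjust to your schema.
-- Finds rows containing a carriage return (13) or a line feed (10).
SELECT *
  FROM your_table
 WHERE INSTR(your_column, CHAR(13)) > 0
    OR INSTR(your_column, CHAR(10)) > 0;
```

This avoids regular expressions entirely, which can make it easier to reason about when only these two characters matter.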

Answer by Rob Bailey

One missing character from everyone's examples above is the termination character (\0). This is invisible to the MySQL console output and is not discoverable by any of the queries heretofore mentioned. The query to find it is simply:

select * from TABLE where COLUMN like '%\0%';

Answer by Mahmoud Al-Qudsi

Based on the correct answer, but taking into account ASCII control characters as well, the solution that worked for me is this:

SELECT * FROM `table` WHERE NOT `field` REGEXP  "[\x00-\xFF]|^$";

It does the same thing: searches for violations of the ASCII range in a column, but lets you search for control characters too, since it uses hexadecimal notation for code points. Since there is no comparison or conversion (unlike @Ollie's answer), this should be significantly faster, too. (Especially if MySQL does early-termination on the regex query, which it definitely should.)

It also avoids returning fields that are zero-length. If you want a slightly-longer version that might perform better, you can use this instead:

SELECT * FROM `table` WHERE `field` <> "" AND NOT `field` REGEXP  "[\x00-\xFF]";

It does a separate check for length to avoid zero-length results, without considering them for a regex pass. Depending on the number of zero-length entries you have, this could be significantly faster.

Note that if your default character set is something bizarre where 0x00-0xFF don't map to the same values as ASCII (is there such a character set in existence anywhere?), this would return a false positive. Otherwise, enjoy!

Answer by Sachin

Try using this query to search for records with special characters:

SELECT *
FROM tableName
WHERE fieldName REGEXP '[^a-zA-Z0-9@:. \'\-`,\&]'

Answer by chiliNUT

@zende's answer was the only one that covered columns with a mix of ASCII and non-ASCII characters, but it also had that problematic hex thing. I used this:

SELECT * FROM `table` WHERE NOT `column` REGEXP '^[ -~]+$' AND `column` !=''

Answer by Malaka Gunawardhana

In Oracle, we can use the query below.

SELECT * FROM TABLE_A WHERE ASCIISTR(COLUMN_A) <> COLUMN_A;

Answer by hemu123

For this question, we can also use this method:

Question from SQLZoo:
Find all details of the prize won by PETER GRüNBERG

Non-ASCII characters

ans: select * from nobel where winner like 'P% GR%_%berg';