Original source: http://stackoverflow.com/questions/401771/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
How can I find non-ASCII characters in MySQL?
Asked by Ed Mays
I'm working with a MySQL database that has some data imported from Excel. The data contains non-ASCII characters (em dashes, etc.) as well as hidden carriage returns or line feeds. Is there a way to find these records using MySQL?
Accepted answer by Chad Birch
It depends on exactly what you define as "ASCII", but I would suggest trying a variant of a query like this:
SELECT * FROM tableName WHERE columnToCheck REGEXP '[^A-Za-z0-9]';
That query will return all rows where columnToCheck contains any non-alphanumeric characters. If you have other characters that are acceptable, add them to the negated character class in the regular expression. For example, if periods, commas, and hyphens are OK, change the query to:
SELECT * FROM tableName WHERE columnToCheck REGEXP '[^A-Za-z0-9.,-]';
The most relevant page of the MySQL documentation is probably 12.5.2 Regular Expressions.
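Before running an allowed-character query against a large table, it can help to see exactly which characters a given class would flag. The following is a minimal client-side sketch in Python, not part of the original answer: `find_disallowed` and the sample strings are hypothetical, and Python's `re` module stands in for MySQL's regex engine, so edge cases may differ.

```python
import re

# Negated character class: matches any single character that is NOT a letter,
# digit, period, comma, or hyphen (assumed allowed set, taken from the example).
DISALLOWED = re.compile(r"[^A-Za-z0-9.,-]")


def find_disallowed(value: str) -> list:
    """Return every character in value that falls outside the allowed set."""
    return DISALLOWED.findall(value)


if __name__ == "__main__":
    # Hypothetical sample values: clean text, an em dash, a hidden line feed.
    for sample in ["plain-text,ok.", "em\u2014dash", "line\nbreak"]:
        print(repr(sample), [hex(ord(ch)) for ch in find_disallowed(sample)])
```

A value is worth inspecting whenever the returned list is non-empty; printing code points rather than the characters themselves makes hidden carriage returns and line feeds visible.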
Answer by O. Jones
MySQL provides comprehensive character set management that can help with this kind of problem.
SELECT whatever
FROM tableName
WHERE columnToCheck <> CONVERT(columnToCheck USING ASCII)
The CONVERT(col USING charset) function turns the unconvertible characters into replacement characters. Then, the converted and unconverted text will be unequal.
See this for more discussion: https://dev.mysql.com/doc/refman/8.0/en/charset-repertoire.html
You can use any character set name you wish in place of ASCII. For example, if you want to find out which characters won't render correctly in code page 1257 (Lithuanian, Latvian, Estonian), use CONVERT(columnToCheck USING cp1257).
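The round-trip idea behind the CONVERT comparison can be sketched outside the database as well. This Python version is only an analogy, assuming Python's codec names (`ascii`, `cp1257`) and its `errors="replace"` behaviour; `differs_after_conversion` is a hypothetical helper, and MySQL's own conversion and collation rules may treat some values differently.

```python
def differs_after_conversion(value: str, charset: str = "ascii") -> bool:
    """Return True if value cannot be represented losslessly in charset."""
    # errors="replace" substitutes "?" for each character the target charset
    # cannot encode, loosely mirroring the replacement characters that
    # CONVERT(col USING charset) produces.
    converted = value.encode(charset, errors="replace").decode(charset)
    # Any substitution makes the round-tripped text unequal to the original.
    return converted != value


if __name__ == "__main__":
    # Hypothetical sample values checked against two target character sets.
    for sample in ["hello", "em\u2014dash", "\u0105\u010d"]:
        print(repr(sample),
              differs_after_conversion(sample),
              differs_after_conversion(sample, "cp1257"))
```

Passing a different codec name plays the same role as swapping the character set name in the query.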
Answer by zende
You can define ASCII as all characters that have a decimal value of 0 - 127 (0x00 - 0x7F) and find columns with non-ASCII characters using the following query:
SELECT * FROM TABLE WHERE NOT HEX(COLUMN) REGEXP '^([0-7][0-9A-F])*$';
This was the most comprehensive query I could come up with.
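The hex-based test works because each byte becomes exactly two hex digits, and a byte is in the 0x00 - 0x7F range precisely when its first hex digit is 0 through 7. A rough Python equivalent follows, assuming the column's bytes are UTF-8 (the real column encoding may differ); `has_non_ascii_bytes` is a hypothetical name.

```python
import re

# Each pair of hex digits is one byte; a leading digit of 0-7 means the byte
# value is at most 0x7F, i.e. inside the ASCII range.
ASCII_ONLY_HEX = re.compile(r"([0-7][0-9A-F])*")


def has_non_ascii_bytes(value: str, encoding: str = "utf-8") -> bool:
    """Return True if any encoded byte of value is above 0x7F."""
    hex_string = value.encode(encoding).hex().upper()
    # fullmatch anchors at both ends, like ^...$ in the SQL pattern, so the
    # two-digit groups always stay aligned with byte boundaries.
    return ASCII_ONLY_HEX.fullmatch(hex_string) is None


if __name__ == "__main__":
    # Hypothetical sample values: control characters, empty string, accented text.
    for sample in ["abc\r\n", "", "caf\u00e9"]:
        print(repr(sample), has_non_ascii_bytes(sample))
```

Note that control characters such as carriage returns pass this check, since they are still ASCII; it only flags bytes above 0x7F.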
Answer by Chad Birch
This is probably what you're looking for:
select * from TABLE where COLUMN regexp '[^ -~]';
It should return all rows where COLUMN contains non-ASCII characters (or non-printable ASCII characters such as newline).
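The class `[^ -~]` relies on the fact that space (0x20) through tilde (0x7E) is exactly the printable ASCII range, so anything outside it is either a control character or non-ASCII. A small Python sketch of the same pattern, useful for locating the offending character once a row has been found; `first_offender` is a hypothetical helper and Python's `re` is assumed to behave like MySQL's engine for this simple class.

```python
import re

# Space (0x20) through tilde (0x7E) covers every printable ASCII character;
# the negated class matches anything else.
OUTSIDE_PRINTABLE_ASCII = re.compile(r"[^ -~]")


def first_offender(value: str):
    """Return (index, character) of the first match, or None if value is clean."""
    match = OUTSIDE_PRINTABLE_ASCII.search(value)
    if match is None:
        return None
    return match.start(), match.group()


if __name__ == "__main__":
    # Hypothetical sample values: clean text, a carriage return, an em dash.
    for sample in ["Hello, world!", "a\rb", "x\u2014y"]:
        print(repr(sample), first_offender(sample))
```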
Answer by Rob Bailey
One character missing from everyone's examples above is the termination character (\0). It is invisible in the MySQL console output and is not discoverable by any of the queries mentioned so far. The query to find it is simply:
select * from TABLE where COLUMN like '%\0%';
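Because a NUL character does not show up when a value is printed, it can be easier to confirm its presence by position. A minimal Python sketch, with `nul_positions` as a hypothetical helper operating on a value already fetched from the database:

```python
def nul_positions(value: str) -> list:
    """Return the index of every NUL (\\0) character in value."""
    return [index for index, ch in enumerate(value) if ch == "\0"]


if __name__ == "__main__":
    # Hypothetical sample: repr() makes the otherwise invisible NUL visible.
    sample = "abc\0def\0"
    print(repr(sample), nul_positions(sample))
```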
Answer by Mahmoud Al-Qudsi
Based on the correct answer, but taking into account ASCII control characters as well, the solution that worked for me is this:
SELECT * FROM `table` WHERE NOT `field` REGEXP "[\x00-\xFF]|^$";
It does the same thing: searches for violations of the ASCII range in a column, but lets you search for control characters too, since it uses hexadecimal notation for code points. Since there is no comparison or conversion (unlike @Ollie's answer), this should be significantly faster, too. (Especially if MySQL does early-termination on the regex query, which it definitely should.)
It also avoids returning fields that are zero-length. If you want a slightly longer version that might perform better, you can use this instead:
SELECT * FROM `table` WHERE `field` <> "" AND NOT `field` REGEXP "[\x00-\xFF]";
It does a separate check for length to avoid zero-length results, without considering them for a regex pass. Depending on the number of zero-length entries you have, this could be significantly faster.
Note that if your default character set is something bizarre where 0x00-0xFF don't map to the same values as ASCII (is there such a character set in existence anywhere?), this would return a false positive. Otherwise, enjoy!
Answer by Sachin
Try using this query to search for records containing special characters:
SELECT * FROM tableName WHERE fieldName REGEXP '[^a-zA-Z0-9@:. \'\-`,\&]';
Answer by chiliNUT
@zende's answer was the only one that covered columns with a mix of ASCII and non-ASCII characters, but it also had that problematic hex thing. I used this:
SELECT * FROM `table` WHERE NOT `column` REGEXP '^[ -~]+$' AND `column` != '';
Answer by Malaka Gunawardhana
In Oracle we can use the query below.
SELECT * FROM TABLE_A WHERE ASCIISTR(COLUMN_A) <> COLUMN_A;
Answer by hemu123
For this question we can also use this method:
Question from SQL Zoo (Non-ASCII characters):
Find all details of the prize won by PETER GRüNBERG
ans: select * from nobel where winner like 'P% GR%_%berg';