Original source: http://stackoverflow.com/questions/27279518/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
How to determine if characters in an Oracle DB field are within the UTF8 charset but outside of LATIN-1 with SQL?
Asked by dave
I need to write a SQL query that counts the rows in my UTF8 Oracle database that are not compatible with another system that uses LATIN-1.
For example, "über" should not return a result, but "翻译" should.
I have tried queries such as:
select decode(convert(convert('über test', 'WE8ISO8859P1'), 'UTF8'), convert('über test', 'UTF8'), 1, 0) from dual;
However, this does not give me the result that I need. Can anyone provide a SQL-only solution to this problem? Thanks
Accepted answer by dave
I think I have figured it out:
select * from (select asciistr(convert('test string goes here', 'UTF8')) as str from dual) where regexp_like(str, '.*\\([1-9A-F]|0[1-9A-F]).*');
Using http://en.wikipedia.org/wiki/Latin-1_Supplement_%28Unicode_block%29 as a reference, the LATIN-1 block of Unicode ends at \00FF.
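To get the row count the question asks for, the same predicate can be pointed at a real column instead of a string literal. The following is a minimal sketch, assuming a hypothetical table my_table with a VARCHAR2 column my_column (neither name appears in the original post). ASCIISTR renders each non-ASCII character as a \XXXX escape, so any escape above \00FF marks the row as outside LATIN-1. The CONVERT call is omitted here on the assumption that the column is already stored in the database character set.

```sql
-- my_table and my_column are hypothetical names; substitute your own.
SELECT COUNT(*) AS incompatible_rows
FROM   my_table
WHERE  REGEXP_LIKE(
         ASCIISTR(my_column),
         -- '\\' matches the literal backslash that starts an escape.
         -- [1-9A-F]  : first hex digit is nonzero  (\1000 and above)
         -- 0[1-9A-F] : second hex digit is nonzero (\0100 to \0FFF)
         '\\([1-9A-F]|0[1-9A-F])'
       );
```

REGEXP_LIKE searches for the pattern anywhere in the string, so the leading and trailing .* used in the query above are not required.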
For example,
SQL> select * from (select asciistr(convert('翻译', 'UTF8')) as str from dual) where regexp_like(str, '.*\\([1-9A-F]|0[1-9A-F]).*');
STR
------------------------------
\7FFB\8BD1
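As a sanity check on the logic, both strings from the question can be run through the same predicate side by side. This sketch builds the test strings with UNISTR so that the result does not depend on the client's character set settings; that choice is an assumption of this example, not part of the original answer.

```sql
-- UNISTR('\00FCber') is 'über'; UNISTR('\7FFB\8BD1') is '翻译'.
SELECT ASCIISTR(s) AS escaped,
       CASE
         WHEN REGEXP_LIKE(ASCIISTR(s), '\\([1-9A-F]|0[1-9A-F])')
           THEN 'outside LATIN-1'
         ELSE 'within LATIN-1'
       END AS verdict
FROM  (SELECT UNISTR('\00FCber')   AS s FROM dual
       UNION ALL
       SELECT UNISTR('\7FFB\8BD1') AS s FROM dual);
```

The first row should be classified as within LATIN-1, because the escape \00FC starts with two zeros and neither alternative of the pattern can match it. The second row should be classified as outside LATIN-1, because both of its escapes start with a nonzero hex digit.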
If someone could double-check this from a logical standpoint, I would appreciate it.
Answer by Mark J. Bobak
You didn't bother to mention an Oracle version. Up to 11.2, you should use the Oracle-provided character set scanner (CSSCAN) for this purpose. Starting with 12.1, there's a new utility called Oracle Database Migration Assistant for Unicode (DMU).