Linux Oracle: set up default NLS_LANG

Disclaimer: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/9721751/

Time: 2020-08-06 05:12:40  Source: igfitidea

Oracle set up default NLS_LANG

Tags: linux, oracle, utf-8

Asked by etharendil

I am running Oracle Database 11g on RHEL6. If no client NLS_LANG is set, the length of some UTF-8 characters is 2. After setting NLS_LANG=AMERICAN_AMERICA.UTF8, the length is only 1. How can the default NLS_LANG be changed for the whole database? I don't want to change this value only for a session, or as an environment variable in Linux. Is there any way to set it globally for the database?


SQL> select length('á') from dual;

LENGTH('??')
------------
           2

# export NLS_LANG=AMERICAN_AMERICA.UTF8

SQL> select length('á') from dual;

LENGTH('á')
-----------
          1

Many thanks for any ideas.


Accepted answer by Vincent Malgrat

This is what probably happens:


  • Your client character set is probably set to something like CP1252 or ISO-8859-15, whereas your client actually uses UTF8.
  • In UTF8, the symbol á takes two bytes, so your client sends these two bytes while telling Oracle to treat them as CP1252. In CP1252 those two bytes encode two separate characters, so the database interprets the input as two characters; therefore length('á') equals 2 (and if you insert this string, the value stored is not á).
  • When you set the character set correctly, Oracle treats the input as a single character, and its length is 1 (it still occupies two bytes).
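The mismatch described in the list can be reproduced outside Oracle. The sketch below is only an illustration: Python's codecs stand in for Oracle's character set conversion (Oracle's own name for CP1252 is WE8MSWIN1252), and it shows how the two UTF-8 bytes of á turn into two characters when they are declared as CP1252.

```python
# Simulate a client that produces UTF-8 bytes but declares them as CP1252.
# Python codecs stand in for Oracle's character set conversion (illustration only).

typed = "á"                          # what the user typed in a UTF-8 terminal
wire_bytes = typed.encode("utf-8")   # b'\xc3\xa1': two bytes go over the wire

# Correct declaration: the bytes are decoded as UTF-8, giving one character.
correct = wire_bytes.decode("utf-8")

# Wrong declaration: the same bytes are decoded as CP1252, giving two characters.
misread = wire_bytes.decode("cp1252")

print(len(wire_bytes))               # 2
print(len(correct), repr(correct))   # 1 'á'
print(len(misread), repr(misread))   # 2 'Ã¡'
```

The byte count is 2 in both cases; only the declared character set decides whether those bytes count as one character or two.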

Conclusion: set your client character set properly, or you will get character set conversion errors (you won't get illegal characters this way, but you may get strange symbols such as ?).


The database character set is set at the time of creation and is generally changed via export/create blank database/import.


Answer by Justin Cave

Since the database character set is AL32UTF8, when you set the client NLS_LANG to UTF8, you are telling Oracle to bypass the character set conversion that normally takes place when data is transmitted between the client and the server. That is extremely dangerous: if the client actually sends data in any other encoding, there is a substantial risk that invalid data will be stored in the database. In this case, the LENGTH call that returns 1 is sending incorrectly encoded data to the database. Most likely, the client operating system represents data using something like the ISO-8859-1 character set, where á is a single-byte character (binary value 0xE1). When the client sends the data to the database, it sends 0xE1 and tells the database "trust me, this is valid UTF-8 data". The database doesn't check the binary data, so it never notices that 0xE1 is not a valid binary representation of any character in the UTF-8 character set.

But if another, correctly configured client later asks for character set translation to take place, and the database has 0xE1 stored in some column, the character set conversion code will run, identify that 0xE1 is not a valid UTF-8 character, and return a replacement character (generally '?') rather than á to that correctly configured client.
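The failure mode in this answer can be illustrated with Python's codecs, again only as a stand-in for Oracle's conversion. The sketch assumes an ISO-8859-1 client, as the answer suggests: á is stored as the single byte 0xE1, which is not valid UTF-8 on its own. Python substitutes U+FFFD when decoding with errors="replace", whereas Oracle typically returns '?'.

```python
# Scenario from the answer: an ISO-8859-1 client sends 'á' as 0xE1 while
# claiming the data is UTF-8, so the byte is stored without any conversion.

stored = "á".encode("latin-1")   # b'\xe1': one byte in ISO-8859-1
print(stored)                    # b'\xe1'

# A lone 0xE1 is not valid UTF-8: it is the lead byte of a 3-byte sequence
# that never completes.
try:
    stored.decode("utf-8")
    valid = True
except UnicodeDecodeError:
    valid = False
print(valid)                     # False

# A correctly configured client that asks for conversion gets a replacement
# character instead of 'á'. Python uses U+FFFD; Oracle generally uses '?'.
recovered = stored.decode("utf-8", errors="replace")
print(repr(recovered))
```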


Since the database character set is UTF-8, you would expect an 'á' character to occupy 2 bytes. UTF-8 is a variable-width character set: characters that are part of the US7ASCII character set are represented with 1 byte, most Western European characters such as 'á' with 2 bytes, and most Asian characters with 3 bytes. A few rare characters require 4 bytes.
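The byte widths can be checked directly. In this sketch, Python's len() on a string counts characters (roughly what Oracle's LENGTH reports), while len() on the UTF-8 encoded bytes counts bytes (roughly what LENGTHB reports); the sample characters are chosen only for illustration.

```python
# Byte widths of sample characters in UTF-8 (AL32UTF8 in Oracle terms).
samples = {
    "A": "US7ASCII letter",
    "á": "Western European letter",
    "中": "CJK ideograph",
    "😀": "emoji outside the Basic Multilingual Plane",
}

for ch, description in samples.items():
    chars = len(ch)                   # character count, comparable to LENGTH
    nbytes = len(ch.encode("utf-8"))  # byte count, comparable to LENGTHB
    print(f"{ch!r}: {chars} char, {nbytes} byte(s) ({description})")
```

Every sample is one character, but the byte counts run from 1 to 4, which is why character counts and byte counts must not be confused.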


Your NLS_LANG needs to be set to match the character set that your client system supports. It is rare to have a client system with native UTF-8 support. Since the client setting overrides anything set at the database level, each client needs to be configured with an appropriate environment variable. The NLS_LANG FAQ has a section on how to determine the correct NLS_LANG setting for a Unix client.
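One way to apply this advice on a Unix client is to read the client's codeset from the locale and map it to the matching Oracle character set name. The sketch below is hypothetical: the helper suggest_nls_lang and its small mapping table are illustrative and cover only a few common codesets, AMERICAN_AMERICA is assumed for the language/territory part, and it assumes the Python process sees the same locale as the shell that runs sqlplus. The NLS_LANG FAQ remains the authoritative procedure.

```python
import locale

# Hypothetical mapping from locale codeset names to Oracle character set names.
# Only a few common cases are covered; extend it for your environment.
CODESET_TO_ORACLE = {
    "UTF-8": "AL32UTF8",
    "ISO-8859-1": "WE8ISO8859P1",
    "ISO-8859-15": "WE8ISO8859P15",
    "CP1252": "WE8MSWIN1252",
    "ANSI_X3.4-1968": "US7ASCII",  # name often reported for the C/POSIX locale
}


def suggest_nls_lang(codeset: str, lang_territory: str = "AMERICAN_AMERICA") -> str:
    """Return an NLS_LANG value (LANGUAGE_TERRITORY.CHARSET) for a locale codeset."""
    charset = CODESET_TO_ORACLE.get(codeset.upper())
    if charset is None:
        raise ValueError(f"no Oracle character set known for codeset {codeset!r}")
    return f"{lang_territory}.{charset}"


if __name__ == "__main__":
    # Encoding Python derives from the current locale settings.
    codeset = locale.getpreferredencoding(False)
    try:
        print(f"export NLS_LANG={suggest_nls_lang(codeset)}")
    except ValueError as err:
        print(err)
```

On a client whose locale uses UTF-8, this suggests AMERICAN_AMERICA.AL32UTF8, that is, an NLS_LANG that matches what the client really sends rather than one chosen to make LENGTH return a particular value.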
