CHAR semantics and ORA-01461

Note: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must also follow CC BY-SA: cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/5230346/

Date: 2020-09-18 22:57:28  Source: igfitidea

Tags: php, oracle, oci8

Question by álvaro González

I maintain a PHP-driven application with an Oracle backend (OCI8 functions). The app is developed with Oracle 10g XE and deployed on whatever version the customer owns.

The application handles single-byte text (ISO-8859-15) and I've never had any problem while developing against the Western European edition of Oracle XE. However, I've recently installed the Universal edition and I'm having issues when inserting large strings with non-ASCII characters. This version sets NLS_CHARACTERSET = AL32UTF8; since my app uses WE8ISO8859P15, Oracle silently converts my input data from ISO-8859-15 to UTF-8 (which is fine). But it seems that certain size checks go wrong: a string with 1500 characters (1500 bytes in ISO-8859-15, 4500 bytes in UTF-8) appears to overflow a VARCHAR2(4000 CHAR) column.
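
The byte arithmetic above can be checked with a minimal PHP sketch (an illustration, not part of the original question). It assumes the iconv extension is enabled and uses the euro sign, the ISO-8859-15 character that takes 3 bytes in UTF-8, as the repeated test character; the "\u{20AC}" escape keeps the result independent of the encoding the script file is saved in.

```php
<?php
// Minimal sketch: byte size of a 1500-character test string in the client
// character set (ISO-8859-15) and in the database character set (AL32UTF8).
// Assumes the iconv extension is enabled (it is in most PHP builds).

$utf8 = str_repeat("\u{20AC}", 1500);           // 1500 euro signs, UTF-8 encoded
$latin9 = iconv('UTF-8', 'ISO-8859-15', $utf8); // the bytes a WE8ISO8859P15 client sends

$chars = iconv_strlen($utf8, 'UTF-8'); // number of characters
$latin9Bytes = strlen($latin9);        // strlen() counts bytes: 1 per character here
$utf8Bytes = strlen($utf8);            // 3 bytes per euro sign after conversion

printf("characters:        %d\n", $chars);
printf("ISO-8859-15 bytes: %d\n", $latin9Bytes);
printf("UTF-8 bytes:       %d\n", $utf8Bytes);
```

With a plain "x" instead of the euro sign, all three numbers are 1500, which matches the observation that an xxx... string of the same length inserts without error.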

I've created this test table:

CREATE TABLE FOO (
    FOO_ID NUMBER NOT NULL ENABLE,
    DATA_BYTE VARCHAR2(4000 BYTE),
    DATA_CHAR VARCHAR2(4000 CHAR),

    CONSTRAINT FOO_PK PRIMARY KEY (FOO_ID)
);

The problem can be reproduced with this code:

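<?php
// Save this file as ISO-8859-15 so that '€' is stored as the single byte 0xA4,
// matching the WE8ISO8859P15 client character set requested below.
// DB_USER, DB_PASS and DB_CONN_STRING are defined elsewhere.
$connection = oci_connect(DB_USER, DB_PASS, DB_CONN_STRING, 'WE8ISO8859P15');
if( !$connection ){
    $e = oci_error(); // connection errors: call oci_error() with no argument
    die(htmlspecialchars($e['message']));
}

$id = 1;
$data = str_repeat('€', 1500); // 1500 bytes here, 4500 bytes after conversion to AL32UTF8

$sql = 'INSERT INTO FOO (FOO_ID, DATA_CHAR) ' .
    'VALUES (:id, :data)';
$res = oci_parse($connection, $sql);
if(!$res){
    $e = oci_error($connection); // parse errors are reported on the connection
    die(htmlspecialchars($e['message']));
}
if(!oci_bind_by_name($res, ':id', $id)){
    $e = oci_error($res); // bind and execute errors are reported on the statement
    die(htmlspecialchars($e['message']));
}
if(!oci_bind_by_name($res, ':data', $data)){
    $e = oci_error($res);
    die(htmlspecialchars($e['message']));
}
if(!oci_execute($res, OCI_COMMIT_ON_SUCCESS)){
    $e = oci_error($res);
    die(htmlspecialchars($e['message']));
}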

... which triggers:

Warning: oci_execute(): ORA-01461: sólo puede enlazar un valor LONG para insertarlo en una columna LONG
(In English: ORA-01461: can bind a LONG value only for insert into a LONG column)

It is the same error I get when I try to insert a 4001-character string. It doesn't happen if I insert xxx... instead of €€€..., and it doesn't happen if I save my script as UTF-8 and connect as such:

<?php
$connection = oci_connect(DB_USER, DB_PASS, DB_CONN_STRING, 'AL32UTF8');

[Update: My test was flawed. Using UTF-8 doesn't avoid ORA-01461.]

How can I work around this problem? The NLS_CHARACTERSET database parameter is not something I control, and switching my app to UTF-8 is likely to cause other problems (almost all our customers have single-byte databases).

Answer by Justin Cave

This is probably not something that you can work around unless you want to use a CLOB instead of a VARCHAR2.

In Oracle, when you declare a column, the default is to use byte-length semantics. So a VARCHAR2(100), for example, allocates 100 bytes of storage. If you're using a single-byte character set like ISO 8859-1, every character requires 1 byte of storage, so this also allocates space for 100 characters. But if you are using a multi-byte character set like UTF-8, each character can require between 1 and 4 bytes of storage. Depending on the data, therefore, a VARCHAR2(100) may only be able to store 25 characters of data (English characters generally require 1 byte, European characters generally require 2 bytes, and Asian characters generally require 3 bytes).
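
The capacity figures in the previous paragraph can be reproduced with a short PHP sketch (illustrative only; the sample characters are arbitrary choices). In a UTF-8 string, strlen() returns the number of bytes, which is exactly what byte-length semantics measures, so dividing the 100-byte column size by each character's size gives how many such characters a VARCHAR2(100) holds in an AL32UTF8 database.

```php
<?php
// Sketch: how many characters of each kind fit into a byte-semantics
// VARCHAR2(100) column in an AL32UTF8 database. strlen() on a UTF-8
// string returns its byte count.

const COLUMN_BYTES = 100;

$samples = [
    'ASCII letter'   => 'a',
    'Latin-1 letter' => "\u{00F6}",  // ö
    'Euro sign'      => "\u{20AC}",  // €
    'CJK ideograph'  => "\u{65E5}",  // 日
    'Emoji'          => "\u{1F600}", // outside the Basic Multilingual Plane
];

$capacity = [];
foreach ($samples as $label => $char) {
    $bytes = strlen($char);
    $capacity[$label] = intdiv(COLUMN_BYTES, $bytes);
    printf("%-15s %d byte(s) -> %3d characters\n", $label, $bytes, $capacity[$label]);
}
```

The emoji row shows the worst case of 4 bytes per character, which is where the figure of 25 characters comes from.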

You can tell Oracle to use character-length semantics, which is normally what I'd suggest when moving from an ISO-8859-1 database to a UTF-8 database. If you declare a column VARCHAR2(100 CHAR), Oracle will allocate space for 100 characters regardless of whether that ends up being 100 bytes or 400 bytes. You can also set the NLS_LENGTH_SEMANTICS parameter to CHAR to change the default (for new DDL) so that a VARCHAR2(100) allocates 100 characters of storage rather than 100 bytes.

Unfortunately for you, though, the limit on the size of an Oracle VARCHAR2 (in the context of the SQL engine rather than the PL/SQL engine) is 4000 bytes. So even if you declare a column VARCHAR2(4000 CHAR), you're still going to be limited to actually inserting 4000 bytes of data, which may be as few as 1000 characters. For example, in a database using the AL32UTF8 character set, I can declare a column VARCHAR2(4000 CHAR), but inserting a character that requires 2 bytes of storage shows that I can't really insert 4000 characters of data:

SQL> create table foo (
  2    col1 varchar2(4000 char)
  3  );

Table created.

SQL> insert into foo values( rpad( 'abcde', 4000, unistr('\00f6') ) );

1 row created.

SQL> ed
Wrote file afiedt.buf

  1* insert into foo values( rpad( 'abcde', 6000, unistr('\00f6') ) )
SQL> /

1 row created.

SQL> select length(col1), lengthb(col1)
  2    from foo;

LENGTH(COL1) LENGTHB(COL1)
------------ -------------
        2003          4000
        2003          4000
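
The transcript shows both limits at work on a VARCHAR2(4000 CHAR) column: a character count and a 4000-byte cap. A client could check both before sending the data. The PHP sketch below is a hypothetical pre-check (fitsVarchar2Char() is an illustrative helper, not an OCI8 function); it assumes the value has already been converted to UTF-8, for example with iconv() from ISO-8859-15.

```php
<?php
// Sketch: hypothetical client-side pre-check for a VARCHAR2(n CHAR) column in
// an AL32UTF8 database. The value must satisfy BOTH limits: at most n
// characters and at most 4000 bytes (the SQL-engine VARCHAR2 limit).

const SQL_VARCHAR2_MAX_BYTES = 4000;

function fitsVarchar2Char(string $utf8, int $maxChars): bool
{
    // With the /u modifier, each match of /./s is one UTF-8 code point;
    // preg_match_all() returns false if the input is not valid UTF-8.
    $chars = preg_match_all('/./su', $utf8);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $bytes = strlen($utf8); // strlen() counts bytes, not characters
    return $chars <= $maxChars && $bytes <= SQL_VARCHAR2_MAX_BYTES;
}

$ascii = str_repeat('x', 4000);          // 4000 characters, 4000 bytes
$umlauts = str_repeat("\u{00F6}", 2000); // 2000 characters, 4000 bytes
$euros = str_repeat("\u{20AC}", 1500);   // 1500 characters, 4500 bytes

var_dump(fitsVarchar2Char($ascii, 4000));   // bool(true)
var_dump(fitsVarchar2Char($umlauts, 4000)); // bool(true)
var_dump(fitsVarchar2Char($euros, 4000));   // bool(false)
```

The last case corresponds to the question: 1500 characters is well under the declared 4000, yet the UTF-8 form exceeds 4000 bytes.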

If you need to store 4000 characters of UTF-8 data, you'd need a data type that could handle 16000 bytes, which would necessitate moving to a CLOB.
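
The worst-case arithmetic behind that conclusion fits in a few lines of PHP. This is an illustrative sketch: worstCaseBytes() and chooseColumnType() are hypothetical helpers, the 4-bytes-per-character maximum applies to AL32UTF8, and the 4000-byte cap is the SQL VARCHAR2 limit described in this answer.

```php
<?php
// Sketch: worst-case byte budget for N characters in an AL32UTF8 database,
// where one character can take up to 4 bytes. Both helpers are hypothetical
// illustrations, not Oracle or OCI8 APIs.

const MAX_BYTES_PER_UTF8_CHAR = 4;
const SQL_VARCHAR2_MAX_BYTES = 4000;

function worstCaseBytes(int $chars): int
{
    return $chars * MAX_BYTES_PER_UTF8_CHAR;
}

function chooseColumnType(int $chars): string
{
    // VARCHAR2(n CHAR) is only guaranteed to hold n characters of arbitrary
    // text if n characters can never exceed the 4000-byte limit.
    return worstCaseBytes($chars) <= SQL_VARCHAR2_MAX_BYTES
        ? sprintf('VARCHAR2(%d CHAR)', $chars)
        : 'CLOB';
}

echo worstCaseBytes(4000), "\n";   // 16000
echo chooseColumnType(1000), "\n"; // VARCHAR2(1000 CHAR)
echo chooseColumnType(4000), "\n"; // CLOB
```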