SQL ORA-29275: partial multibyte character

Question

Asked by Walker

I have input data coming from a flat file that has English, Japanese, and Chinese characters in one column. I am loading these values into a staging table column defined as VARCHAR2(250 CHAR); the main table column is defined as VARCHAR2(250), which I cannot change. So I am doing a SUBSTR on this column. After loading the table, when I ran

SELECT * FROM TABLE

...I get this error:

ORA-29275: partial multibyte character

If I select other columns, there are no issues.
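
One way to see which staged values are at risk is to compare their character length with their byte length. This is only a sketch: `staging_tab` and `txt` are hypothetical stand-ins for the real staging table and column, and the 250 limit assumes the main table column uses byte length semantics.

```sql
-- Hypothetical names: staging_tab / txt stand in for the real staging table and column.
-- LENGTH counts characters, LENGTHB counts bytes in the database character set.
SELECT txt,
       LENGTH(txt)  AS char_len,
       LENGTHB(txt) AS byte_len
  FROM staging_tab
 WHERE LENGTHB(txt) > 250;   -- rows that cannot fit a VARCHAR2(250 BYTE) column as-is
```

Rows returned here are the ones where a cut at 250 bytes could land inside a multibyte character.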

Answer 1

Answered by Vincent Malgrat

You should use SUBSTRB when you copy your data from your 250 CHAR column to your 250 byte column. This function will only output whole characters (you won't get incomplete Unicode characters):

SQL> select substrb('中华人', 1, 9) ch9,
  2         substrb('中华人', 1, 8) ch8,
  3         substrb('中华人', 1, 7) ch7,
  4         substrb('中华人', 1, 6) ch6,
  5         substrb('中华人', 1, 5) ch5
  6    FROM dual;

CH9       CH8      CH7     CH6    CH5
--------- -------- ------- ------ -----
中华人       中华       中华      中华     中

Edit:

@mwardm made an interesting comment concerning the actual length of the resulting string and whether the resulting string could contain an invalid sequence of bytes. Consider the following on an AL32UTF8 DB:

SQL> select lengthb('ÏÏÏ'),
  2         lengthb(substrb('ÏÏÏÏÏÏ', 1, 5)),
  3         dump('ÏÏÏ'),
  4         dump(substrb('ÏÏÏÏÏÏ', 1, 5))
  5    FROM dual;

LE LE DUMP('ÏÏÏ')                           DUMP(SUBSTRB('ÏÏÏÏÏÏ',1,5))
-- -- ------------------------------------- -------------------------------
 6  5 Typ=96 Len=6: 195,143,195,143,195,143 Typ=1 Len=5: 195,143,195,143,32

As you can see, the last byte of the substrb string is not the truncated first byte of the special character but encodes a legitimate character. The first 128 characters in this character set are the same as in the US7ASCII character set, so byte 32 encodes the space character (' '); using RTRIM, as suggested in another answer, will remove this last character.

Furthermore, I also got this interesting result using the character set AL16UTF16:

SQL> select lengthb(N'ĈĈ') le,
  2         dump(N'ĈĈ') dump,
  3         lengthb(substrb(N'ĈĈ', 1, 3)) length_substr,
  4         dump(substrb(N'ĈĈ', 1, 3)) dump_substr
  5    from dual;

        LE DUMP                    LENGTH_SUBSTR DUMP_SUBSTR
---------- ----------------------- ------------- -----------------
         4 Typ=96 Len=4: 1,8,1,8               2 Typ=1 Len=2: 1,8

In this case Oracle has chosen to cut the string after the second byte because in the AL16UTF16 character set there is no legitimate one-byte character. The resulting string is only 2 bytes instead of 3.

This would need further testing and is by no means a rigorous demonstration, but I still stand by my first hunch that substrb will return a valid sequence of bytes that encodes a valid string of characters.

Answer 2

Answered by Stefan U7

I think I might have found a good way to do it: if you do rtrim(substrb('中华人', 1, 8)) you get '中华' and the expected byte length of 6.

Please give it a try.
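
To check this yourself and apply it to the copy described in the question, something along these lines might work. It is a sketch under assumptions: `staging_tab`, `main_tab`, and `txt` are hypothetical names, and it relies on SUBSTRB padding a cut-off character with spaces, which has only been observed for AL32UTF8 elsewhere in this thread.

```sql
-- Compare the byte length before and after RTRIM for the example string.
SELECT LENGTHB(SUBSTRB('中华人', 1, 8))        AS raw_len,
       LENGTHB(RTRIM(SUBSTRB('中华人', 1, 8))) AS trimmed_len,
       DUMP(SUBSTRB('中华人', 1, 8))           AS raw_bytes
  FROM dual;

-- Hypothetical table and column names: copy at most 250 bytes per value,
-- then strip any trailing spaces left where a character was cut off.
INSERT INTO main_tab (txt)
SELECT RTRIM(SUBSTRB(txt, 1, 250))
  FROM staging_tab;
```

Note that RTRIM also removes trailing spaces that were genuinely part of the data, so this is only appropriate if those spaces do not matter.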

Answer 3

Answered by Jim Garrison

Using substr will behave differently depending on the database character set. I assume from your description that your DB character set is not one of the Unicode variants, and you must truncate the varchar2(250 char) data to 250 BYTES or less. This is dangerous because it can stop in the middle of a 2-byte character, resulting in the message you got. You should look at the documentation for substrc(), which will calculate its length based on characters and not bytes.
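
Since the behaviour depends on the character set, it may be worth confirming what the database actually uses before choosing a function. A minimal sketch, using the standard NLS_DATABASE_PARAMETERS view:

```sql
-- Shows the database character set, the national (NCHAR) character set,
-- and whether VARCHAR2(n) defaults to BYTE or CHAR length semantics.
SELECT parameter, value
  FROM nls_database_parameters
 WHERE parameter IN ('NLS_CHARACTERSET',
                     'NLS_NCHAR_CHARACTERSET',
                     'NLS_LENGTH_SEMANTICS');
```

If NLS_CHARACTERSET turns out to be a multibyte set such as AL32UTF8, a VARCHAR2(250) column with BYTE semantics may hold fewer than 250 characters.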

It might help if you explain more why you are required to throw away part of the data.
