Loading Unicode Characters with Oracle SQL Loader (sqlldr) results in question marks

Disclaimer: this page is a Chinese–English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not the translator). Original question: http://stackoverflow.com/questions/8405956/

Date: 2020-09-10 03:45:32  Source: igfitidea

Tags: oracle, unicode, csv, sql-loader

Asked by philrabin

I'm trying to use SQL Loader to load localized strings from a Unicode (UTF-8-encoded) CSV into an Oracle database. I've tried all sorts of combinations, but nothing gives me the result I'm looking for, which is to have special Greek characters like Δ not get converted to ¿ or ?.
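
The garbling described here is reproducible outside Oracle. A small Python sketch (not specific to sqlldr) shows how the UTF-8 bytes for Δ become two wrong characters when decoded under a mismatched single-byte character set, which is the kind of conversion a wrongly configured client performs:

```python
# 'Δ' (GREEK CAPITAL LETTER DELTA) encodes to two bytes in UTF-8.
raw = "Δ".encode("utf-8")
print(raw)  # b'\xce\x94'

# Decoding those two bytes as a single-byte Western European charset
# produces two wrong characters instead of one delta.
print(raw.decode("cp1252"))  # Î”
```

When a terminal or client cannot render such characters at all, they are typically displayed as ? or ¿ instead, which matches the symptom described above.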

My table definition looks like this:

CREATE TABLE "GLOBALIZATIONRESOURCE"
(
    "RESOURCETYPE" VARCHAR2(255 CHAR) NOT NULL ENABLE,
    "CULTURE"      VARCHAR2(20 CHAR) NOT NULL ENABLE,
    "KEY"          VARCHAR2(128 CHAR) NOT NULL ENABLE,
    "VALUE"        VARCHAR2(2048 CHAR),
    "DESCRIPTION"  VARCHAR2(512 CHAR),
    CONSTRAINT "PK_GLOBALIZATIONRESOURCE" PRIMARY KEY ("RESOURCETYPE","CULTURE","KEY") USING INDEX TABLESPACE REPSPACE_IX ENABLE
)
TABLESPACE REPSPACE; 

I have tried the following configurations in my control file (and, in fact, every permutation I could think of):

load data
TRUNCATE
INTO TABLE "GLOBALIZATIONRESOURCE"
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(   
    "RESOURCETYPE" CHAR(255), 
    "CULTURE" CHAR(20), 
    "KEY" CHAR(128), 
    "VALUE" CHAR(2048), 
    "DESCRIPTION" CHAR(512)
)


load data
CHARACTERSET UTF8
TRUNCATE
INTO TABLE "GLOBALIZATIONRESOURCE"
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(   
    "RESOURCETYPE" CHAR(255), 
    "CULTURE" CHAR(20), 
    "KEY" CHAR(128), 
    "VALUE" CHAR(2048), 
    "DESCRIPTION" CHAR(512)
)


load data
CHARACTERSET UTF16
TRUNCATE
INTO TABLE "GLOBALIZATIONRESOURCE"
FIELDS TERMINATED BY X'002c' OPTIONALLY ENCLOSED BY X'0022'
TRAILING NULLCOLS
(   
    "RESOURCETYPE" CHAR(255), 
    "CULTURE" CHAR(20), 
    "KEY" CHAR(128), 
    "VALUE" CHAR(2048), 
    "DESCRIPTION" CHAR(512)
)

With the first two options, the Unicode characters are not decoded correctly and just show up as upside-down question marks.

If I choose the last option, UTF16, then I get the following error, even though all the data in my fields is much shorter than the specified lengths.

Field in data file exceeds maximum length

It seems as though every possible combination of CTL file configurations (even setting the byte order to little- or big-endian) fails to work correctly. Can someone please give an example of a configuration (table structure and CTL file) that correctly loads Unicode data from a CSV? Any help would be greatly appreciated.

Note: I've already been through http://docs.oracle.com/cd/B19306_01/server.102/b14215/ldr_concepts.htm and http://docs.oracle.com/cd/B10501_01/server.920/a96652/ch10.htm.

Answered by ridonekorkmaz

You have two problems:

  1. Character set.

Answer: You can solve this by finding the character set of your text file (Notepad++ can usually detect it). Once you know it, look up the corresponding SQL*Loader character set name; the list is at https://docs.oracle.com/cd/B10501_01/server.920/a96529/appa.htm#975313. After that, the character set problem should be solved.
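
As a quick programmatic sanity check (a sketch, not part of the original answer), the first few bytes of the data file often reveal the encoding via a byte-order mark:

```python
# Sketch: look for a Unicode BOM at the start of the data file to guess its
# encoding before picking a CHARACTERSET for the sqlldr control file.
import codecs

def sniff_bom(path):
    """Return a best guess at the file's encoding based on its BOM, if any."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if head.startswith(codecs.BOM_UTF16_LE) or head.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16"
    return "unknown (no BOM) - check in an editor such as Notepad++"
```

Note that many UTF-8 files carry no BOM at all, in which case an editor's detection is still the easiest check.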

  2. Despite your actual data length, sqlldr reports: Field in data file exceeds maximum length.

Answer: You can solve this by adding CHAR(4000) (or whatever the actual length is) to the problematic column. In my case, the problematic column was column "E". An example is below; this is how I solved my problem, hope it helps.

LOAD DATA
CHARACTERSET UTF8
-- Turkish charset (for ü, ğ, ş etc.):
-- CHARACTERSET WE8ISO8859P9
-- Character set list:
-- https://docs.oracle.com/cd/B10501_01/server.920/a96529/appa.htm#975313
INFILE 'data.txt' "STR '~|~\n'"
TRUNCATE
INTO TABLE SILTAB
FIELDS TERMINATED BY '#'
TRAILING NULLCOLS
(
    a,
    b,
    c,
    d,
    e CHAR(4000)
)
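
One common explanation for the length error (a sketch of the arithmetic, not from the original answer) is that field limits are counted in bytes rather than characters: a string that is "short" in characters can exceed a byte-based limit once encoded, and UTF-16 at least doubles the byte count.

```python
# Byte arithmetic behind "Field in data file exceeds maximum length":
# 100 characters can be far more than 100 bytes once encoded.
value = "Δ" * 100                      # 100 characters
print(len(value))                      # 100
print(len(value.encode("utf-8")))      # 200 -> Δ is 2 bytes in UTF-8
print(len(value.encode("utf-16-le")))  # 200 -> every char is at least 2 bytes in UTF-16

ascii_value = "a" * 100
print(len(ascii_value.encode("utf-8")))      # 100
print(len(ascii_value.encode("utf-16-le")))  # 200 -> UTF-16 doubles plain ASCII too
```

This is consistent with the workaround above: oversizing the column specification (e.g. CHAR(4000)) leaves headroom for the encoded byte length.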

Answered by davidsr

You must ensure that the following character sets are the same:

  1. the database character set
  2. the dump file character set
  3. the character set of the client from which you are doing the import (NLS_LANG)

If the client-side character set is different, Oracle will attempt to convert characters to the native database character set, and this may not always produce the desired result.

Answered by user1019903

Don't use MS Office to save the spreadsheet as a Unicode .csv. Instead, use OpenOffice to save it as a UTF-8 .csv file. Then add "CHARACTERSET UTF8" to the loader control file and run Oracle SQL*Loader; this gave me correct results.
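
If re-saving through OpenOffice is not an option, the CSV can also be produced (or re-encoded) programmatically so its encoding is unambiguous. A minimal sketch, with a hypothetical file name and row values matching the GLOBALIZATIONRESOURCE columns:

```python
# Sketch: write the data file as genuine UTF-8 so CHARACTERSET UTF8 matches it.
import csv

rows = [("Labels", "el-GR", "Delta", "Δ", "Greek capital delta")]

with open("data.csv", "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows(rows)

# Confirm the bytes on disk are real UTF-8 (Δ encodes to b'\xce\x94').
with open("data.csv", "rb") as f:
    assert b"\xce\x94" in f.read()
```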

Answered by pavangulhane

There is a range of character set encodings you can use in the control file when loading data with SQL*Loader.

For Greek characters, I believe a Western European character set should do the trick:

LOAD DATA
CHARACTERSET WE8ISO8859P1

or, in the case of MS Word input files with smart (curly) characters, try in the control file:

LOAD DATA
CHARACTERSET WE8MSWIN1252