Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/22040903/

Time: 2020-09-10 05:31:21  Source: igfitidea

UTL_FILE and character set

oracle plsql character-encoding

Asked by DeadlyJesus

I've been working on this for days and it's driving me crazy.
I have an Oracle procedure that writes a file using UTL_FILE. I used to store my values as NVARCHAR2 and write the file with the UTL_FILE.PUT_LINE_NCHAR procedure, which produced a file in (what Notepad++ considers to be) UTF8.
The file is then used by another program. The problem is that this program reads it as WE8MSWIN1252, and I can't change that, since it's legacy code.
So I tried the UTL_FILE.PUT_LINE procedure instead, but the file was still detected as UTF8. I saw in Oracle's documentation that NVARCHAR2 uses the national character set (mine is AL16UTF16), so I tried the CONVERT function like this:

CONVERT(whatIWantToWrite, 'WE8MSWIN1252', 'AL16UTF16')

and it raised the ORA-29298 "Character set mismatch" exception. I don't get it: my NLS_NCHAR_CHARACTERSET is AL16UTF16, so why can't I convert it to WE8MSWIN1252?
Is there another way to write a file in WE8MSWIN1252?

Answered by Alex Poole

This seems to be because you're still opening the file with fopen_nchar. If I do this:

create table t42(str nvarchar2(20));
insert into t42 values ('Hello');

declare
  file utl_file.file_type;
  l_str nvarchar2(20);
begin
  select str into l_str from t42;
  file := utl_file.fopen('<directory>', 'dummy.dat', 'w', 32767);
  utl_file.put_line(file, convert(l_str, 'WE8MSWIN1252', 'AL16UTF16'));
  utl_file.fclose(file);
end;
/

... then I get a file containing ??¥?±?, which the Linux file command reports as UTF-8 Unicode text; Notepad++ shows ?汬 and says the file is 'ANSI as UTF-8'.

If I change the fopen to fopen_nchar:

  file := utl_file.fopen_nchar('CENSYS_EXPORT_DIR', 'dummy.dat', 'w', 32767);

... then I get ORA-29298: Character set mismatch and an empty file.

If I go back to fopen but change the PL/SQL variable to varchar2:

declare
  file utl_file.file_type;
  l_str varchar2(20);
begin
  select str into l_str from t42;
  file := utl_file.fopen('<directory>', 'dummy.dat', 'w', 32767);
  utl_file.put_line(file, convert(l_str, 'WE8MSWIN1252', 'AL16UTF16'));
  utl_file.fclose(file);
end;
/

... then the file contains ???? (in vim) and the file is reported as ISO-8859 text. But Notepad++ shows ? and says the file is ANSI.
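
When the output looks garbled like this, it can help to inspect the bytes Oracle actually holds before anything reaches UTL_FILE. The queries below are only a diagnostic sketch against the t42 table from the example above; format 1016 asks DUMP for hexadecimal bytes together with the character set name. With an AL16UTF16 national character set, the NVARCHAR2 value should show two bytes per character.

```sql
-- Bytes of the value as stored in the NVARCHAR2 column (national character set)
select dump(str, 1016) as nchar_bytes from t42;

-- Bytes after an explicit conversion to the database character set
select dump(to_char(str), 1016) as char_bytes from t42;
```

Comparing the two results shows which representation each write path starts from.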

Rather than using convert, which Oracle discourages, you can bounce it through raw:

declare
  file utl_file.file_type;
  l_str varchar2(20);
begin
  select str into l_str from t42;
  file := utl_file.fopen('<directory>', 'dummy.dat', 'w', 32767);
  utl_file.put_line(file,
    utl_raw.cast_to_varchar2(utl_raw.convert(utl_raw.cast_to_raw(l_str),
      'ENGLISH_UNITED KINGDOM.WE8MSWIN1252', 'ENGLISH_UNITED KINGDOM.UTF8')));
  utl_file.fclose(file);
end;
/

In Linux that shows as Hello and the file is reported as ASCII text; Notepad++ shows it as Hello as well, and again says the file is ANSI. I'm not sure whether that gets you where you need to be... and you might need a different language and territory in the NLS strings, of course.
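
A different route, not part of the answer above, is to sidestep UTL_FILE's text-mode encoding altogether: open the file in byte mode ('wb') and write RAW data that has already been encoded as WE8MSWIN1252. The block below is a minimal sketch of that idea using utl_i18n.string_to_raw and utl_file.put_raw. It reuses the t42 table and the '<directory>' placeholder from the examples above, and the CRLF line terminator is an assumption about what the consuming program expects.

```sql
declare
  file  utl_file.file_type;
  l_str nvarchar2(20);
begin
  select str into l_str from t42;
  -- 'wb' opens the file in byte mode, so UTL_FILE does no character set handling itself
  file := utl_file.fopen('<directory>', 'dummy.dat', 'wb', 32767);
  -- encode the string as WE8MSWIN1252 bytes and append a line terminator
  utl_file.put_raw(file,
    utl_raw.concat(utl_i18n.string_to_raw(l_str, 'WE8MSWIN1252'),
                   hextoraw('0D0A')));  -- CRLF is assumed; use '0A' for LF only
  utl_file.fclose(file);
end;
/
```

Because put_raw writes exactly the bytes it is given, the result should not depend on the database character set, although characters with no WE8MSWIN1252 equivalent will still be replaced during the conversion.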

Note that my database character set is AL32UTF8 and my national character set is AL16UTF16, so you might see different behaviour. If your database character set is WE8MSWIN1252, the file will be written in that character set as well. From the documentation:

UTL_FILE expects that files opened by UTL_FILE.FOPEN in text mode are encoded in the database character set. It expects that files opened by UTL_FILE.FOPEN_NCHAR in text mode are encoded in the UTF8 character set.
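
Since the behaviour depends on these two settings, it is worth checking them first. A small query against the standard nls_database_parameters view shows both the database and the national character set:

```sql
-- NLS_CHARACTERSET is what FOPEN text mode uses;
-- NLS_NCHAR_CHARACTERSET is the character set of NVARCHAR2 data
select parameter, value
  from nls_database_parameters
 where parameter in ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
```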

Answered by Wernfried Domscheit

Another option may be to convert the file after it has been written to disk, e.g. with the Java tool Native-to-ASCII Converter:

native2ascii -encoding UTF8 my_text_file_utf.txt my_text_file.tmp
native2ascii -reverse -encoding windows-1252 my_text_file.tmp my_text_file_1252.txt

Answered by Sylwek

You can use dbms_xslprocessor.clob2file.

declare
  l_str varchar2(20);
BEGIN
  select str into l_str from t42;
  dbms_xslprocessor.clob2file(to_clob(l_str), 'UTLDIR', 'file.txt', 2000);
END;

The last argument is the character set ID (csid) of the output file: AL16UTF16 is 2000 and WE8MSWIN1252 is 178. To get a CSID:

SELECT NLS_CHARSET_ID('WE8MSWIN1252') FROM DUAL;
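
Rather than hard-coding the number, the lookup can be done inline. The following is a sketch only: it assumes the goal is a WE8MSWIN1252 file (as in the question, so it passes that character set instead of the 2000 used above), it reuses the t42 table and the UTLDIR directory object from the example, and the file name file_1252.txt is made up for illustration.

```sql
declare
  l_str varchar2(20);
begin
  select str into l_str from t42;
  -- pass the csid of the target character set as the fourth argument
  dbms_xslprocessor.clob2file(to_clob(l_str), 'UTLDIR', 'file_1252.txt',
                              nls_charset_id('WE8MSWIN1252'));
end;
/
```

NLS_CHARSET_NAME performs the reverse lookup, which is handy for checking an ID found in existing code.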