bash: Conversion from EBCDIC to UTF-8 in Linux
Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverFlow
Original source: http://stackoverflow.com/questions/36496008/
Conversion from EBCDIC to UTF8 in Linux
Asked by luca76
I used Perl to import a table from our AS/400 DB2 database.
The problem is that the strings are encoded in EBCDIC Latin-1 (Italian language).
How can I convert the resulting file to plain utf-8 in Linux bash?
Accepted answer by luca76
It's simple with iconv.
iconv -f ISO8859-1 -t "UTF-8" result.csv -o new_result.csv
ISO8859-1 is the Latin-1 encoding format. For a list of encodings, refer to this table from the official IBM documentation: https://www.ibm.com/support/knowledgecenter/ssw_aix_53/com.ibm.aix.nls/doc/nlsgdrf/iconv.htm%23d722e3a267mela
Note that the conversion may carry over characters from the EBCDIC data that are unwanted in the UTF-8 output. One example is NUL characters in the strings. To avoid this, use a hex editor and replace each hex value 00 with 20 (the space character).
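As an alternative to a hex editor, the NUL-to-space replacement described above can be scripted. This is a minimal sketch, assuming the standard tr and od utilities are available; the file names sample_with_nul.csv and sample_clean.csv are made up for illustration and are not part of the original answer.

```shell
# Build a small sample containing a NUL byte (hypothetical file name).
printf 'ab\000cd\n' > sample_with_nul.csv

# Replace every NUL byte (hex 00) with a space (hex 20).
tr '\000' ' ' < sample_with_nul.csv > sample_clean.csv

# Dump the result character by character to confirm no NUL remains.
od -An -c sample_clean.csv
```

On a real file, the same tr command would read the converted CSV and write a cleaned copy, leaving the original untouched.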
Answer by DevSolar
Start with
iconv -f EBCDIC-IT -t utf-8 <filename>
then check the output, and if it isn't exactly correct, check man iconv and the available encodings listed by iconv -l.
(Note that "EBCDIC Latin-1" is somewhat strange. "Latin-1" indicates ISO-8859-1, while "EBCDIC" is something else entirely. Try file <filename> to get an educated guess by the computer as to what encoding you are actually looking at.)
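The checks suggested above can be sketched as follows. The file sample_ebcdic.bin and its contents are hypothetical (the bytes spell HELLO in code page IBM037), and the code page names passed to grep are only examples; which names are available depends on the iconv build installed.

```shell
# Hypothetical sample: the bytes C8 C5 D3 D3 D6 spell "HELLO" in IBM037.
printf '\310\305\323\323\326' > sample_ebcdic.bin

# Ask file for its guess, if the utility is installed.
if command -v file >/dev/null 2>&1; then
    file sample_ebcdic.bin
fi

# List EBCDIC-related encodings known to this iconv build.
iconv -l | grep -i -E 'ebcdic|ibm037|ibm280' || echo "no matching encodings listed"

# Try one candidate code page and inspect the result by eye.
decoded=$(iconv -f IBM037 -t UTF-8 sample_ebcdic.bin)
printf '%s\n' "$decoded"
```

If the decoded text shows wrong accented letters or punctuation, trying another candidate from the iconv -l list is the next step.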
Answer by JayBee
I had good luck with the following line:
iconv -f IBM037 -t utf-8 input_ebcdic.txt -o output.txt
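One way to gain some confidence in a command like this is a round trip on known text. The sketch below assumes an iconv build that supports IBM037; the sample text and the file name original_utf8.txt are invented for illustration, and shell redirection is used in place of -o in case that option is not supported.

```shell
# Hypothetical UTF-8 sample with Italian accents: "perché città".
printf 'perch\303\251 citt\303\240\n' > original_utf8.txt

# Produce an EBCDIC test file, then convert it back as in the answer above.
iconv -f UTF-8 -t IBM037 original_utf8.txt > input_ebcdic.txt
iconv -f IBM037 -t UTF-8 input_ebcdic.txt > output.txt

# cmp exits with status 0 only when the two files are byte-for-byte identical.
if cmp -s original_utf8.txt output.txt; then
    echo "round trip OK"
else
    echo "round trip differs"
fi
```

A clean round trip only shows that iconv handles the code page consistently. It does not prove that IBM037 is the code page the AS/400 export actually used, so the converted real data still needs a visual check.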