Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): Stack Overflow.
Original source: http://stackoverflow.com/questions/6411570/
How to convert character encoding from CP932 to UTF-8 in nodejs javascript, using the nodejs-iconv module (or other solution)
Asked by Brian
I'm attempting to convert a string from CP932 (aka Windows-31J) to utf8 in javascript. Basically I'm crawling a site that ignores the utf-8 request in the request header and returns cp932 encoded text (even though the html metatag indicates that the page is shift_jis).
Anyway, I have the entire page stored in a string variable called "html". From there I'm attempting to convert it to utf8 using this code:
var Iconv = require('iconv').Iconv;
var conv = new Iconv('CP932', 'UTF-8//TRANSLIT//IGNORE');
var myBuffer = new Buffer(html.length * 3);
myBuffer.write(html, 0, 'utf8')
var utf8html = (conv.convert(myBuffer)).toString('utf8');
The result is not what it's supposed to be. For example, the string: "投稿者さんの 稚内全日空ホテル のクチコミ (感想?情報)" comes out as "??????e???ゑ?????????? ??t??????S??????????z??e???? ???ク??`??R??~ (??????z??E????????)"
If I remove //TRANSLIT//IGNORE (which should make it substitute similar characters for missing ones and, failing that, omit characters that cannot be transcoded), I get this error: Error: EILSEQ, Illegal character sequence.
I'm open to using any solution that can be implemented in nodejs, but my search results haven't yielded many options outside of the nodejs-iconv module.
nodejs-iconv ref: https://github.com/bnoordhuis/node-iconv
Thanks!
Edit 24.06.2011: I've gone ahead and implemented a solution in Java. However I'd still be interested in a javascript solution to this problem if somebody can solve it.
Accepted answer by Masatoshi
I ran into the same trouble today :)
It depends on libiconv. You need libiconv-1.13-ja-1.patch.
Please check the following.
Or you can avoid the problem by using iconv-jp. Try:
npm install iconv-jp
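For readers who would rather not depend on a patched libiconv, here is a minimal sketch of one other route. It is not part of the original answer and it rests on two assumptions: that the Node.js build in use ships full ICU data (otherwise the TextDecoder constructor throws a RangeError for this label), and that the WHATWG "shift_jis" decoder covers the CP932 characters the page uses. The helper name cp932ToUtf8 is hypothetical.

```javascript
// Sketch only: decode CP932 / Shift_JIS bytes with Node's built-in TextDecoder.
// Assumes a Node.js build with full ICU; cp932ToUtf8 is a hypothetical helper name.
function cp932ToUtf8(bytes) {
  // Decode the raw bytes into a JavaScript string.
  const text = new TextDecoder('shift_jis').decode(bytes);
  // Re-encode that string as UTF-8 bytes.
  return Buffer.from(text, 'utf8');
}

// 0x83 0x65, 0x83 0x58, 0x83 0x67 are the Shift_JIS bytes for "テスト".
const sample = Buffer.from([0x83, 0x65, 0x83, 0x58, 0x83, 0x67]);
console.log(cp932ToUtf8(sample).toString('utf8'));
```

The input has to be the raw response bytes (a Buffer), not a string that was already decoded as UTF-8.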
Answer by horejsek
I had the same problem, but with CP1250. I looked for the cause everywhere and everything was OK except the call to request: I had to add encoding: 'binary'.
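// `url` is the address of the page to fetch (not defined in this snippet).
var request = require('request');
var Iconv = require('iconv').Iconv;

// encoding: 'binary' keeps the body's bytes intact instead of decoding them as UTF-8
request({uri: url, encoding: 'binary'}, function(err, response, body) {
    if (err) throw err;
    // Rebuild the raw bytes, then let iconv transcode them to UTF-8
    var bytes = Buffer.from(body, 'binary');
    var iconv = new Iconv('CP1250', 'UTF8');
    body = iconv.convert(bytes).toString();
    // ...
});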
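To illustrate why the 'binary' encoding matters, here is a small sketch that uses only Node's built-in Buffer. It shows that a byte pair from a legacy encoding survives a round trip through a 'binary' string but not through a 'utf8' string. The sample bytes and the helper name roundTrip are my own choices for illustration, not something from the answer.

```javascript
// Sketch: why the response body must stay byte-exact before it reaches iconv.
// 0x83 0x65 is the Shift_JIS / CP932 byte pair for the katakana "テ".
const rawBytes = Buffer.from([0x83, 0x65]);

// Turn bytes into a string with the given encoding, then back into bytes.
// roundTrip is a hypothetical helper used only for this demonstration.
function roundTrip(bytes, encoding) {
  return Buffer.from(bytes.toString(encoding), encoding);
}

// 'binary' (latin1) maps each byte to one char code, so nothing is lost.
const viaBinary = roundTrip(rawBytes, 'binary');
// 'utf8' replaces the invalid byte 0x83 with U+FFFD, so the original is gone.
const viaUtf8 = roundTrip(rawBytes, 'utf8');

console.log(viaBinary.equals(rawBytes)); // true
console.log(viaUtf8.equals(rawBytes));   // false
console.log(viaUtf8);                    // <Buffer ef bf bd 65>
```

Once the body has been decoded as UTF-8, the original bytes cannot be recovered, which is consistent with the stray "e" and "?" characters in the question's garbled output.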
Answer by Masatoshi
https://github.com/bnoordhuis/node-iconv/issues/19
I tried running /Users/Me/node_modules/iconv/test.js with node test.js. It returns an error.
On Mac OS X Lion, this problem seems to depend on gcc.