Nodejs convert string into UTF-8
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/20174280/
Asked by Alosyius
From my DB I'm getting the following string:
Johan Ã–bert
What it should say is:
Johan Öbert
I've tried to convert it into UTF-8 like so:
nameString.toString("utf8");
But I still get the same problem.
Any ideas?
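A note on why the call above changes nothing: nameString is already a JavaScript string, and String.prototype.toString() ignores its arguments. An encoding argument only matters for Buffer.prototype.toString(). A minimal sketch, with the sample value written as an escape so the file's own encoding does not matter (the variable names are illustrative):

```javascript
// Illustrative value: "Johan Öbert" with Ö written as \u00D6.
const nameString = 'Johan \u00D6bert';

// On a string, toString() ignores the 'utf8' argument and returns the same string.
const sameString = nameString.toString('utf8');

// Encodings apply when converting between strings and bytes (Buffers).
const utf8Hex = Buffer.from(nameString, 'utf8').toString('hex');

console.log(sameString === nameString); // true
console.log(utf8Hex); // 4a6f68616e20c39662657274 (Ö is the two bytes c3 96)
```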
Answered by Jayram
Use the utf8 module from npm to encode/decode the string.
Installation:
npm install utf8
In a browser:
<script src="utf8.js"></script>
In Node.js:
const utf8 = require('utf8');
API:
Encode:
utf8.encode(string)
Encodes any given JavaScript string (string) as UTF-8, and returns the UTF-8-encoded version of the string. It throws an error if the input string contains a non-scalar value, i.e. a lone surrogate. (If you need to be able to encode non-scalar values as well, use WTF-8 instead.)
// U+00A9 COPYRIGHT SIGN; see http://codepoints.net/U+00A9
utf8.encode('\xA9');
// → '\xC2\xA9'
// U+10001 LINEAR B SYLLABLE B038 E; see http://codepoints.net/U+10001
utf8.encode('\uD800\uDC01');
// → '\xF0\x90\x80\x81'
Decode:
utf8.decode(byteString)
Decodes any given UTF-8-encoded string (byteString) as UTF-8, and returns the UTF-8-decoded version of the string. It throws an error when malformed UTF-8 is detected. (If you need to be able to decode encoded non-scalar values as well, use WTF-8 instead.)
utf8.decode('\xC2\xA9');
// → '\xA9'
utf8.decode('\xF0\x90\x80\x81');
// → '\uD800\uDC01'
// → U+10001 LINEAR B SYLLABLE B038 E
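For comparison, the same byte-string round trip can be sketched with Node's built-in Buffer, treating each byte as one Latin-1 character. The helper names below are hypothetical and this is not the utf8 package's implementation. Unlike utf8.encode, Buffer does not throw on a lone surrogate; it substitutes U+FFFD instead.

```javascript
// Hypothetical helpers built only on Node's Buffer; not part of the utf8 package.

// Encode a JavaScript string as UTF-8, returning one character per byte.
function encodeToByteString(str) {
  return Buffer.from(str, 'utf8').toString('latin1');
}

// Decode a byte string (one character per byte) as UTF-8.
function decodeFromByteString(byteString) {
  return Buffer.from(byteString, 'latin1').toString('utf8');
}

console.log(encodeToByteString('\xA9') === '\xC2\xA9'); // true
console.log(decodeFromByteString('\xF0\x90\x80\x81') === '\uD800\uDC01'); // true
```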
Answered by Tobias Nickel
I had the same problem when I loaded a text file via fs.readFile(). I tried to set the encoding to UTF8, but it stayed the same. My solution now is this:
myString = JSON.parse( JSON.stringify( myString ) )
After this, an ? is really interpreted as an ?.
Answered by Reinstate Monica
Answered by paaat
When you want to change the encoding, you always go from one into another. So you might go from Mac Roman to UTF-8 or from ASCII to UTF-8.
It's as important to know the desired output encoding as the current source encoding. For example, if you have Mac Roman and you decode it from UTF-16 to UTF-8, you'll just make it garbled.
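To illustrate with the string from the question: "Ã–" is what the two UTF-8 bytes of "Ö" (0xC3 0x96) look like when they are decoded as Windows-1252. The sketch below assumes that is what happened (the question does not confirm the source encoding), maps the characters back to bytes, and decodes those bytes as UTF-8. The function name and lookup table are hypothetical helpers, not a library API.

```javascript
// Windows-1252 matches Latin-1 except in the 0x80-0x9F range, listed here.
const CP1252_EXTRA = {
  '\u20AC': 0x80, '\u201A': 0x82, '\u0192': 0x83, '\u201E': 0x84, '\u2026': 0x85,
  '\u2020': 0x86, '\u2021': 0x87, '\u02C6': 0x88, '\u2030': 0x89, '\u0160': 0x8A,
  '\u2039': 0x8B, '\u0152': 0x8C, '\u017D': 0x8E, '\u2018': 0x91, '\u2019': 0x92,
  '\u201C': 0x93, '\u201D': 0x94, '\u2022': 0x95, '\u2013': 0x96, '\u2014': 0x97,
  '\u02DC': 0x98, '\u2122': 0x99, '\u0161': 0x9A, '\u203A': 0x9B, '\u0153': 0x9C,
  '\u017E': 0x9E, '\u0178': 0x9F,
};

// Assumes `garbled` is UTF-8 text that was wrongly decoded as Windows-1252.
// Returns the input unchanged if that assumption does not hold.
function fixMojibake(garbled) {
  const bytes = [];
  for (const ch of garbled) {
    const code = ch.codePointAt(0);
    if (code <= 0xFF) {
      bytes.push(code); // same value in Latin-1 and Windows-1252
    } else if (Object.prototype.hasOwnProperty.call(CP1252_EXTRA, ch)) {
      bytes.push(CP1252_EXTRA[ch]); // e.g. en dash U+2013 -> byte 0x96
    } else {
      return garbled; // character cannot come from Windows-1252
    }
  }
  const fixed = Buffer.from(bytes).toString('utf8');
  // U+FFFD means the bytes were not valid UTF-8, so keep the original.
  return fixed.includes('\uFFFD') ? garbled : fixed;
}

// "Johan Ã–bert" written with escapes: Ã is U+00C3, the en dash is U+2013.
console.log(fixMojibake('Johan \u00C3\u2013bert') === 'Johan \u00D6bert'); // true
```

A string that is already correct, such as "Johan Öbert", does not form valid UTF-8 when mapped back to bytes, so the function returns it unchanged.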
If you want to know more about encoding, this article goes into a lot of detail:
The npm package encoding, which uses node-iconv or iconv-lite, should allow you to easily specify which source and output encoding you want:
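var encoding = require('encoding');
// convert(text, toCharset, fromCharset) returns a Buffer; the target charset comes first.
var resultBuffer = encoding.convert(nameString, 'UTF-8', 'ASCII');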
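For the handful of encodings Node supports natively (such as latin1, utf8 and utf16le), the same source-to-target conversion can be sketched with Buffer alone. The helper name convertEncoding is hypothetical; encodings like Mac Roman still need a package such as the one above.

```javascript
// Hypothetical helper: decode bytes using the source encoding,
// then re-encode the resulting string using the target encoding.
function convertEncoding(inputBuffer, fromEncoding, toEncoding) {
  const text = inputBuffer.toString(fromEncoding);
  return Buffer.from(text, toEncoding);
}

// "JÖ" stored as Latin-1 is the two bytes 4a d6.
const latin1Bytes = Buffer.from([0x4a, 0xd6]);
const utf8Bytes = convertEncoding(latin1Bytes, 'latin1', 'utf8');

console.log(utf8Bytes.toString('hex')); // 4ac396
```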

