Node.js: convert a string into UTF-8

Notice: This page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/20174280/

Time: 2020-09-02 16:20:35  Source: igfitidea

Tags: node.js, utf-8

Asked by Alosyius

From my DB I'm getting the following string:

Johan Ã–bert

What it should say is:

Johan Öbert

I've tried to convert it into utf-8 like so:

nameString.toString("utf8");

But I still get the same problem.

Any ideas?

Answered by Jayram

Use the utf8 module from npm to encode/decode the string.

Installation:

npm install utf8

In a browser:

<script src="utf8.js"></script>

In Node.js:

const utf8 = require('utf8');

API:

Encode:

utf8.encode(string)

Encodes any given JavaScript string (string) as UTF-8, and returns the UTF-8-encoded version of the string. It throws an error if the input string contains a non-scalar value, i.e. a lone surrogate. (If you need to be able to encode non-scalar values as well, use WTF-8 instead.)

// U+00A9 COPYRIGHT SIGN; see http://codepoints.net/U+00A9
utf8.encode('\xA9');
// → '\xC2\xA9'
// U+10001 LINEAR B SYLLABLE B038 E; see http://codepoints.net/U+10001
utf8.encode('\uD800\uDC01');
// → '\xF0\x90\x80\x81'
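
For comparison, here is a minimal sketch that produces the same kind of byte string using only Node's built-in Buffer, assuming you would rather not add a dependency. The helper name toUtf8ByteString is hypothetical and not part of the utf8 module. One difference to be aware of: Buffer does not throw on a lone surrogate, it substitutes U+FFFD instead.

```javascript
// Hypothetical helper (not part of the utf8 module): encode a JavaScript
// string as UTF-8 and return the result as a "byte string", where every
// character holds one byte value (0x00-0xFF).
function toUtf8ByteString(str) {
  // Buffer.from(..., 'utf8') yields the UTF-8 bytes;
  // toString('latin1') maps each byte to the code point of the same value.
  return Buffer.from(str, 'utf8').toString('latin1');
}

// U+00A9 COPYRIGHT SIGN becomes the two bytes C2 A9.
const copyrightBytes = toUtf8ByteString('\xA9');

// U+10001 (a surrogate pair in JavaScript) becomes four bytes.
const linearBBytes = toUtf8ByteString('\uD800\uDC01');

console.log(JSON.stringify(copyrightBytes), linearBBytes.length);
```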

Decode:

utf8.decode(byteString)

Decodes any given UTF-8-encoded string (byteString) as UTF-8, and returns the UTF-8-decoded version of the string. It throws an error when malformed UTF-8 is detected. (If you need to be able to decode encoded non-scalar values as well, use WTF-8 instead.)

utf8.decode('\xC2\xA9');
// → '\xA9'

utf8.decode('\xF0\x90\x80\x81');
// → '\uD800\uDC01'
// → U+10001 LINEAR B SYLLABLE B038 E
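
The decoding direction can be sketched with built-ins as well. This assumes the input byte string only contains characters in the range U+0000 to U+00FF. The helper name fromUtf8ByteString is hypothetical. TextDecoder with fatal: true is used so that malformed UTF-8 raises an error, similar to the behavior described above.

```javascript
// Hypothetical helper (not part of the utf8 module): decode a "byte string"
// (one byte value per character) as UTF-8.
function fromUtf8ByteString(byteString) {
  // 'latin1' turns each character back into the byte of the same value.
  const bytes = Buffer.from(byteString, 'latin1');
  // fatal: true makes decode() throw a TypeError on malformed UTF-8
  // instead of silently inserting U+FFFD.
  return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
}

const copyrightSign = fromUtf8ByteString('\xC2\xA9');
const linearB = fromUtf8ByteString('\xF0\x90\x80\x81');

console.log(copyrightSign, linearB.codePointAt(0).toString(16));
```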

Answered by Tobias Nickel

I had the same problem when I loaded a text file via fs.readFile(). I tried setting the encoding to UTF8, but the result stayed the same. My solution for now is this:

myString = JSON.parse( JSON.stringify( myString ) )

After this, a ? is really interpreted as a ?.
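
As a point of reference for the file-reading case, the sketch below shows how the encoding argument changes what Node returns. It uses the synchronous fs.readFileSync for brevity (the answer mentions fs.readFile), and the temporary file name is made up for the demo. It assumes the file on disk really is UTF-8; if it is stored in another encoding, passing 'utf8' will not fix it.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Demo file in the temp directory (hypothetical name, removed afterwards).
const filePath = path.join(os.tmpdir(), 'utf8-demo-' + process.pid + '.txt');
fs.writeFileSync(filePath, 'Johan \u00D6bert', 'utf8');

// Without an encoding argument, the raw bytes come back as a Buffer.
const raw = fs.readFileSync(filePath);

// With 'utf8', the bytes are decoded and a string comes back.
const text = fs.readFileSync(filePath, 'utf8');

fs.unlinkSync(filePath);

console.log(Buffer.isBuffer(raw), raw.length, text);
```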

Answered by Reinstate Monica

You can also use the Buffer class:

var someEncodedString = Buffer.from('someString', 'utf-8');
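
Note that Buffer.from returns a Buffer of bytes rather than a string. A small round-trip sketch, using a sample value chosen here for illustration:

```javascript
// Encode a string into its UTF-8 bytes.
const encoded = Buffer.from('\u00D6bert', 'utf8');

// Inspect the bytes: the single character U+00D6 takes two bytes (c3 96).
const hex = encoded.toString('hex');

// Decode the bytes back into a JavaScript string.
const decoded = encoded.toString('utf8');

console.log(hex, decoded);
```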

Answered by paaat

When you want to change the encoding, you always go from one encoding to another. So you might go from Mac Roman to UTF-8 or from ASCII to UTF-8.

It's as important to know the current source encoding as the desired output encoding. For example, if you have Mac Roman text and you decode it from UTF-16 to UTF-8, you'll just garble it.
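
The effect of picking the wrong source encoding can be sketched with Node's built-in Buffer. This example assumes the text was stored as UTF-8 and then read as Latin-1 (ISO-8859-1), which is one of the encodings Buffer supports. The en dash in the question's garbled string suggests Windows-1252 instead, which Buffer does not handle natively, so a library such as iconv-lite may be needed for that case.

```javascript
const original = 'Johan \u00D6bert';

// The bytes as they would be stored when the text is encoded as UTF-8.
const utf8Bytes = Buffer.from(original, 'utf8');

// Wrong source encoding: reading the UTF-8 bytes as Latin-1 turns the
// two bytes of U+00D6 (c3 96) into two separate characters.
const garbled = utf8Bytes.toString('latin1');

// Repair: turn the garbled string back into the same bytes using the
// encoding that was wrongly applied, then decode those bytes as UTF-8.
const repaired = Buffer.from(garbled, 'latin1').toString('utf8');

console.log(JSON.stringify(garbled), repaired);
```

The repair only works when the wrongly applied encoding is known and the round trip through it is lossless.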

If you want to know more about encodings, this article goes into a lot of detail:

What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text

The npm package encoding, which uses node-iconv or iconv-lite, should allow you to easily specify which source and output encoding you want:

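const encoding = require('encoding'); // npm install encoding
var resultBuffer = encoding.convert(nameString, 'ASCII', 'UTF-8'); // argument order per the package README: (text, toCharset, fromCharset)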