Expressing UTF-16 Unicode characters in JavaScript

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/7126384/

Date: 2020-10-25 23:02:45  Source: igfitidea


Tags: javascript, unicode

Asked by gilly3

To express, for example, the character U+10400 in JavaScript, I use "\uD801\uDC00" or String.fromCharCode(0xD801) + String.fromCharCode(0xDC00). How do I figure that out for a given Unicode character? I want the following:


var char = getUnicodeCharacter(0x10400);

How do I find 0xD801 and 0xDC00 from 0x10400?


Answered by Arnaud Le Blanc

Based on the Wikipedia article given by Henning Makholm, the following function will return the correct character for a code point:


function getUnicodeCharacter(cp) {

    if (cp >= 0 && cp <= 0xD7FF || cp >= 0xE000 && cp <= 0xFFFF) {
        return String.fromCharCode(cp);
    } else if (cp >= 0x10000 && cp <= 0x10FFFF) {

        // we subtract 0x10000 from cp to get a 20-bit number
        // in the range 0..0xFFFFF
        cp -= 0x10000;

        // we add 0xD800 to the number formed by the high 10 bits
        // to get the first code unit (the high surrogate)
        var first = ((0xFFC00 & cp) >> 10) + 0xD800;

        // we add 0xDC00 to the number formed by the low 10 bits
        // to get the second code unit (the low surrogate)
        var second = (0x3FF & cp) + 0xDC00;

        return String.fromCharCode(first) + String.fromCharCode(second);
    }
}
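The surrogate math can be checked by hand for U+10400. This standalone sketch (not part of the original answer) walks through the arithmetic, and also uses String.fromCodePoint(), a built-in that modern engines (ES2015+) provide for exactly this purpose:

```javascript
// Worked example for U+10400:
const cp = 0x10400 - 0x10000;      // 0x0400, a 20-bit value
const high = 0xD800 + (cp >> 10);  // high surrogate: 0xD801
const low = 0xDC00 + (cp & 0x3FF); // low surrogate: 0xDC00

console.log(high.toString(16), low.toString(16)); // d801 dc00

// On ES2015+ engines the whole calculation is built in:
console.log(String.fromCodePoint(0x10400) === '\uD801\uDC00'); // true
```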

Answered by Mathias Bynens

How do I find 0xD801 and 0xDC00 from 0x10400?


JavaScript uses UCS-2 internally. That's why String#charCodeAt() doesn't work the way you'd want it to.

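The per-code-unit behavior is easy to observe. This short sketch (assuming a modern engine with codePointAt(), an ES2015 method that postdates this answer) shows charCodeAt() returning only the first surrogate of a pair:

```javascript
const s = '\uD801\uDC00'; // U+10400 as a surrogate pair

console.log(s.length);                      // 2 -- UTF-16 code units, not characters
console.log(s.charCodeAt(0).toString(16));  // d801 -- only the high surrogate
console.log(s.codePointAt(0).toString(16)); // 10400 -- the full code point
```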

If you want to get the code point of every Unicode character (including non-BMP characters) in a string, you could use Punycode.js's utility functions to convert between UCS-2 strings and UTF-16 code points:


// String#charCodeAt() replacement that only considers full Unicode characters
punycode.ucs2.decode('𝌆'); // [119558]
punycode.ucs2.decode('abc'); // [97, 98, 99]
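On ES2015+ engines you can get the same result without a library, since the built-in string iterator walks by code point rather than by code unit. This is a sketch of the idea, not Punycode.js's API:

```javascript
// Library-free equivalent of a code-point decoder:
const decode = (s) => Array.from(s, (c) => c.codePointAt(0));

console.log(decode('abc'));        // [97, 98, 99]
console.log(decode('\u{1D306}'));  // [119558]
```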

If you don't need to do it programmatically though, and you've already got the character, just use mothereff.in/js-escapes. It will tell you how to escape any character in JavaScript.
