JavaScript Unicode normalization

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/7772553/

Time: 2020-10-26 01:15:40  Source: igfitidea

Tags: javascript, unicode, normalization, unicode-normalization

Asked by Matty

I'm under the impression that the JavaScript interpreter assumes that the source code it is interpreting has already been normalized. What, exactly, does the normalizing? It can't be the text editor, otherwise the plaintext representation of the source would change. Is there some "preprocessor" that does the normalization?

Answered by bobince

No. As of ECMAScript 5, no Unicode normalization is applied automatically to JavaScript source or strings, and none is even available to scripts. All characters remain unchanged as their original code points, potentially in a non-normalized form.

For example, try:

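<script type="text/javascript">
    var a= 'caf\u00E9';     // precomposed: é is the single code point U+00E9
    var b= 'cafe\u0301';    // decomposed: e followed by combining acute U+0301
    alert(a+' '+a.length);  // café 4
    alert(b+' '+b.length);  // café 5 (the accent is a separate code point)
    alert(a==b);            // false
</script>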
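
To see why the two strings differ, it can help to list the UTF-16 code units of each. The following is a minimal sketch rather than part of the original answer; the helper name codeUnits is hypothetical.

```javascript
// Hypothetical helper: list the UTF-16 code units of a string in U+XXXX form.
function codeUnits(str) {
  var out = [];
  for (var i = 0; i < str.length; i++) {
    var hex = str.charCodeAt(i).toString(16).toUpperCase();
    out.push('U+' + ('0000' + hex).slice(-4)); // left-pad to four hex digits
  }
  return out;
}

var composed = 'caf\u00E9';    // precomposed é
var decomposed = 'cafe\u0301'; // e followed by a combining acute accent

console.log(codeUnits(composed).join(' '));   // U+0063 U+0061 U+0066 U+00E9
console.log(codeUnits(decomposed).join(' ')); // U+0063 U+0061 U+0066 U+0065 U+0301
```

Both strings render the same way, but the second one carries the accent as its own code unit, which is why its length is 5 and the comparison fails.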

Update: ECMAScript 6 introduces Unicode normalization for JavaScript strings.

Answered by Mathias Bynens

ECMAScript 6 introduces String.prototype.normalize(), which takes care of Unicode normalization for you.
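
As a rough illustration, normalize() accepts an optional form argument ('NFC', 'NFD', 'NFKC' or 'NFKD') and defaults to 'NFC'. The sketch below assumes an engine with native support; the variable names are only for illustration.

```javascript
const composed = 'caf\u00E9';    // precomposed é (U+00E9)
const decomposed = 'cafe\u0301'; // e + combining acute accent (U+0301)

// Canonical forms: NFC composes, NFD decomposes.
const nfc = decomposed.normalize('NFC');
const nfd = composed.normalize('NFD');
console.log(nfc === composed, nfc.length);   // true 4
console.log(nfd === decomposed, nfd.length); // true 5

// Compatibility forms also fold characters such as the "fi" ligature (U+FB01).
const ligature = '\uFB01';
console.log(ligature.normalize('NFC') === ligature); // true (canonical forms keep it)
console.log(ligature.normalize('NFKC') === 'fi');    // true
```

Which form to pick depends on the use case: the canonical forms preserve visual distinctions such as ligatures, while the compatibility forms trade those away for looser matching.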

unorm is a JavaScript polyfill for this method, so you can use String.prototype.normalize() even in engines that do not support it natively (at the time of writing, not a single engine did).
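
One way to guard against missing native support is a feature check with an optional fallback. This is only a sketch: the helper name toNFC and its fallback parameter are hypothetical, and the fallback stands in for whatever polyfill function you load; no specific polyfill API is assumed here.

```javascript
// Hypothetical helper: normalize to NFC natively when possible,
// otherwise defer to a caller-supplied fallback function.
function toNFC(str, fallback) {
  if (typeof String.prototype.normalize === 'function') {
    return str.normalize('NFC');
  }
  if (typeof fallback === 'function') {
    return fallback(str); // assumption: fallback(str) returns the NFC form
  }
  throw new Error('No Unicode normalization available');
}

// In an engine with native support, the fallback is never called.
console.log(toNFC('cafe\u0301') === 'caf\u00E9');
```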

For more information on how and when to use Unicode normalization in JavaScript, see the section "Accounting for lookalikes" in "JavaScript has a Unicode problem".

Answered by Eonil

If you're using node.js, there is a unorm library for this.

https://github.com/walling/unorm

Answered by Lonely

I've updated @bobince's answer:

var cafe4= 'caf\u00E9';   // precomposed é
var cafe5= 'cafe\u0301';  // e + combining acute accent

console.log(
  cafe4+' '+cafe4.length,                  // café 4
  cafe5+' '+cafe5.length,                  // café 5 (e + combining U+0301)
  cafe4 === cafe5,                         // false
  cafe4.normalize() === cafe5.normalize()  // true
);