JavaScript string comparison fails when comparing Unicode characters
Source: http://stackoverflow.com/questions/10805711/
License: CC BY-SA 4.0. You are free to use and share this content, but you must attribute it to the original authors on Stack Overflow.
Asked by tougher
I want to compare two strings in JavaScript that look the same, and yet the equality operator == returns false. One string contains a special character (e.g. the Danish å).
JavaScript code:
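// Assumed reconstruction: both strings display as "Designhåndbog.pdf" but use different code points.
var filenameFromJS = "Designh\u00E5ndbog.pdf";      // precomposed å (U+00E5)
var filenameFromServer = "Designha\u030Andbog.pdf"; // "a" + combining ring above (U+030A)
console.log(filenameFromJS == filenameFromServer);  // This prints false. Why?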
The solution
What worked for me is Unicode normalization, as slevithan pointed out.
I forked my original jsfiddle to make a version using the normalization lib suggested by slevithan. Link: http://jsfiddle.net/GWZ8j/1/.
Accepted answer by slevithan
Unlike what some other people here have said, this has nothing to do with encodings. Rather, your two strings use different code points to render the same visual characters.
To solve this correctly, you need to perform Unicode normalization on the two strings before comparing them. Unfortunately, JavaScript didn't have this functionality built in when this answer was written (ES2015 has since added String.prototype.normalize()). Here is a JavaScript library that can perform the normalization for you: https://github.com/walling/unorm
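As a minimal sketch of normalizing before comparing, the snippet below uses the built-in String.prototype.normalize() from ES2015 rather than the unorm library. The two literals are assumed stand-ins for the file names in the question, and equalsNormalized is a hypothetical helper name.

```javascript
// Assumed stand-ins for the question's file names: same appearance, different code points.
const filenameFromJS = "Designh\u00E5ndbog.pdf";      // precomposed å (U+00E5)
const filenameFromServer = "Designha\u030Andbog.pdf"; // "a" + combining ring above (U+030A)

// Hypothetical helper: compares two strings after normalizing both to NFC.
function equalsNormalized(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

console.log(filenameFromJS === filenameFromServer);                // false
console.log(equalsNormalized(filenameFromJS, filenameFromServer)); // true
```

NFC is used here because it composes "a" plus the combining ring into the single precomposed code point; NFD on both sides would work equally well for an equality check.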
Answer by Eric Leschinski
The JavaScript equality operator == will appear to fail under the following circumstances. In all cases it is programmer error, not a bug in JavaScript.
The two strings do not contain the same number and sequence of characters.
There is whitespace or a newline before, within, or after one of the strings. Call trim() on both and look closely at each string.
Surprise typecasting. The programmer is comparing datatypes that are incompatible.
There are unicode characters which look identical to other unicode characters but in fact are different unicode characters.
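The circumstances listed above can be checked with a few lines of code. This is only a sketch: codePoints is a hypothetical helper, and the sample strings are made up for illustration.

```javascript
// Hypothetical helper: lists each code point of a string as "U+XXXX".
function codePoints(str) {
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Whitespace: a trailing newline makes the strings differ until trimmed.
console.log("abc\n" === "abc");        // false
console.log("abc\n".trim() === "abc"); // true

// Type conversion: == converts its operands, === does not.
console.log("1" == 1);  // true
console.log("1" === 1); // false

// Lookalikes: Latin "A" and Cyrillic "А" render alike but are different code points.
console.log(codePoints("A").join(" "));      // U+0041
console.log(codePoints("\u0410").join(" ")); // U+0410
```

Dumping the code points of both operands is usually the quickest way to see which of these cases applies.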
Answer by user2428118
Unicode (here encoded as UTF-8) is a complex thing. It has two different representations for characters such as á and é: a single precomposed code point, or a base letter followed by a combining mark. As you can already see in the URL-encoded version, the hex bytes that make up the character differ between the two versions.
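A short sketch of what the URL-encoded versions look like for the two forms of á. The variable names are illustrative only and do not come from the question.

```javascript
const precomposed = "\u00E1"; // á as a single code point (U+00E1)
const decomposed = "a\u0301"; // "a" followed by a combining acute accent (U+0301)

console.log(encodeURIComponent(precomposed)); // %C3%A1
console.log(encodeURIComponent(decomposed));  // a%CC%81
console.log(precomposed === decomposed);      // false
```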
See this answer for more information.
Answer by Daniel F
I had this same problem.
Adding
<meta charset="UTF-8">
to the HTML file fixed the issue.
In my case the templating engine was baking a JSON string into the HTML file. This string was in Unicode.
While the template was also a Unicode file, the JS engine was treating the string I wrote into the template as a Latin-1 encoded string until I added the meta tag.
I was comparing the typed-in string to one of the JSON object's items (location.title == "Mühle").
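The effect described here can be reproduced outside the browser. The sketch below assumes Node.js (Buffer is Node-specific) and simulates UTF-8 bytes being decoded as Latin-1; the variable names are illustrative, and this only approximates what happened in the page.

```javascript
// Node.js-only sketch: simulate UTF-8 bytes being read as Latin-1.
const typed = "M\u00FChle";                   // "Mühle" as the user typed it
const utf8Bytes = Buffer.from(typed, "utf8"); // ü becomes the two bytes C3 BC
const misread = utf8Bytes.toString("latin1"); // each byte decoded as its own character

console.log(misread);                              // MÃ¼hle
console.log(misread === typed);                    // false
console.log(utf8Bytes.toString("utf8") === typed); // true
```

Decoding with the right charset restores equality, which matches the fix of declaring the charset in the HTML file.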

