How to compare Unicode strings in JavaScript?
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/3630645/
Asked by Tomasz Wysocki
When I write "ą" > "Z" in JavaScript, it returns true. In Unicode order it should of course be false. How do I fix this? My site uses UTF-8.
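For context, the relational operators compare strings by UTF-16 code unit value rather than by any alphabet, which is why the comparison above yields true. A small sketch, assuming ą is stored as the precomposed character U+0105 (the variable names are only illustrative):

```javascript
// The names polishA and latinZ are illustrative, not from the question.
const polishA = "ą";
const latinZ = "Z";

console.log(polishA.codePointAt(0)); // 261 (U+0105)
console.log(latinZ.codePointAt(0));  // 90  (U+005A)

// 261 > 90, so the plain operator reports that "ą" sorts after "Z".
const plainResult = polishA > latinZ;
console.log(plainResult); // true
```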
Accepted answer by Oriol
You can use Intl.Collator or String.prototype.localeCompare with a locale argument, both provided by the ECMAScript Internationalization API:
"ą".localeCompare("Z", "pl"); // -1
new Intl.Collator("pl").compare("ą", "Z"); // -1
A negative result (here -1) means that ą comes before Z, like you want.
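The same comparator can be passed straight to Array.prototype.sort. A minimal sketch, assuming a runtime with Polish collation data available; the sample letters are made up for illustration:

```javascript
// Sample data chosen for illustration only.
const letters = ["z", "ą", "a", "b"];

// Default sort compares UTF-16 code units, so ą (U+0105) lands after z.
const byCodeUnit = [...letters].sort();

// Intl.Collator's compare function is bound to the collator, so it can be passed directly.
const byPolishRules = [...letters].sort(new Intl.Collator("pl").compare);

console.log(byCodeUnit);    // [ 'a', 'b', 'z', 'ą' ]
console.log(byPolishRules); // [ 'a', 'ą', 'b', 'z' ]
```

Creating the collator once and reusing its compare function is usually preferable to calling localeCompare with a locale on every comparison of a large array.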
Note that it only works in recent browsers, though.
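If older environments matter, one way to check for support is to rely on the fact that an implementation of the locales argument rejects an invalid language tag with a RangeError. A hedged sketch; the helper name supportsLocaleArgument and the fallback are assumptions of this example, not part of the answer:

```javascript
// Hypothetical helper: detects whether localeCompare honours the locales argument.
function supportsLocaleArgument() {
  try {
    "foo".localeCompare("bar", "i"); // "i" is not a valid language tag
  } catch (e) {
    return e instanceof RangeError;  // thrown only when locales are actually parsed
  }
  return false;                      // argument was silently ignored
}

// Fall back to the locale-less localeCompare when the Intl API is missing.
const comparePolish = supportsLocaleArgument()
  ? new Intl.Collator("pl").compare
  : function (a, b) { return a.localeCompare(b); };

console.log(comparePolish("ą", "Z") < 0);
```

In the fallback branch the ordering depends on the environment's default locale, so it may not follow Polish rules exactly.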
Answer by Mic
Here is an example for the French alphabet that could help you build a custom sort:
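var alpha = function(alphabet, dir, caseSensitive){
  dir = dir || 1;                          // 1 = ascending, -1 = descending
  caseSensitive = caseSensitive || false;
  return function(a, b){
    if(!caseSensitive){
      a = a.toLowerCase();
      b = b.toLowerCase();
    }
    var pos = 0,
        min = Math.min(a.length, b.length);
    // Skip the common prefix.
    while(pos < min && a.charAt(pos) === b.charAt(pos)){ pos++; }
    // Equal strings compare as 0; otherwise a prefix sorts before the longer string.
    if(pos === min){
      return a.length === b.length ? 0 : (a.length > b.length ? dir : -dir);
    }
    // Order by the position of the first differing character in the alphabet.
    return alphabet.indexOf(a.charAt(pos)) > alphabet.indexOf(b.charAt(pos)) ?
      dir : -dir;
  };
};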
To use it on an array of strings a:
a.sort(
  alpha('ABCDEFGHIJKLMNOPQRSTUVWXYZaàâäbcçdeéèêëfghiïîjklmnñoôöpqrstuûüvwxyÿz')
);
Add 1 or -1 as the second parameter of alpha() to sort in ascending or descending order.
Add true as the third parameter to make the sort case-sensitive.
You may need to add numbers and special characters to the alphabet list.
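As a variation on the idea above, the alphabet can be turned into a lookup table once, with digits added at the front. This is only a sketch: the name makeAlphabetComparator, the rule that unlisted characters sort after listed ones, and the sample words are assumptions of this example, and the result is a simple letter-by-letter order rather than true French dictionary order.

```javascript
// Hypothetical helper: builds a comparator from an explicit alphabet string.
function makeAlphabetComparator(alphabet) {
  const rank = new Map();
  Array.from(alphabet).forEach(function (ch, i) {
    if (!rank.has(ch)) rank.set(ch, i);   // first occurrence wins
  });
  // Assumption: characters missing from the alphabet sort after all listed ones,
  // ordered among themselves by code point.
  function rankOf(ch) {
    return rank.has(ch) ? rank.get(ch) : alphabet.length + ch.codePointAt(0);
  }
  return function (a, b) {
    const ca = Array.from(a);
    const cb = Array.from(b);
    const min = Math.min(ca.length, cb.length);
    for (let i = 0; i < min; i++) {
      const diff = rankOf(ca[i]) - rankOf(cb[i]);
      if (diff !== 0) return diff < 0 ? -1 : 1;
    }
    return ca.length - cb.length;          // prefix first, 0 when equal
  };
}

// Digits are listed first so that they sort before letters.
const byFrenchAlphabet = makeAlphabetComparator(
  "0123456789aàâäbcçdeéèêëfghiïîjklmnñoôöpqrstuûüvwxyÿz"
);

const frenchWords = ["été", "zèbre", "eau", "2cv", "ecole"].sort(byFrenchAlphabet);
console.log(frenchWords); // [ '2cv', 'eau', 'ecole', 'été', 'zèbre' ]
```

The sketch assumes lowercase input in precomposed (NFC) form; lowercase the strings first if case should be ignored.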
Answer by Pekka
You may be able to build your own sorting function using localeCompare(), which, at least according to the MDC article on the topic, should sort things correctly.
If that doesn't work out, here is an interesting SO question where the OP employs string replacement to build a "brute-force" sorting mechanism.
Also in that question, the OP shows how to build a custom textExtract function for the jQuery tablesorter plugin that does locale-aware sorting, which may also be worth a look.
Edit: As a totally far-out idea (I have no idea whether this is feasible at all, especially because of performance concerns): if you are working with PHP/MySQL on the back end anyway, I would like to mention the possibility of sending an Ajax query to a MySQL instance to have the data sorted there. MySQL is great at sorting locale-aware data, because you can force sorting operations into a specific collation using e.g. ORDER BY xyz COLLATE utf8_polish_ci, COLLATE utf8_german_ci.... Those collations would take care of all sorting woes at once.
Answer by Tomasz Wysocki
Mic's code, improved to handle characters that are not listed in the alphabet:
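var alpha = function(alphabet, dir, caseSensitive){
  dir = dir || 1;
  caseSensitive = caseSensitive || false;
  // Returns true when letter a sorts after letter b.
  function compareLetters(a, b) {
    var ia = alphabet.indexOf(a);
    var ib = alphabet.indexOf(b);
    if(ia === -1 || ib === -1) {
      // At least one letter is not in the alphabet: fall back to code-unit order,
      // treating a listed letter as if it were 'a'.
      if(ib !== -1)
        return a > 'a';
      if(ia !== -1)
        return 'a' > b;
      return a > b;
    }
    return ia > ib;
  }
  return function(a, b){
    if(!caseSensitive){
      a = a.toLowerCase();
      b = b.toLowerCase();
    }
    var pos = 0;
    var min = Math.min(a.length, b.length);
    while(pos < min && a.charAt(pos) === b.charAt(pos)){ pos++; }
    // Equal strings compare as 0; otherwise a prefix sorts before the longer string.
    if(pos === min){
      return a.length === b.length ? 0 : (a.length > b.length ? dir : -dir);
    }
    return compareLetters(a.charAt(pos), b.charAt(pos)) ? dir : -dir;
  };
};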
function assert(bCondition, sErrorMessage) {
  if (!bCondition) {
    throw new Error(sErrorMessage);
  }
}
assert(alpha("bac")("a", "b") === 1, "b comes before a");
assert(alpha("abc")("ac", "a") === 1, "a shorter string comes before a longer one");
assert(alpha("abc")("1abc", "0abc") === 1, "unlisted chars are compared as normal");
assert(alpha("abc")("0abc", "1abc") === -1, "unlisted chars are compared as normal [2]");
assert(alpha("abc")("0abc", "bbc") === -1, "unlisted chars are compared with listed chars in a special way");
assert(alpha("abc")("zabc", "abc") === 1, "unlisted chars are compared with listed chars in a special way [2]");
Answer by xandru
You have to keep two sort key strings. One is for the primary order, where German ä=a (primary key ä->a) and French é=e (primary key é->e), and one is for the secondary order, where ä comes after a (translating ä->azzzz in the secondary key) or é comes after e (secondary key é->ezzzz). Especially in Czech, some letters are variations of a letter (áéí…) whereas others stand in their own right in the list (ABCČD…GHChI…RŘSŠT…). Plus there is the problem of treating digraphs as single letters (primary ch->hzzzz). Not a trivial problem, and there should be a solution within JS.
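The two-key idea can be sketched roughly as follows. This is an illustration under stated assumptions, not a complete Czech collation: the mapping tables cover only a handful of letters, the function names are made up, the "zzzz" suffix is the answer's own shorthand and can misorder words that really contain runs of z, and input is assumed to be in precomposed (NFC) form.

```javascript
// Illustrative subset only. Letters in their own right (č, ř, š, ch) get a
// primary key that sorts after the base letter; accent variants (á, é, í)
// share the primary key of the base letter and differ only in the secondary key.
const PRIMARY = { ch: "hzzzz", "č": "czzzz", "ř": "rzzzz", "š": "szzzz", "á": "a", "é": "e", "í": "i" };
const SECONDARY = { "á": "azzzz", "é": "ezzzz", "í": "izzzz" };

function primaryKey(s) {
  // "ch" is listed first in the pattern so the digraph is matched before a lone "c".
  return s.toLowerCase().replace(/ch|[čřšáéí]/g, function (m) { return PRIMARY[m]; });
}

function secondaryKey(s) {
  return s.toLowerCase().replace(/[áéí]/g, function (m) { return SECONDARY[m]; });
}

function compareTwoLevel(a, b) {
  const pa = primaryKey(a), pb = primaryKey(b);
  if (pa !== pb) return pa < pb ? -1 : 1;   // primary difference decides
  const sa = secondaryKey(a), sb = secondaryKey(b);
  if (sa !== sb) return sa < sb ? -1 : 1;   // accents break ties only
  return 0;
}

const czechWords = ["chata", "cesta", "hora", "čaj", "cár", "car"].sort(compareTwoLevel);
console.log(czechWords); // [ 'car', 'cár', 'cesta', 'čaj', 'hora', 'chata' ]
```

In practice Intl.Collator("cs") already implements these rules, so hand-built keys are mainly useful where that API is unavailable.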
Answer by pid
Funny, I had to think about this problem and ended my search here, because it came to mind that I can use my own JavaScript module. I wrote a module to generate clean URLs, for which I have to transliterate the input string... (http://pid.github.io/speakingurl/)
var mySlug = require('speakingurl').createSlug({
  maintainCase: true,
  separator: " "
});
var input = "Schöner Titel läßt grüßen!? Bel été !";
var result = mySlug(input);
console.log(result); // Output: "Schoener Titel laesst gruessen bel ete"
Now you can sort by these results. For example, you can store the original title in the field "title" and the sort key, i.e. the result of mySlug, in the field "title_sort".
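The title/title_sort arrangement might look roughly like this. To keep the sketch self-contained it does not use speakingurl; the stand-in toSortKey strips combining marks after Unicode NFD normalization, which is a cruder transliteration (ä becomes a rather than ae, and letters such as ß are left untouched). The function name and sample titles are assumptions of this example.

```javascript
// Stand-in for a transliteration library: decompose, drop combining marks, lowercase.
function toSortKey(title) {
  return title.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Sample data for illustration only.
const titles = ["Été", "Zebra", "Ärger", "Apfel", "eau"];

// Store the original next to its precomputed sort key.
const records = titles.map(function (title) {
  return { title: title, title_sort: toSortKey(title) };
});

// Sort on the key with plain code unit comparison, then read back the originals.
records.sort(function (x, y) {
  if (x.title_sort === y.title_sort) return 0;
  return x.title_sort < y.title_sort ? -1 : 1;
});

const sortedTitles = records.map(function (r) { return r.title; });
console.log(sortedTitles); // [ 'Apfel', 'Ärger', 'eau', 'Été', 'Zebra' ]
```

Precomputing the key once per record keeps the comparison itself cheap, which is the point of storing title_sort alongside title.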

