Original question: http://stackoverflow.com/questions/5700636/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Using JavaScript to perform text matches with/without accented characters
Asked by Philip
I am using an AJAX-based lookup for names that a user searches in a text box.
I am making the assumption that all names in the database will be transliterated to European alphabets (i.e. no Cyrillic, Japanese, Chinese). However, the names will still contain accented characters, such as ç, ê and even č and ć.
A simple search like "Micic" will not match "Mičić" though, and the user expectation is that it will.
The AJAX lookup uses regular expressions to determine a match. I have modified the regular expression comparison using this function in an attempt to match more accented characters. However, it's a little clumsy since it doesn't take into account all characters.
function makeComp (input)
{
    input = input.toLowerCase ();
    var output = '';
    for (var i = 0; i < input.length; i ++)
    {
        if (input.charAt (i) == 'a')
            output = output + '[aàáâãäåæ]';
        else if (input.charAt (i) == 'c')
            output = output + '[cç]';
        else if (input.charAt (i) == 'e')
            output = output + '[eèéêëæ]';
        else if (input.charAt (i) == 'i')
            output = output + '[iìíîï]';
        else if (input.charAt (i) == 'n')
            output = output + '[nñ]';
        else if (input.charAt (i) == 'o')
            output = output + '[oòóôõöø]';
        else if (input.charAt (i) == 's')
            output = output + '[sß]';
        else if (input.charAt (i) == 'u')
            output = output + '[uùúûü]';
        else if (input.charAt (i) == 'y')
            output = output + '[yÿ]';
        else
            output = output + input.charAt (i);
    }
    return output;
}
Apart from a substitution function like this, is there a better way? Perhaps to "deaccent" the string being compared?
Answer by Takit Isy
There is a way to "deaccent" the string being compared without using a substitution function that lists every accent you want to remove.
Here is the easiest solution I can think of to remove accents (and other diacritics) from a string.
See it in action:
var string = "Ça été Mičić. ÀÉÏÓÛ";
console.log(string);
var string_norm = string.normalize('NFD').replace(/[\u0300-\u036f]/g, "");
console.log(string_norm);
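To connect this back to the name lookup in the question, here is a minimal sketch of an accent-insensitive substring match built on the same normalize() call. The helper names deaccent and matchesName and the sample names are made up for illustration; they are not part of the original answer.

```javascript
// Hypothetical helper: decompose to NFD, then drop the combining marks
// (U+0300 to U+036F) that carry the accents.
function deaccent(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Hypothetical helper: case- and accent-insensitive substring test.
function matchesName(name, query) {
  return deaccent(name).toLowerCase().includes(deaccent(query).toLowerCase());
}

// Sample data, invented for this sketch.
var sampleNames = ['Mičić', 'François', 'Zoë', 'Smith'];
var hits = sampleNames.filter(function (name) {
  return matchesName(name, 'micic');
});
console.log(hits); // [ 'Mičić' ]
```

Folding both the stored name and the query means no regular expression has to be built from user input. Letters that have no canonical decomposition, such as ß, æ or ø, pass through unchanged and would still need an explicit mapping.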
Answer by Josh from Qaribou
Came upon this old thread and thought I'd try my hand at writing a fast function. It relies on the order of the pipe-separated alternatives in the regex: each alternative is a capture group, and whichever group matched is passed as a non-empty argument to the callback that replace() calls. My goal was to lean as much as possible on the standard regex implementation behind JavaScript's replace() function, so that the heavy processing happens in low-level, browser-optimized code instead of in expensive JavaScript char-by-char comparisons.
It's not scientific at all, but my old Huawei IDEOS Android phone is sluggish when I plug the other functions in this thread into my autocomplete, while this function zips along:
function accentFold(inStr) {
  return inStr.replace(
    /([àáâãäå])|([çčć])|([èéêë])|([ìíîï])|([ñ])|([òóôõöø])|([ß])|([ùúûü])|([ÿ])|([æ])/g,
    function (str, a, c, e, i, n, o, s, u, y, ae) {
      if (a) return 'a';
      if (c) return 'c';
      if (e) return 'e';
      if (i) return 'i';
      if (n) return 'n';
      if (o) return 'o';
      if (s) return 's';
      if (u) return 'u';
      if (y) return 'y';
      if (ae) return 'ae';
    }
  );
}
If you're a jQuery dev, here's a handy example of using this function; you could use :icontains the same way you'd use :contains in a selector:
jQuery.expr[':'].icontains = function (obj, index, meta, stack) {
  return accentFold(
    (obj.textContent || obj.innerText || jQuery(obj).text() || '').toLowerCase()
  ).indexOf(accentFold(meta[3].toLowerCase())) >= 0;
};
Answer by Salathiel Genèse
I searched and upvoted herostwist's answer, but I kept searching, and here is a modern solution that is built into JavaScript (the string.localeCompare function):
var a = 'réservé'; // with accents, lowercase
var b = 'RESERVE'; // no accents, uppercase
console.log(a.localeCompare(b));
// expected output: 1
console.log(a.localeCompare(b, 'en', {sensitivity: 'base'}));
// expected output: 0
Note, however, that some mobile browsers still lack full support.
Until then, keep watching for full support across all platforms and environments.
Is that all?
No, we can go further right now and use the string.toLocaleLowerCase function.
var dotted = 'İstanbul';
console.log('EN-US: ' + dotted.toLocaleLowerCase('en-US'));
// expected output: "EN-US: i̇stanbul" (an "i" followed by a combining dot above, U+0307)
console.log('TR: ' + dotted.toLocaleLowerCase('tr'));
// expected output: "TR: istanbul"
Thank you!
Answer by James
There is no easier way to "deaccent" that I can think of, but your substitution could be streamlined a little more:
var makeComp = (function(){
    var accents = {
            a: 'àáâãäåæ',
            c: 'ç',
            e: 'èéêëæ',
            i: 'ìíîï',
            n: 'ñ',
            o: 'òóôõöø',
            s: 'ß',
            u: 'ùúûü',
            y: 'ÿ'
        },
        chars = /[aceinosuy]/g;
    return function makeComp(input) {
        return input.replace(chars, function(c){
            return '[' + c + accents[c] + ']';
        });
    };
}());
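The pattern string returned by makeComp still has to be turned into a RegExp, and any regex metacharacters the user types need escaping first. The following is only a sketch of that step: ACCENT_CLASSES, escapeRegExp and makeAccentRegExp are hypothetical names, and the accent table is a shortened variant that also includes č and ć so that the "Micic" search from the question works.

```javascript
// Abbreviated, illustrative accent table (an assumption, not the answer's table).
var ACCENT_CLASSES = {
  a: 'àáâãäå', c: 'çčć', e: 'èéêë', i: 'ìíîï',
  n: 'ñ', o: 'òóôõöø', u: 'ùúûü', y: 'ÿ'
};

// Escape regex metacharacters so user input is treated literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a case-insensitive RegExp in which each plain letter also
// matches its accented variants.
function makeAccentRegExp(input) {
  var pattern = escapeRegExp(input.toLowerCase()).replace(/[aceinouy]/g, function (c) {
    return '[' + c + ACCENT_CLASSES[c] + ']';
  });
  return new RegExp(pattern, 'i');
}

console.log(makeAccentRegExp('micic').test('Mičić')); // true
console.log(makeAccentRegExp('c++').test('C++'));     // true
```

Escaping happens before the letter substitution, so the character classes that are inserted afterwards are never escaped themselves.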
Answer by oliversisson
I think this is the neatest solution:
var nIC = new Intl.Collator(undefined , {sensitivity: 'base'})
var cmp = nIC.compare.bind(nIC)
cmp(a, b) will return 0 if the two strings are the same, ignoring accents.
Alternatively, you can try localeCompare:
'être'.localeCompare('etre',undefined,{sensitivity: 'base'})
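As a rough illustration of how such a comparator could drive a lookup, the sketch below filters a list for entries equal to the query and adds a naive prefix test. The function names, the sample list, and the fixed 'en' locale are assumptions made for the example; the answer itself passes undefined to use the runtime's default locale.

```javascript
// 'en' is an assumed locale, chosen so the sketch behaves the same everywhere.
var baseCollator = new Intl.Collator('en', { sensitivity: 'base' });

// Hypothetical helper: entries that equal the query, ignoring case and accents.
function findExact(list, query) {
  return list.filter(function (item) {
    return baseCollator.compare(item, query) === 0;
  });
}

// Hypothetical helper: naive prefix test that compares the query with a
// slice of the same length. Assumes composed (NFC) text on both sides.
function startsWithFolded(name, query) {
  var composedName = name.normalize('NFC');
  var composedQuery = query.normalize('NFC');
  return baseCollator.compare(composedName.slice(0, composedQuery.length), composedQuery) === 0;
}

var words = ['Être', 'etre', 'Etré', 'autre'];
console.log(findExact(words, 'etre'));             // [ 'Être', 'etre', 'Etré' ]
console.log(startsWithFolded('Mičić Petar', 'mic')); // true
```

The collator compares whole strings only, so substring or prefix search needs slicing like this, and the slice trick breaks down when a match changes length (for example ß against "ss").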
Answer by Jan Hagge
I made a prototype version of this:
String.prototype.strip = function() {
  var translate_re = /[öäüÖÄÜß ]/g;
  var translate = {
    "ä":"a", "ö":"o", "ü":"u",
    "Ä":"A", "Ö":"O", "Ü":"U",
    " ":"_", "ß":"ss" // probably more to come
  };
  return (this.replace(translate_re, function(match){
    return translate[match];})
  );
};
Use it like this:
var teststring = 'ä ö ü Ä Ö Ü ß';
teststring.strip();
This returns the string a_o_u_A_O_U_ss (the original string is not modified).
Answer by Stephen Chung
First, I'd recommend a switch statement instead of a long chain of if/else if statements.
Then, I am not sure why you don't like your current solution. It certainly is the cleanest one. What do you mean by not taking into account "all characters"?
There is no standard method in JavaScript to map accented letters to ASCII letters outside of using a third-party library, so the one you wrote is as good as any.
Also, I believe "ß" maps to "ss", not a single "s". And beware of "i" with and without a dot in Turkish: I believe they are different letters.
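The two special cases mentioned here can be shown with a small sketch that combines Unicode decomposition with an explicit table for letters that have no decomposition. The table contents and the name foldForSearch are assumptions for illustration, and folding the Turkish dotless "ı" to "i" is a search-convenience choice rather than a linguistically correct mapping.

```javascript
// Letters without a canonical decomposition need explicit replacements.
// This table is illustrative, not exhaustive.
var EXTRA_FOLDS = { 'ß': 'ss', 'æ': 'ae', 'ø': 'o', 'ı': 'i' };

// Hypothetical helper: lowercase, strip combining marks, then apply the table.
function foldForSearch(text) {
  return text
    .toLowerCase()
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '')
    .replace(/[ßæøı]/g, function (ch) { return EXTRA_FOLDS[ch]; });
}

console.log(foldForSearch('Straße'));   // "strasse"
console.log(foldForSearch('İstanbul')); // "istanbul"
console.log(foldForSearch('Kırık'));    // "kirik"
```

Because "ß" expands to two letters, the folded string can be longer than the input, which matters if match positions are mapped back onto the original text.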
Answer by yglodt
You can also use http://fusejs.io, which describes itself as "Lightweight fuzzy-search library. Zero dependencies", for fuzzy searching.