Matching accented characters with Javascript regexes
Original URL: http://stackoverflow.com/questions/5436824/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): Stack Overflow
Question by nickf
Here's a fun snippet I ran into today:
/\ba/.test("a")   // true
/\bà/.test("à")   // false
However,
/à/.test("à")     // true
Firstly, wtf?
Secondly, if I want to match an accented character at the start of a word, how can I do that? (I'd really like to avoid using over-the-top selectors like /(?:^|\s|'|\(\) ....)
Answer by Riimu
The reason why /\bà/.test("à") doesn't match is that "à" is not a word character. The escape sequence \b matches only at a boundary between a word character and a non-word character. /\ba/.test("a") matches because "a" is a word character: there is a boundary between the beginning of the string (which counts as a non-word position) and the letter "a", which is a word character.
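To make that concrete, here is a small sketch runnable in Node or a browser console. It writes à as the escape \u00E0 (a precomposed à) so the file's encoding and Unicode normalization do not matter. The last case shows the flip side of the same rule: JavaScript does see a boundary between an ASCII letter and à, because only one side is a word character.

```javascript
// \b is a zero-width assertion: it matches where a word character ([A-Za-z0-9_])
// sits on exactly one side. The start and end of the input count as non-word.
const boundaryThenA = /\ba/;
const boundaryThenAGrave = /\b\u00E0/; // \u00E0 is a precomposed "à"

// Start of input (non-word) | "a" (word): a boundary, so this matches.
console.log(boundaryThenA.test("a")); // true

// Start of input (non-word) | "à" (non-word): no boundary, so no match.
console.log(boundaryThenAGrave.test("\u00E0")); // false

// "x" (word) | "à" (non-word): one side is a word character, so \b matches here.
console.log(boundaryThenAGrave.test("x\u00E0")); // true
```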
Word characters in JavaScript regexes are defined as [a-zA-Z0-9_].
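A quick way to check this is to run \w against a few sample characters. This is a minimal sketch. The helper name isWordChar is hypothetical and is only used for illustration. It assumes a regex without the u flag, which is the case in the question.

```javascript
// In a regex without the u flag, \w is exactly the class [A-Za-z0-9_].
const wordChar = /\w/;

// Hypothetical helper: does \w accept this single character?
function isWordChar(ch) {
  return wordChar.test(ch);
}

// Print each sample together with whether \w accepts it.
for (const ch of ["a", "Z", "7", "_", "\u00E0", "\u00E9", "\u00F1", "-"]) {
  console.log(JSON.stringify(ch), isWordChar(ch));
}
// a, Z, 7 and _ are word characters; à, é, ñ and - are not.
```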
To match an accented character at the start of a string, just use the ^ character at the beginning of the regex (e.g. /^à/). That character means the beginning of the string (unlike \b, which matches at any word boundary within the string). It's the most basic, standard regular expression syntax, so it's definitely not over the top.
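The sketch below shows how ^ behaves. It anchors to the start of the input regardless of whether that first character is a word character. It does not match at the start of every word inside a longer string. With the m flag, it also matches right after each line break. The sample strings are made up for illustration.

```javascript
// ^ anchors at the start of the input, so "à" being a non-word
// character does not matter here.
const startsWithAGrave = /^\u00E0/; // same as /^à/ with a precomposed à

console.log(startsWithAGrave.test("\u00E0 la carte")); // true
console.log(startsWithAGrave.test("voil\u00E0"));       // false: "à" is not first

// With the m flag, ^ also matches right after each line break.
const lineStartsWithAGrave = /^\u00E0/m;
console.log(lineStartsWithAGrave.test("voil\u00E0\n\u00E0 bient\u00F4t")); // true
```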
Answer by stema
Stack Overflow also had an issue with non-ASCII characters in regexes; you can find it here. It does not deal with word boundaries, but it may still give you some useful hints.
There is another page, but its author wants to match strings, not words.
I don't know of an anchor for your problem, and I couldn't find one just now. But judging by the monster regexes used in my first link, the group you want to avoid is not over the top; in my opinion, it is your solution.
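None of the answers here use the following technique. It is a swapped-in alternative to the hand-written alternation group. It assumes an engine that supports ES2018 lookbehind and Unicode property escapes, such as current Node.js and modern browsers; some older browsers lack lookbehind. The helper name startsWord is hypothetical. Treating letters, combining marks, numbers and "_" as word characters is also an assumption made for this sketch.

```javascript
// Hypothetical helper (not from any answer): does `letter` appear at the start
// of a word in `text`? Here a "word" character is any Unicode letter (\p{L}),
// combining mark (\p{M}), number (\p{N}) or "_". Needs ES2018: lookbehind and
// \p{...} property escapes (the latter only work with the u flag).
function startsWord(text, letter) {
  // Escape regex syntax characters so `letter` is matched literally.
  const escaped = letter.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // (?<!...) succeeds only if the preceding character is NOT a word character;
  // at the start of the string there is no preceding character, so it succeeds.
  const pattern = new RegExp(`(?<![\\p{L}\\p{M}\\p{N}_])${escaped}`, "u");
  return pattern.test(text);
}

console.log(startsWord("\u00E0 la carte", "\u00E0"));     // true
console.log(startsWord("Il est \u00E0 Paris", "\u00E0")); // true: preceded by a space
console.log(startsWord("voil\u00E0", "\u00E0"));          // false: preceded by "l"
console.log(startsWord("x\u00E0", "\u00E0"));             // false, unlike /\bà/
```

Unlike \b, the lookbehind only checks the character before the match. It therefore does not depend on whether the accented letter itself is classified as a word character.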
Answer by Craig1123
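const regex = /^[\-/A-Za-z\u00C0-\u017F ]+$/;
const test1 = regex.test("à");                 // true: precomposed à (U+00E0) is in \u00C0-\u017F
const test2 = regex.test("Martinez-Cortez");   // true: letters and a hyphen
const test3 = regex.test("Leonardo da vinci"); // true: letters and spaces
const test4 = regex.test("?");                 // false: "?" is not in the class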
console.log('test1', test1);
console.log('test2', test2);
console.log('test3', test3);
console.log('test4', test4);
Building off of Wak's and Cœur's answers:
/^[\-/A-Za-z\u00C0-\u017F ]+$/
Works for spaces and dashes too.
Example: Leonardo da vinci, Martinez-Cortez
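The sketch below reuses the answer's pattern to show what its character range covers. \u00C0-\u017F spans the Latin-1 Supplement from À onward plus Latin Extended-A, so it also admits × (U+00D7) and ÷ (U+00F7), which are not letters. latinNameStrict is a hypothetical variant, not part of the answer, that only splits the range around those two symbols. Both patterns reject letters outside the range and accents written as combining marks.

```javascript
// Craig1123's pattern: ASCII letters, "-", "/", space, and U+00C0 to U+017F.
const latinName = /^[\-/A-Za-z\u00C0-\u017F ]+$/;

// Hypothetical stricter variant: same range minus U+00D7 (×) and U+00F7 (÷).
const latinNameStrict = /^[\-/A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u017F ]+$/;

const samples = [
  "Martinez-Cortez",
  "Dvo\u0159\u00E1k", // Dvořák: ř (U+0159) and á (U+00E1) are in range
  "\u00D7",           // ×: in range for latinName, but not a letter
  "Nguy\u1EC5n",      // Nguyễn: ễ (U+1EC5) is outside the range
  "Jose\u0301",       // José with a combining acute accent (U+0301), outside the range
];

// Print each sample with the result of both patterns.
for (const s of samples) {
  console.log(JSON.stringify(s), latinName.test(s), latinNameStrict.test(s));
}
```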