jQuery regular expression with Cyrillic letters

Question

Asked by Jiří Valoušek

I have a jQuery function that counts words in a textarea field. It also excludes any words enclosed in [[[triple brackets]]]. It works well with Latin characters, but it fails on Cyrillic sentences. I suspect the problem lies in the regular expression:

$(field).val().replace(/\[\[\[[^\]]*\]\]\]/g, '').match(/\b/g);

Example with both kinds of phrases: http://jsfiddle.net/A3cEG/2/

I need to count all words, including Cyrillic ones, not only words in Latin script. How can I do that?

Answer 1

Answered by p.s.w.g

JavaScript (at least the versions most widely used) does not fully support Unicode in regular expressions. That is to say, \w matches only Latin letters, decimal digits, and underscores ([a-zA-Z0-9_]), and \b matches the boundary between a word character and a non-word character. A string made up only of Cyrillic letters therefore contains no word characters and no word boundaries at all.
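
The effect is easy to reproduce. The following is a minimal sketch with made-up sample strings; it only shows how \w and \b behave on Latin versus Cyrillic input in a plain (non-Unicode-aware) pattern.

```javascript
// Sample strings are illustrative, not taken from the original fiddle.
const latin = 'hello world';
const cyrillic = 'привет мир';

// \b is a zero-width match between a \w character and a non-\w character
// (or the start/end of the string).
const latinBoundaries = latin.match(/\b/g);       // four empty matches: two words, two edges each
const cyrillicBoundaries = cyrillic.match(/\b/g); // null: the string has no \w characters

// \w itself only covers [A-Za-z0-9_].
const latinWordChars = latin.match(/\w/g).length; // 10
const cyrillicWordChars = cyrillic.match(/\w/g);  // null
```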

To find all words in an input string using Latin or Cyrillic, you'd have to do something like this:

.match(/[\wа-я]+/ig); // where а is the Cyrillic а.

Or if you prefer:

.match(/[\w\u0430-\u044f]+/ig);

Of course, this probably means you need to tweak your code a little, since this pattern matches whole words rather than word boundaries. Note that [а-я] matches any letter in the 'basic Cyrillic alphabet', that is, the range U+0430 to U+044F; the i flag extends the match to the corresponding uppercase letters. To match letters outside this range, add them to the character set, e.g. to also match the Russian Ё/ё, use [а-яё].
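
Putting the pieces together, a word counter could look roughly like the sketch below. The function name countWords and the sample inputs are hypothetical; the two patterns are the ones discussed in this thread.

```javascript
// Hypothetical helper: countWords is an illustrative name, not part of jQuery.
function countWords(text) {
  // Remove everything wrapped in [[[ ... ]]] (pattern from the question).
  const stripped = text.replace(/\[\[\[[^\]]*\]\]\]/g, '');
  // Match runs of Latin word characters or Cyrillic letters, including ё.
  // The i flag lets the lowercase range cover uppercase letters as well.
  const words = stripped.match(/[\wа-яё]+/ig);
  // match() returns null when nothing matches, so guard before reading length.
  return words ? words.length : 0;
}
```

In the page itself this would be called with the field's value, e.g. countWords($(field).val()). Because the pattern matches words rather than boundaries, the length of the result is the word count directly.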

Also note that your triple-bracket pattern can be simplified to:

.replace(/\[{3}[^\]]*\]{3}/g, '') // keep ] escaped: in JavaScript, [^] is itself a complete class
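
A quick check of the quantifier form follows, with made-up sample text. The closing brackets are escaped here because JavaScript treats an unescaped [^] as a complete character class meaning "any character", so the unescaped spelling does not behave the way it would in some other regex flavors.

```javascript
// Sample text is illustrative only.
const sample = 'one [[[two three]]] four [[[пять]]] six';

const original = /\[\[\[[^\]]*\]\]\]/g; // pattern from the question
const shortened = /\[{3}[^\]]*\]{3}/g;  // same pattern using {3} quantifiers
const unescaped = /\[{3}[^]]*]{3}/g;    // parsed as \[{3}, [^], ]*, ]{3}

const a = sample.replace(original, '');  // 'one  four  six'
const b = sample.replace(shortened, ''); // 'one  four  six'
const c = sample.replace(unescaped, ''); // unchanged: nothing in the sample matches
```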

Alternatively, you might want to look at XRegExp, an open-source project that adds new features to the base JavaScript regular expression engine, and its Unicode addon.
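
In engines that implement ES2018 Unicode property escapes, a similar result may be possible without any library. The sketch below assumes such an engine; countWordsUnicode is a hypothetical name, and treating letters, numbers, and underscore as word characters is a choice made here to mirror \w.

```javascript
// Assumption: the engine supports \p{...} escapes, which require the u flag.
function countWordsUnicode(text) {
  // Strip [[[ ... ]]] spans first.
  const stripped = text.replace(/\[{3}[^\]]*\]{3}/g, '');
  // \p{L} is any letter and \p{N} any number, regardless of script.
  const words = stripped.match(/[\p{L}\p{N}_]+/gu);
  return words ? words.length : 0;
}
```

Unlike an explicit [а-яё] range, this also counts words in other scripts, which may or may not be what the form needs.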

Answer 2

Answered by Dubaua

Be careful with ranges of Cyrillic letters, since a range may include characters you do not want. Here is a bulletproof regexp that lists the letters of the Russian alphabet explicitly and matches only strings made up entirely of them:

/^[аАбБвВгГдДеЕёЁжЖзЗиИйЙкКлЛмМнНоОпПрРсСтТуУфФхХцЦчЧшШщЩъЪыЫьЬэЭюЮяЯ]+$/
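
Because of the ^ and $ anchors, that pattern validates a whole string rather than finding words inside a longer text. The sketch below shows both uses; the constant and variable names are illustrative, and the unanchored variant is an adaptation for counting, not something given in the answer.

```javascript
// The explicit letter list from the answer, kept in one place.
const RUSSIAN_LETTERS =
  'аАбБвВгГдДеЕёЁжЖзЗиИйЙкКлЛмМнНоОпПрРсСтТуУфФхХцЦчЧшШщЩъЪыЫьЬэЭюЮяЯ';

// Anchored: true only if the entire string consists of these letters.
const wholeString = new RegExp('^[' + RUSSIAN_LETTERS + ']+$');

// Unanchored with the g flag: finds each run of these letters in a text.
const wordRuns = new RegExp('[' + RUSSIAN_LETTERS + ']+', 'g');

const isWord = wholeString.test('Привет');       // true
const hasSpace = wholeString.test('привет мир'); // false: the space is not in the class
const isLatin = wholeString.test('hello');       // false
const runs = 'привет, мир!'.match(wordRuns);     // ['привет', 'мир']
```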
