Regular expression for accurate word count using JavaScript

Question

Asked by u365975

I'm trying to put together a regular expression for a JavaScript command that accurately counts the number of words in a textarea.

One solution I had found is as follows:

document.querySelector("#wordcount").innerHTML = document.querySelector("#editor").value.split(/\b\w+\b/).length -1;

But this doesn't count any non-Latin characters (e.g., Cyrillic or Hangul); it skips over them completely.
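
The behaviour described above can be reproduced with a small sketch. The function name `countWithWordClass` is made up for illustration; the regular expression is the one from the question. The point it shows is that `\w` in JavaScript only covers `[A-Za-z0-9_]`, so text in other scripts never produces a match.

```javascript
// Hypothetical helper wrapping the expression from the question.
function countWithWordClass(text) {
  return text.split(/\b\w+\b/).length - 1;
}

console.log(countWithWordClass("hello world")); // 2
console.log(countWithWordClass("привет мир"));  // 0, no \w characters at all
console.log(/\w/.test("한글"));                  // false, Hangul is outside \w
```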

Another one I put together:

document.querySelector("#wordcount").innerHTML = document.querySelector("#editor").value.split(/\s+/g).length -1;

But this doesn't count accurately unless the document ends in a space character. If a space character is appended to the value being counted it counts 1 word even with an empty document. Furthermore, if the document begins with a space character an extraneous word is counted.
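
To make those edge cases concrete, here is a short sketch that runs the same expression against a few inputs. `countBySplit` is a hypothetical name; the logic is exactly the `split(/\s+/g).length - 1` from the question.

```javascript
// Hypothetical helper wrapping the second expression from the question.
function countBySplit(text) {
  return text.split(/\s+/g).length - 1;
}

console.log(countBySplit("one two"));   // 1, undercounts without a trailing space
console.log(countBySplit("one two "));  // 2, correct only because of the trailing space
console.log(countBySplit(" "));         // 1, an "empty" document plus one space
console.log(countBySplit(" one two ")); // 3, the leading space adds an extra piece
console.log(countBySplit(""));          // 0
```

Each separator found by `split` adds one piece to the array, including empty pieces at the start and end, which is why the result depends on leading and trailing whitespace.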

Is there a regular expression I can put into this command that counts the words accurately, regardless of input method?

Answer 1

Answered by David Tang

This should do what you're after:

value.match(/\S+/g).length;

Rather than splitting the string, you're matching on any sequence of non-whitespace characters.

There's the added bonus of being easily able to extract each word if needed ;)
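
A minimal sketch of that extraction, assuming a helper named `extractWords` (not part of the answer). One detail worth guarding: `String.prototype.match` with a global pattern returns `null` when nothing matches, so an empty textarea needs a fallback before reading `.length`.

```javascript
// Hypothetical helper: returns every run of non-whitespace characters.
function extractWords(text) {
  // match() yields null on no match, so fall back to an empty array.
  return text.match(/\S+/g) || [];
}

const words = extractWords("  Lorem ipsum   dolor ");
console.log(words);                   // → ["Lorem", "ipsum", "dolor"]
console.log(words.length);            // 3
console.log(extractWords("").length); // 0
```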

Answer 2

Answered by morja

Try counting any run of non-whitespace characters that is delimited by word boundaries:

value.split(/\b\S+\b/g).length

You could also try Unicode ranges, though I am not sure the following one is complete:

value.split(/[\u0080-\uFFFF\w]+/g).length
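
As an alternative to a hand-written range, engines that support ES2018 Unicode property escapes allow matching letters by category. The following is only a sketch under that assumption; `WORD_PATTERN` and `countUnicodeWords` are illustrative names, and the choice of categories (letters, combining marks, numbers, underscore) is one reasonable definition of a "word character", not the only one.

```javascript
// Assumes ES2018 Unicode property escapes (the "u" flag is required).
// \p{L} = letters, \p{M} = combining marks, \p{N} = numbers.
const WORD_PATTERN = /[\p{L}\p{M}\p{N}_]+/gu;

// Hypothetical helper: counts runs of word characters in any script.
function countUnicodeWords(text) {
  return (text.match(WORD_PATTERN) || []).length;
}

console.log(countUnicodeWords("Привет мир"));                   // 2
console.log(countUnicodeWords("안녕하세요 세계"));                  // 2
console.log(countUnicodeWords("Lorem ipsum dolor , sit amet")); // 5
console.log(countUnicodeWords(""));                             // 0
```

Note that scripts written without spaces between words (for example Chinese or Japanese) would still be counted as one word per unbroken run with this approach.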

Answer 3

Answered by geekdenz

For me this gave the best results:

value.split(/\b\W+\b/).length

With

var words = value.split(/\b\W+\b/);

you get an array of all the words.

Explanation:

\b is a word boundary
\W is a non-word character; a capital letter usually means the negation of the lowercase class
+ means one or more of the preceding character or character class
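
The three pieces above can be seen working together in a short sketch. `splitOnNonWords` is a made-up wrapper around the expression from this answer. The last two calls show limits that follow from `\b` and `\W` being defined in terms of ASCII word characters.

```javascript
// Hypothetical helper wrapping the split from this answer.
function splitOnNonWords(text) {
  return text.split(/\b\W+\b/);
}

console.log(splitOnNonWords("Lorem ipsum dolor , sit amet"));
// → ["Lorem", "ipsum", "dolor", "sit", "amet"]

// Leading whitespace is not between two word boundaries, so it stays attached.
console.log(splitOnNonWords(" hello world")); // → [" hello", "world"]

// An empty string still yields one (empty) piece.
console.log(splitOnNonWords("").length); // 1

// Text with no ASCII word characters has no \b positions, so nothing is split.
console.log(splitOnNonWords("привет мир").length); // 1
```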

I recommend learning regular expressions. It's a great skill to have because they are so powerful. ;-)

Answer 4

Answered by albertov

The correct regexp would be /\s+/ in order to discard non-words:

'Lorem ipsum dolor , sit amet'.split(/\S+/g).length
7
'Lorem ipsum dolor , sit amet'.split(/\s+/g).length
6

Answer 5

Answered by mpjan

Try

    value.match(/\w+/g).length;

This will match a string of characters that can be in a word. Whereas something like:

    value.match(/\S+/g).length;

will result in an incorrect count if the user adds commas or other punctuation that is not followed by a space - or adds a comma with a space either side of it.
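
The difference between the two patterns shows up on a few sample strings. This is a sketch with hypothetical helper names (`countWordChars`, `countNonSpace`); it also shows a trade-off of `\w+`, which splits on apostrophes.

```javascript
// Hypothetical helpers; "|| []" guards against match() returning null.
function countWordChars(text) {
  return (text.match(/\w+/g) || []).length;
}

function countNonSpace(text) {
  return (text.match(/\S+/g) || []).length;
}

console.log(countWordChars("one , two"), countNonSpace("one , two"));   // 2 3
console.log(countWordChars("one,two"), countNonSpace("one,two"));       // 2 1
console.log(countWordChars("don't stop"), countNonSpace("don't stop")); // 3 2
```

With `\S+`, a free-standing comma counts as a word and "one,two" collapses into one; with `\w+`, "don't" is counted as two words because the apostrophe is not a word character.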

Answer 6

Answered by Valerij

You could extend or change your methods like this, if you want to match things like email addresses as well:

document.querySelector("#wordcount").innerHTML = document.querySelector("#editor").value.split(/\b(.*?)\b/).length - 1;

and

document.querySelector("#wordcount").innerHTML = document.querySelector("#editor").value.trim().split(/\s+/g).length -1;

Also try using \s: in JavaScript, \w only matches ASCII word characters, while \s also matches Unicode whitespace.

Source: http://www.regular-expressions.info/charclass.html
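
The trim-then-split idea from this answer can be sketched as a small function. `countTrimmed` is a hypothetical name. Once the text has been trimmed, `split(/\s+/)` produces exactly one piece per word, so this sketch returns the length directly instead of subtracting 1, and it treats an empty or whitespace-only value as zero words.

```javascript
// Hypothetical helper: whitespace-delimited word count with trimming.
function countTrimmed(text) {
  const trimmed = text.trim();
  // "".split(/\s+/) returns [""], so handle the empty case explicitly.
  if (trimmed === "") {
    return 0;
  }
  return trimmed.split(/\s+/).length;
}

console.log(countTrimmed(" one two "));        // 2
console.log(countTrimmed("one\n\ttwo  three")); // 3
console.log(countTrimmed("   "));              // 0
console.log(countTrimmed(""));                 // 0
```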

Answer 7

Answered by Sharikul Islam

My simple JavaScript library, FuncJS, has a function called count() that does exactly what its name says: it counts words.

For example, if you have a string full of words, you can simply pass it between the function's parentheses, like this:

count("How many words are in this string?");

and then call the function, which will then return the number of words. Also, this function is designed to ignore any amount of whitespace, thus giving an accurate result.

To learn more about this function, please read the documentation at http://docs.funcjs.webege.com/count().html; the download link for FuncJS is also on that page.

Hope this helps anyone wanting to do this! :)
