Regular Expression for accurate word-count using JavaScript
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/4593565/
Asked by u365975
I'm trying to put together a regular expression for a JavaScript command that accurately counts the number of words in a textarea.
One solution I had found is as follows:
document.querySelector("#wordcount").innerHTML = document.querySelector("#editor").value.split(/\b\w+\b/).length -1;
But this doesn't count any non-Latin characters (eg: Cyrillic, Hangul, etc); it skips over them completely.
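To see the problem concretely, here is a small sketch that wraps the expression above in a helper (the name `countBySplittingOnWords` is only for illustration and is not part of the original post). In JavaScript, `\w` without extra flags covers only ASCII letters, digits, and the underscore, so Cyrillic text never produces a match.

```javascript
// Hypothetical helper wrapping the split-based count from the question.
function countBySplittingOnWords(text) {
  return text.split(/\b\w+\b/).length - 1;
}

console.log(countBySplittingOnWords("hello world")); // 2
console.log(countBySplittingOnWords("привет мир"));  // 0, the Cyrillic words are skipped
```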
Another one I put together:
document.querySelector("#wordcount").innerHTML = document.querySelector("#editor").value.split(/\s+/g).length -1;
But this doesn't count accurately unless the document ends in a space character. If a space character is appended to the value being counted it counts 1 word even with an empty document. Furthermore, if the document begins with a space character an extraneous word is counted.
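The cases described above can be reproduced with a short sketch. The helper name `countBySplittingOnSpaces` is an assumption made for this example; the expression inside it is the one from the question.

```javascript
// Hypothetical helper wrapping the whitespace-split count from the question.
function countBySplittingOnSpaces(text) {
  return text.split(/\s+/g).length - 1;
}

console.log(countBySplittingOnSpaces("one two three"));   // 2, one short without a trailing space
console.log(countBySplittingOnSpaces("one two three "));  // 3
console.log(countBySplittingOnSpaces(" "));               // 1, although there are no words
console.log(countBySplittingOnSpaces(" one two three ")); // 4, the leading space adds a word
```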
Is there a regular expression I can put into this command that counts the words accurately, regardless of input method?
Answer by David Tang
This should do what you're after:
value.match(/\S+/g).length;
Rather than splitting the string, you're matching on any sequence of non-whitespace characters.
There's the added bonus of being easily able to extract each word if needed ;)
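One thing to watch for: `match` with the `g` flag returns `null` when nothing matches, so reading `.length` directly throws on an empty or whitespace-only textarea. A minimal sketch of a guarded version follows; the function name `countNonWhitespaceRuns` is made up for this example.

```javascript
// Hypothetical helper: counts runs of non-whitespace characters.
function countNonWhitespaceRuns(text) {
  // match() yields null when there is no match, so fall back to an empty array.
  const words = text.match(/\S+/g) || [];
  return words.length;
}

console.log(countNonWhitespaceRuns(""));                // 0
console.log(countNonWhitespaceRuns(" one two three ")); // 3
console.log(countNonWhitespaceRuns("  привет   мир ")); // 2
```

The `words` array inside the helper holds the individual words, which is the extraction bonus mentioned above.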
Answer by morja
Try to count anything that is not whitespace and with a word boundary:
value.split(/\b\S+\b/g).length
You could also try to use unicode ranges, but I am not sure if the following one is complete:
value.split(/[\u0080-\uFFFF\w]+/g).length
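As an alternative to hand-written ranges, engines that support ES2018 Unicode property escapes can match letters, marks, and digits by category. This is a different technique from the one in the answer, shown here only as a sketch; `WORD_PATTERN` and `countUnicodeWords` are names invented for the example, and it uses `match` rather than `split` so the result is the number of words rather than the number of pieces between them.

```javascript
// Assumption: the runtime supports \p{...} with the "u" flag (ES2018+).
// \p{L} = letters, \p{M} = combining marks, \p{N} = numbers.
const WORD_PATTERN = /[\p{L}\p{M}\p{N}]+/gu;

// Hypothetical helper: counts runs of letters, marks, and digits in any script.
function countUnicodeWords(text) {
  return (text.match(WORD_PATTERN) || []).length;
}

console.log(countUnicodeWords("Hello, мир! 123")); // 3
console.log(countUnicodeWords("안녕하세요 세계"));    // 2
console.log(countUnicodeWords(""));                // 0
```

Note that apostrophes and hyphens are not in the class, so a word such as "don't" would count as two under this pattern.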
Answer by geekdenz
For me this gave the best results:
value.split(/\b\W+\b/).length
with
var words = value.split(/\b\W+\b/)
you get all words.
Explanation:
- \b is a word boundary
- \W is a NON-word character; a capital letter usually means the negation
- '+' means one or more of the preceding character or character class
I recommend learning regular expressions. It's a great skill to have because they are so powerful. ;-)
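To illustrate how `\b\W+\b` behaves, here is a short sketch with a sample sentence chosen for this example. The pattern only matches non-word characters that sit between two words, so trailing punctuation stays attached to the last piece, and an empty string still yields one piece.

```javascript
// Split on runs of non-word characters that are bounded by words on both sides.
const pieces = "Hello, world! How are you?".split(/\b\W+\b/);

console.log(pieces);        // [ 'Hello', 'world', 'How', 'are', 'you?' ]
console.log(pieces.length); // 5

// An empty input is returned as a single empty piece.
const emptyCount = "".split(/\b\W+\b/).length;
console.log(emptyCount);    // 1
```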
Answer by albertov
The correct regexp would be /\s+/, in order to discard non-words:
'Lorem ipsum dolor , sit amet'.split(/\S+/g).length
7
'Lorem ipsum dolor , sit amet'.split(/\s+/g).length
6
Answer by mpjan
Try
value.match(/\w+/g).length;
This will match a string of characters that can be in a word. Whereas something like:
value.match(/\S+/g).length;
will result in an incorrect count if the user adds commas or other punctuation that is not followed by a space - or adds a comma with a space either side of it.
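The difference between the two patterns shows up on a sample string like the one below (the string and variable names are chosen only for this sketch). The last line shows a caveat of `\w+`: it also breaks words at apostrophes, and, as noted in the question, it ignores non-Latin letters.

```javascript
const sample = "red,green,blue , yellow";

// \S+ treats "red,green,blue" as one word and the lone comma as another.
const byNonWhitespace = sample.match(/\S+/g);
// \w+ picks out each run of word characters.
const byWordChars = sample.match(/\w+/g);

console.log(byNonWhitespace.length); // 3
console.log(byWordChars.length);     // 4

// Caveat: the apostrophe is not a word character, so "don't" becomes "don" and "t".
const apostropheCount = "don't stop".match(/\w+/g).length;
console.log(apostropheCount);        // 3
```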
Answer by Valerij
You could extend or change your methods like this:
document.querySelector("#wordcount").innerHTML = document.querySelector("#editor").value.split(/\b\(.*?)\b/).length -1;
if you want to match things like email addresses as well, and
document.querySelector("#wordcount").innerHTML = document.querySelector("#editor").value.trim().split(/\s+/g).length -1;
Also try using \s, since it handles Unicode text where \w does not.
Source: http://www.regular-expressions.info/charclass.html
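A minimal sketch of the trim-then-split idea as a standalone function, under the assumption that a "word" is any run of non-whitespace characters. The name `countTrimmedWords` is made up for this example. Once the text is trimmed, the number of pieces equals the number of words, so the sketch does not subtract 1 and instead handles the empty case explicitly.

```javascript
// Hypothetical helper: trims the text, then splits on runs of whitespace.
function countTrimmedWords(text) {
  const trimmed = text.trim();
  // "".split(/\s+/) returns [""], so an empty input needs its own branch.
  if (trimmed === "") return 0;
  return trimmed.split(/\s+/).length;
}

console.log(countTrimmedWords(""));                // 0
console.log(countTrimmedWords(" one two three ")); // 3
console.log(countTrimmedWords("привет мир"));      // 2
```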
Answer by Sharikul Islam
My simple JavaScript library, FuncJS, has a function called "count()" which does exactly what its name says: it counts words.
For example, say that you have a string full of words, you can simply place it in between the function brackets, like this:
count("How many words are in this string?");
and then call the function, which will then return the number of words. Also, this function is designed to ignore any amount of whitespace, thus giving an accurate result.
To learn more about this function, please read the documentation at http://docs.funcjs.webege.com/count().html; the download link for FuncJS is also on that page.
Hope this helps anyone wanting to do this! :)