Original source: http://stackoverflow.com/questions/6565333/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Using JavaScript to find the most common words in a string?
Asked by j.s
I have a large block of text, and I would like to find out the most common words being used (except for a few, like "the", "a", "and", etc).
How would I go about searching this block of text for its most commonly used words?
Answered by SLaks
You should split the string into words, then loop through the words and increment a counter for each one:
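var wordCounts = {};
// Split on runs of non-word characters; splitting on /\b/ would also return spaces and punctuation as "words".
var words = str.split(/\W+/);
for (var i = 0; i < words.length; i++) {
    if (words[i] === "") continue; // empty strings appear when str starts or ends with a separator
    wordCounts["_" + words[i]] = (wordCounts["_" + words[i]] || 0) + 1;
}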
The "_" + prefix allows it to process words like constructor that are already properties of the object.
You may want to write words[i].toLowerCase() to count case-insensitively.
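The counting loop, the case-insensitive variant, and the question's request to skip words like "the" can be combined in one place. The following is only a minimal sketch: countWords and its stopWords parameter are hypothetical names, not part of the answer, and it swaps the "_" prefix for a Map, which has no inherited keys such as constructor. It also assumes that a "word" is a run of \w characters, so "don't" is counted as "don" and "t".

```javascript
// Hypothetical helper (not from the answer): count words case-insensitively,
// skipping any word listed in stopWords.
function countWords(text, stopWords) {
  const skip = new Set(stopWords.map((w) => w.toLowerCase()));
  const counts = new Map(); // a Map has no inherited keys, so no "_" prefix is needed
  const words = text.toLowerCase().match(/\w+/g) || []; // match() returns null when nothing matches
  for (const word of words) {
    if (skip.has(word)) continue;
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

const counts = countWords("The cat and the hat. A cat!", ["the", "a", "and"]);
console.log([...counts]); // entries: ["cat", 2] and ["hat", 1]
```

The stop-word list is passed in rather than hard-coded, since which words count as "too common" depends on the text being analysed.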
Answered by Gustavo Maloste
I'm coming from the future, where this question was asked again; I had started on a solution too early, and the question was marked as answered. Anyway, this complements SLaks's answer.
function nthMostCommon(string, ammount) {
    // Split on runs of whitespace and drop the empty strings left by leading or trailing spaces.
    var wordsArray = string.split(/\s+/).filter(Boolean);
    var wordOccurrences = {};
    // The '_' prefix keeps words such as "constructor" from colliding with built-in object properties.
    for (var i = 0; i < wordsArray.length; i++) {
        wordOccurrences['_' + wordsArray[i]] = (wordOccurrences['_' + wordsArray[i]] || 0) + 1;
    }
    // Build a list of at most `ammount` entries, kept sorted by descending count.
    var result = Object.keys(wordOccurrences).reduce(function(acc, currentKey) {
        /* you may want to include a binary search here */
        for (var i = 0; i < ammount; i++) {
            if (!acc[i]) {
                // Free slot at the end of the list: append the word.
                acc[i] = { word: currentKey.slice(1), occurences: wordOccurrences[currentKey] };
                break;
            } else if (acc[i].occurences < wordOccurrences[currentKey]) {
                // More frequent than the entry at i: insert before it and trim any overflow.
                acc.splice(i, 0, { word: currentKey.slice(1), occurences: wordOccurrences[currentKey] });
                if (acc.length > ammount)
                    acc.pop();
                break;
            }
        }
        return acc;
    }, []);
    return result;
}
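For comparison, the same "top N" result can be obtained by counting, sorting all distinct words by count, and slicing off the first N. This is a sketch of a different technique, not the answer's insertion loop; topWords and its parameter names are hypothetical, and it assumes the number of distinct words is small enough that sorting them all is acceptable.

```javascript
// Hypothetical alternative (not from the answer): count, sort, then slice.
function topWords(text, amount) {
  const counts = new Map();
  // Split on runs of whitespace and drop empty strings.
  for (const word of text.split(/\s+/).filter(Boolean)) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return [...counts]
    .sort((a, b) => b[1] - a[1]) // descending by count; ties keep first-seen order (stable sort, ES2019+)
    .slice(0, amount)
    .map(([word, occurrences]) => ({ word, occurrences }));
}

console.log(topWords("b a b c a b", 2));
// [{ word: "b", occurrences: 3 }, { word: "a", occurrences: 2 }]
```

Sorting costs O(k log k) for k distinct words, whereas the insertion loop does at most N comparisons per distinct word, so the loop can be the better choice when N is small and the vocabulary is large.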
Answered by ricka
Lodash 1-liner:
const mostFrequentWord = _.maxBy(Object.values(_.groupBy(str.match(/\b(\w+)\b/g))), w => w.length)[0]
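The one-liner groups identical words together, picks the largest group, and returns its first element. Unpacked without Lodash, the same steps might look like the sketch below; the function form and the empty-input handling are assumptions added here, not part of the answer.

```javascript
// Hypothetical dependency-free version of the same three steps.
function mostFrequentWord(str) {
  const groups = new Map(); // word -> array of its occurrences (the grouping step)
  for (const word of str.match(/\b(\w+)\b/g) || []) {
    if (!groups.has(word)) groups.set(word, []);
    groups.get(word).push(word);
  }
  let largest = []; // the group with the most elements seen so far
  for (const group of groups.values()) {
    if (group.length > largest.length) largest = group;
  }
  return largest[0]; // undefined when the string contains no words
}

console.log(mostFrequentWord("to be or not to be, to")); // "to"
```

Unlike the one-liner, this version returns undefined instead of throwing when str contains no words, because the null result of match() is replaced by an empty array.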