Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original: http://stackoverflow.com/questions/6565333/

Time: 2020-10-25 21:07:22  Source: igfitidea

Using Javascript to find most common words in string?

Tags: javascript, string, search

Asked by j.s

I have a large block of text, and I would like to find out the most common words being used (except for a few, like "the", "a", "and", etc).

How would I go about searching this block of text for its most commonly used words?

Answered by SLaks

You should split the string into words, then loop through the words and increment a counter for each one:

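var wordCounts = {};
// str holds the input text; note that /\b/ also yields the whitespace and punctuation between words as tokens.
var words = str.split(/\b/);

for (var i = 0; i < words.length; i++) {
    var key = "_" + words[i]; // prefixed so that words like "constructor" are safe keys
    wordCounts[key] = (wordCounts[key] || 0) + 1;
}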

The "_" + allows it to process words like constructor that are already properties of the object.
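To see why the prefix matters: an object literal inherits properties such as constructor from Object.prototype, so an unprefixed lookup finds a function instead of undefined. The following is a minimal sketch of the problem and of one prefix-free alternative, a Map; the helper name countWithMap is hypothetical and not part of the answer above.

```javascript
// Without the prefix, the inherited property gets in the way:
// naive["constructor"] is the Object function, so "|| 0" is skipped
// and "+ 1" turns the function into a string with "1" appended.
var naive = {};
naive["constructor"] = (naive["constructor"] || 0) + 1;
console.log(typeof naive["constructor"]); // string

// Hypothetical helper: counts words with a Map, whose keys never
// collide with inherited object properties.
function countWithMap(words) {
    var counts = new Map();
    for (var i = 0; i < words.length; i++) {
        counts.set(words[i], (counts.get(words[i]) || 0) + 1);
    }
    return counts;
}

var counts = countWithMap(["constructor", "a", "constructor"]);
console.log(counts.get("constructor")); // 2
```

An object created with Object.create(null) would avoid the collision in a similar way, since it has no prototype to inherit from.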

You may want to write words[i].toLowerCase() to count case-insensitively.
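Putting these pieces together for the original question, here is a minimal sketch that lowercases the text and skips a small stop-word list. The function name countWords, the stopWords list, and the /[a-z']+/g tokenizer are assumptions made for illustration; they are not part of the answer above.

```javascript
// Hypothetical stop-word list; extend it to suit the text being analysed.
var stopWords = ["the", "a", "and"];

function countWords(str) {
    var wordCounts = {};
    // match() returns null when nothing matches, so fall back to an empty array.
    var words = str.toLowerCase().match(/[a-z']+/g) || [];
    for (var i = 0; i < words.length; i++) {
        if (stopWords.indexOf(words[i]) !== -1) continue; // skip excluded words
        var key = "_" + words[i];
        wordCounts[key] = (wordCounts[key] || 0) + 1;
    }
    return wordCounts;
}

var result = countWords("The cat and the Cat saw a dog");
console.log(result); // { _cat: 2, _saw: 1, _dog: 1 }
```

The tokenizer only recognises ASCII letters and apostrophes, so text in other scripts would need a different pattern.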

Answered by Gustavo Maloste

I come from the future, where this question was asked again; I had started on a solution before it was marked as answered. Anyway, this complements SLaks's answer.

function nthMostCommon(string, amount) {
    // Split on runs of whitespace and drop empty tokens from leading/trailing whitespace.
    var wordsArray = string.split(/\s+/).filter(Boolean);
    var wordOccurrences = {};
    for (var i = 0; i < wordsArray.length; i++) {
        // The '_' prefix keeps words like "constructor" from colliding with inherited properties.
        wordOccurrences['_' + wordsArray[i]] = (wordOccurrences['_' + wordsArray[i]] || 0) + 1;
    }
    // Build a list of at most `amount` entries, kept sorted by descending count.
    var result = Object.keys(wordOccurrences).reduce(function(acc, currentKey) {
        /* you may want to include a binary search here */
        for (var i = 0; i < amount; i++) {
            if (!acc[i]) {
                acc[i] = { word: currentKey.slice(1), occurrences: wordOccurrences[currentKey] };
                break;
            } else if (acc[i].occurrences < wordOccurrences[currentKey]) {
                acc.splice(i, 0, { word: currentKey.slice(1), occurrences: wordOccurrences[currentKey] });
                if (acc.length > amount)
                    acc.pop();
                break;
            }
        }
        return acc;
    }, []);
    return result;
}
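For comparison, a shorter (though not necessarily faster) way to get the same kind of result is to sort every entry by count and keep the first few. This is a sketch of a different technique from the insertion loop above, not a drop-in replacement: the name topWords is hypothetical, and the order of tied words depends on the sort being stable, which holds in modern engines.

```javascript
// Hypothetical helper: returns the `amount` most frequent words, most frequent first.
function topWords(string, amount) {
    var counts = new Map();
    // Split on runs of whitespace and ignore empty tokens.
    string.split(/\s+/).filter(Boolean).forEach(function (word) {
        counts.set(word, (counts.get(word) || 0) + 1);
    });
    // Turn each [word, count] entry into an object, then sort and truncate.
    return Array.from(counts, function (entry) {
        return { word: entry[0], occurrences: entry[1] };
    }).sort(function (a, b) {
        return b.occurrences - a.occurrences; // descending by count
    }).slice(0, amount);
}

var topTwo = topWords("b a b c a b", 2);
console.log(topTwo);
// [ { word: 'b', occurrences: 3 }, { word: 'a', occurrences: 2 } ]
```

Sorting costs O(n log n) in the number of distinct words, which is usually acceptable unless the vocabulary is very large and only a handful of top entries is needed.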

Answered by ricka

Lodash 1-liner:

const mostFrequentWord = _.maxBy(Object.values(_.groupBy(str.match(/\b(\w+)\b/g))), w => w.length)[0]
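For readers without Lodash, roughly the same result can be sketched in plain JavaScript. The function name mostFrequentWord mirrors the constant above; the `|| []` guard is an addition, since str.match returns null when the text contains no word characters, and ties may be resolved differently from the Lodash version.

```javascript
function mostFrequentWord(str) {
    // Same tokenizer as the one-liner; fall back to an empty array on no match.
    var words = str.match(/\b(\w+)\b/g) || [];
    var counts = new Map();
    var best; // stays undefined when the text has no words
    words.forEach(function (w) {
        var n = (counts.get(w) || 0) + 1;
        counts.set(w, n);
        // Strictly greater, so the earliest word to reach the top count wins ties.
        if (best === undefined || n > counts.get(best)) best = w;
    });
    return best;
}

console.log(mostFrequentWord("to be or not to be, that is it")); // to
```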