Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/30906807/

Time: 2020-10-28 12:54:18  Source: igfitidea

Word frequency in JavaScript

javascript

Asked by Anil

How can I implement a JavaScript function to calculate the frequency of each word in a given sentence?

This is my code:

function search () {
  var data = document.getElementById('txt').value;
  var temp = data;
  var words = new Array();
  words = temp.split(" ");
  var uniqueWords = new Array();
  var count = new Array();


  for (var i = 0; i < words.length; i++) {
    //var count=0;
    var f = 0;
    for (j = 0; j < uniqueWords.length; j++) {
      if (words[i] == uniqueWords[j]) {
        count[j] = count[j] + 1;
        //uniqueWords[j]=words[i];
        f = 1;
      }
    }
    if (f == 0) {
      count[i] = 1;
      uniqueWords[i] = words[i];
    }
    console.log("count of " + uniqueWords[i] + " - " + count[i]);
  }
}

I am unable to track down the problem. Any help is greatly appreciated. The output should be in this format: count of is - 1 count of the - 2..

Input: this is anil is kum the anil

Answered by Cymen

Here is a JavaScript function to get the frequency of each word in a sentence:

function wordFreq(string) {
    var words = string.replace(/[.]/g, '').split(/\s/);
    var freqMap = {};
    words.forEach(function(w) {
        if (!freqMap[w]) {
            freqMap[w] = 0;
        }
        freqMap[w] += 1;
    });

    return freqMap;
}

It will return a hash of word to word count. So for example, if we run it like so:

console.log(wordFreq("I am the big the big bull."));
> Object {I: 1, am: 1, the: 2, big: 2, bull: 1}

You can iterate over the words with Object.keys(result).sort().forEach(function(word) {...}). So we could hook that up like so:

var freq = wordFreq("I am the big the big bull.");
Object.keys(freq).sort().forEach(function(word) {
    console.log("count of " + word + " is " + freq[word]);
});

Which would output:

count of I is 1
count of am is 1
count of big is 2
count of bull is 1
count of the is 2
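
The listing above is in alphabetical order. If the most frequent words should come first instead, the keys can be sorted by their counts. The following is only a sketch: sortByCount is a made-up helper name, and the freq object is hard-coded to the result shown earlier so the snippet runs on its own.

```javascript
// Example frequency object, matching the wordFreq result shown above.
var freq = { I: 1, am: 1, the: 2, big: 2, bull: 1 };

// Hypothetical helper: returns the words ordered by descending count.
function sortByCount(freqObj) {
  return Object.keys(freqObj).sort(function (a, b) {
    // Higher counts first.
    if (freqObj[b] !== freqObj[a]) return freqObj[b] - freqObj[a];
    // Ties are broken by plain string comparison (uppercase sorts first).
    return a < b ? -1 : a > b ? 1 : 0;
  });
}

sortByCount(freq).forEach(function (word) {
  console.log("count of " + word + " is " + freq[word]);
});
```

The tie-break uses plain string comparison rather than localeCompare, so the order does not depend on the runtime's locale.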

JSFiddle: http://jsfiddle.net/ah6wsbs6/

And here is the wordFreq function in ES6:

function wordFreq(string) {
  return string.replace(/[.]/g, '')
    .split(/\s/)
    .reduce((map, word) =>
      Object.assign(map, {
        [word]: (map[word])
          ? map[word] + 1
          : 1,
      }),
      {}
    );
}

JSFiddle: http://jsfiddle.net/r1Lo79us/
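
One caveat with using a plain object as the map: a word that matches an inherited property name, such as "constructor", is already truthy on an empty object, so the existence check misfires. Below is a minimal sketch of the same idea using a Map, which has no inherited keys. The name wordFreqMap is made up for this example and is not part of the original answer. It also splits on runs of whitespace and drops empty strings, which is a small departure from the code above.

```javascript
// Hypothetical variant of wordFreq that counts into a Map instead of an object.
function wordFreqMap(string) {
  const counts = new Map();
  // Remove periods, split on runs of whitespace, and drop empty strings.
  const words = string.replace(/[.]/g, '').split(/\s+/).filter(Boolean);
  for (const word of words) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

const freqMap = wordFreqMap("I am the big the big constructor.");
for (const [word, count] of freqMap) {
  console.log("count of " + word + " is " + count);
}
```

A Map also iterates in insertion order, so the words come out in the order they were first seen.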

Answered by Sampson

I feel you have over-complicated things by having multiple arrays, strings, and engaging in frequent (and hard to follow) context-switching between loops, and nested loops.

Below is the approach I would encourage you to consider taking. I've inlined comments to explain each step along the way. If any of this is unclear, please let me know in the comments and I'll revisit to improve clarity.

(function () {

    /* Below is a regular expression that finds alphanumeric characters
       Next is a string that could easily be replaced with a reference to a form control
       Lastly, we have an array that will hold any words matching our pattern */
    var pattern = /\w+/g,
        string = "I I am am am yes yes.",
        matchedWords = string.match( pattern );

    /* The Array.prototype.reduce method assists us in producing a single value from an
       array. In this case, we're going to use it to output an object with results. */
    var counts = matchedWords.reduce(function ( stats, word ) {

        /* `stats` is the object that we'll be building up over time.
           `word` is each individual entry in the `matchedWords` array */
        if ( stats.hasOwnProperty( word ) ) {
            /* `stats` already has an entry for the current `word`.
               As a result, let's increment the count for that `word`. */
            stats[ word ] = stats[ word ] + 1;
        } else {
            /* `stats` does not yet have an entry for the current `word`.
               As a result, let's add a new entry, and set count to 1. */
            stats[ word ] = 1;
        }

        /* Because we are building up `stats` over numerous iterations,
           we need to return it for the next pass to modify it. */
        return stats;

    }, {} );

    /* Now that `counts` has our object, we can log it. */
    console.log( counts );

}());
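
One edge case in the snippet above: String.prototype.match returns null when nothing matches, so calling reduce on the result would throw for an empty or punctuation-only string. Here is a small sketch of a guarded version. countMatches is a hypothetical wrapper name, not something from the answer.

```javascript
// Hypothetical wrapper around the match/reduce approach shown above.
function countMatches(string) {
  // match() returns null when there are no matches, so fall back to [].
  var matchedWords = string.match(/\w+/g) || [];
  return matchedWords.reduce(function (stats, word) {
    // Call hasOwnProperty via the prototype so a word named
    // "hasOwnProperty" cannot shadow the method.
    if (Object.prototype.hasOwnProperty.call(stats, word)) {
      stats[word] += 1;
    } else {
      stats[word] = 1;
    }
    return stats;
  }, {});
}

console.log(countMatches("I I am am am yes yes.")); // { I: 2, am: 3, yes: 2 }
console.log(countMatches("..."));                   // {}
```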

Answered by AskMen

const sentence = 'Hi my friend how are you my friend';

const countWords = (sentence) => {
    const convertToObject = sentence.split(" ").map( (i, k) => {
        return {
            element: {
                word: i,
                nr: sentence.split(" ").filter(j => j === i).length + ' occurrence',
            }
        }
    });
    return Array.from(new Set(convertToObject.map(JSON.stringify))).map(JSON.parse)
};

console.log(countWords(sentence));

Answered by Lucien Stals

Here is an updated version of your own code...

<!DOCTYPE html>
<html>
<head>
<title>string frequency</title>
<style type="text/css">
#txt{
    width:250px;
}
</style>
</head>

<body >

<textarea id="txt" cols="25" rows="3" placeholder="add your text here">   </textarea><br>
<button type="button" onclick="search()">search</button>

    <script >

        function search()
        {
            var data=document.getElementById('txt').value;
            var temp=data;
            var words=new Array();
            words=temp.split(" ");

            var unique = {};


            for (var i = 0; i < words.length; i++) {
                var word = words[i];
                console.log(word);

                if (word in unique)
                {
                    console.log("word found");
                    var count  = unique[word];
                    count ++;
                    unique[word]=count;
                }
                else
                {
                    console.log("word NOT found");
                    unique[word]=1;
                }
            }
            console.log(unique);
        }

    </script>

</body>
</html>

I think your loop was overly complicated. Also, trying to produce the final count while still doing your first pass over the array of words is bound to fail because you can't test for uniqueness until you have checked each word in the array.

Instead of all your counters, I've used a JavaScript object as an associative array, so we can store each unique word and the count of how many times it occurs.

Then, once we exit the loop, we can see the final result.

Also, this solution uses no regex ;)

I'll also add that it's very hard to count words just based on spaces. In this code, "one, two, one" will result in "one," and "one" being counted as different, unique words.
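
To illustrate that last point, here is a rough sketch that replaces a few common punctuation marks with spaces before splitting. Unlike the code above, it does use a regex. The function name countPlainWords and the chosen set of punctuation characters are assumptions for this example only; real text will need a broader set.

```javascript
// Hypothetical helper: counts words after stripping some common punctuation.
function countPlainWords(text) {
  var unique = {};
  // Replace common punctuation with spaces, then split on runs of whitespace.
  var words = text.replace(/[.,!?;:]/g, ' ').split(/\s+/);
  for (var i = 0; i < words.length; i++) {
    var word = words[i];
    if (word === '') continue; // empty strings come from leading/trailing whitespace
    if (Object.prototype.hasOwnProperty.call(unique, word)) {
      unique[word]++;
    } else {
      unique[word] = 1;
    }
  }
  return unique;
}

console.log(countPlainWords("one, two, one")); // { one: 2, two: 1 }
```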

Answered by Anurag Peshne

While both of the answers here are correct, and maybe better, neither of them addresses OP's question (what is wrong with his code).

The problem with OP's code is here:

if(f==0){
    count[i]=1;
    uniqueWords[i]=words[i];
}

For every new (unique) word, the code adds it to uniqueWords at the index the word had in words. Hence there are gaps in the uniqueWords array, which is the reason for some undefined values.

Try printing uniqueWords. It should give something like:

["this", "is", "anil", 4: "kum", 5: "the"]

Note that there is no element for index 3.

Also, the printing of the final count should happen after processing all the words in the words array.

Here's a corrected version:

function search()
{
    var data=document.getElementById('txt').value;
    var temp=data;
    var words=new Array();
    words=temp.split(" ");
    var uniqueWords=new Array();
    var count=new Array();


    for (var i = 0; i < words.length; i++) {
        //var count=0;
        var f=0;
        for(j=0;j<uniqueWords.length;j++){
            if(words[i]==uniqueWords[j]){
                count[j]=count[j]+1;
                //uniqueWords[j]=words[i];
                f=1;
            }
        }
        if(f==0){
            count[i]=1;
            uniqueWords[i]=words[i];
        }
    }
    for ( i = 0; i < uniqueWords.length; i++) {
        if (typeof uniqueWords[i] !== 'undefined')
            console.log("count of "+uniqueWords[i]+" - "+count[i]);       
    }
}

I have just moved the printing of count out of the processing loop into a new loop and added an "if not undefined" check.
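
Another way to deal with the gaps, sketched here as an alternative rather than as part of the fix above, is to append new words with push so that uniqueWords and count stay dense and index-aligned. The undefined check is then no longer needed. countWithArrays is a made-up, DOM-free stand-in for search() that takes the sentence as an argument and returns the output lines.

```javascript
// Hypothetical DOM-free variant of search(): takes the sentence as an argument.
function countWithArrays(sentence) {
  var words = sentence.split(" ");
  var uniqueWords = [];
  var count = [];

  for (var i = 0; i < words.length; i++) {
    var found = false;
    for (var j = 0; j < uniqueWords.length; j++) {
      if (words[i] === uniqueWords[j]) {
        count[j] = count[j] + 1;
        found = true;
        break;
      }
    }
    if (!found) {
      // push() appends at the end, so both arrays stay free of gaps.
      uniqueWords.push(words[i]);
      count.push(1);
    }
  }

  // Build the output only after every word has been processed.
  var lines = [];
  for (var k = 0; k < uniqueWords.length; k++) {
    lines.push("count of " + uniqueWords[k] + " - " + count[k]);
  }
  return lines;
}

console.log(countWithArrays("this is anil is kum the anil").join("\n"));
```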

Fiddle: https://jsfiddle.net/cdLgaq3a/

Answered by thdoan

I'd go with Sampson's match-reduce method for slightly better efficiency. Here's a modified version of it that is more production-ready. It's not perfect, but it should cover the vast majority of scenarios (i.e., "good enough").

function calcWordFreq(s) {
  // Normalize
  s = s.toLowerCase();
  // Strip quotes and brackets
  s = s.replace(/["“”(\[{}\])]|\B['‘]([^'’]+)['’]/g, '$1');
  // Strip dashes and ellipses
  s = s.replace(/[‒–—―…]|--|\.\.\./g, ' ');
  // Strip punctuation marks
  s = s.replace(/[!?;:.,]\B/g, '');
  return s.match(/\S+/g).reduce(function(oFreq, sWord) {
    if (oFreq.hasOwnProperty(sWord)) ++oFreq[sWord];
    else oFreq[sWord] = 1;
    return oFreq;
  }, {});
}

calcWordFreq('A ‘bad’, “BAD” wolf-man...a good ol\' spook -- I\'m frightened!') returns

{
  "a": 2,
  "bad": 2,
  "frightened": 1,
  "good": 1,
  "i'm": 1,
  "ol'": 1,
  "spook": 1,
  "wolf-man": 1
}