JavaScript: counting words in a string
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/18679576/
Counting words in string
Asked by internals-in
function WordCount(str) {
var totalSoFar = 0;
for (var i = 0; i < WordCount.length; i++)
if (str(i) === " ") { // if a space is found in str
totalSoFar = +1; // add 1 to total so far
}
totalsoFar += 1; // add 1 to totalsoFar to account for extra space since 1 space = 2 words
}
console.log(WordCount("Random String"));
I think I have got this down pretty well, except I think that the if statement is wrong. How do I say: if str(i) contains a space, add 1?
Edit:
I found out (thanks to Blender) that I can do this with a lot less code:
function WordCount(str) {
return str.split(" ").length;
}
console.log(WordCount("hello world"));
Answered by Blender
Use square brackets, not parentheses:
str[i] === " "
Or charAt:
str.charAt(i) === " "
You could also do it with .split():
return str.split(' ').length;
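Putting the bracket fix back into the loop from the question gives a working counter. This is only a minimal sketch, not the asker's final code: besides switching to str[i], it assumes the loop should run over str.length rather than WordCount.length, that the increment should be +=, and that the function should return its total. The name countWordsBySpaces is made up for illustration.

```javascript
// Hypothetical corrected version of the loop from the question.
function countWordsBySpaces(str) {
  var total = 0;
  for (var i = 0; i < str.length; i++) {
    if (str[i] === " ") { // square brackets index into the string
      total += 1;         // one more space seen
    }
  }
  return total + 1;       // n single spaces separate n + 1 words
}

console.log(countWordsBySpaces("Random String")); // 2
```

Like str.split(' ').length, this only gives the right answer when words are separated by exactly one space and the string has no leading or trailing spaces.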
Answered by internals-in
Try these before reinventing the wheel.
From "Count number of words in string using JavaScript":
function countWords(str) {
return str.trim().split(/\s+/).length;
}
From http://www.mediacollege.com/internet/javascript/text/count-words.html:
function countWords(s){
s = s.replace(/(^\s*)|(\s*$)/gi,"");//exclude start and end white-space
s = s.replace(/[ ]{2,}/gi," ");//2 or more space to 1
s = s.replace(/\n /,"\n"); // exclude newline with a start spacing
return s.split(' ').filter(function(str){return str!="";}).length;
//return s.split(' ').filter(String).length; - this can also be used
}
From "Use JavaScript to count words in a string, WITHOUT using a regex" (this will be the best approach):
function WordCount(str) {
return str.split(' ')
.filter(function(n) { return n != '' })
.length;
}
Notes From Author:
You can adapt this script to count words in whichever way you like. The important part is s.split(' ').length, which counts the pieces separated by spaces. The script attempts to remove all extra spaces (double spaces etc.) before counting. If the text contains two words without a space between them, it will count them as one word, e.g. "First sentence .Start of next sentence".
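The variants quoted in this answer behave differently on untidy input, which is easiest to see side by side. The sketch below is only an illustration: the names naiveCount, regexCount and filterCount are labels invented here, and the sample strings are arbitrary.

```javascript
// Three of the approaches from this thread, under invented names.
function naiveCount(s)  { return s.split(' ').length; }
function regexCount(s)  { return s.trim().split(/\s+/).length; }
function filterCount(s) {
  return s.split(' ').filter(function (w) { return w !== ''; }).length;
}

var padded = '  hello   world  ';   // leading, repeated and trailing spaces
console.log(naiveCount(padded));    // 8 (seven spaces produce eight pieces)
console.log(regexCount(padded));    // 2
console.log(filterCount(padded));   // 2

var mixed = 'one\ttwo\nthree';      // tab and newline, no plain spaces
console.log(naiveCount(mixed));     // 1
console.log(regexCount(mixed));     // 3
console.log(filterCount(mixed));    // 1

// The empty string is the edge case where the regex version over-counts.
console.log(regexCount(''));        // 1, because ''.split(/\s+/) is ['']
console.log(filterCount(''));       // 0
```

So the regex version handles every kind of whitespace but needs a separate check for empty input, while the filter version handles empty input but only treats the plain space character as a separator.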
Answered by Alex
One more way to count words in a string. This code counts words that contain only alphanumeric characters and the "_", "'" and "-" characters.
function countWords(str) {
var matches = str.match(/[\w\d\'\'-]+/gi);
return matches ? matches.length : 0;
}
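To see what this character class accepts as a word, it helps to run it on a few strings. The sketch below uses an equivalent, shorter pattern (\w already covers digits, underscore and both letter cases, so \d, the repeated quote and the i flag are redundant); the name countWordTokens and the sample inputs are made up for illustration.

```javascript
// Equivalent to the pattern in the answer, with redundant parts removed.
function countWordTokens(str) {
  var matches = str.match(/[\w'-]+/g);
  return matches ? matches.length : 0; // match() returns null when nothing matches
}

console.log(countWordTokens("rock-n-roll isn't dead_yet!")); // 3
console.log(countWordTokens("!!! ???"));                     // 0
console.log(countWordTokens("a - b"));                       // 3
```

The last call shows a side effect of allowing "-" inside words: a hyphen standing alone is also counted as a word.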
Answered by Mr. Polywhirl
After cleaning the string, you can match non-whitespace characters or word boundaries.
Here are two simple regular expressions to capture words in a string:
- Sequence of non-white-space characters: /\S+/g
- Valid characters between word boundaries: /\b[a-z\d]+\b/g
The example below shows how to retrieve the word count from a string by using these capturing patterns.
/*Redirect console output to HTML.*/document.body.innerHTML='';console.log=function(s){document.body.innerHTML+=s+'\n';};
/*String format.*/String.format||(String.format=function(f){return function(a){return f.replace(/{(\d+)}/g,function(m,n){return"undefined"!=typeof a[n]?a[n]:m})}([].slice.call(arguments,1))});
// ^ IGNORE CODE ABOVE ^
// =================
// Clean and match sub-strings in a string.
function extractSubstr(str, regexp) {
return str.replace(/[^\w\s]|_/g, '')
.replace(/\s+/g, ' ')
.toLowerCase().match(regexp) || [];
}
// Find words by searching for sequences of non-whitespace characters.
function getWordsByNonWhiteSpace(str) {
return extractSubstr(str, /\S+/g);
}
// Find words by searching for valid characters between word-boundaries.
function getWordsByWordBoundaries(str) {
return extractSubstr(str, /\b[a-z\d]+\b/g);
}
// Example of usage.
var edisonQuote = "I have not failed. I've just found 10,000 ways that won't work.";
var words1 = getWordsByNonWhiteSpace(edisonQuote);
var words2 = getWordsByWordBoundaries(edisonQuote);
console.log(String.format('"{0}" - Thomas Edison\n\nWord count via:\n', edisonQuote));
console.log(String.format(' - non-white-space: ({0}) [{1}]', words1.length, words1.join(', ')));
console.log(String.format(' - word-boundaries: ({0}) [{1}]', words2.length, words2.join(', ')));
body { font-family: monospace; white-space: pre; font-size: 11px; }
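The cleaning step matters most for the word-boundary pattern. As a rough illustration (the variable names here are invented, and the cleaning expression is copied from extractSubstr above), the same quote yields different counts with and without it:

```javascript
var quote = "I have not failed. I've just found 10,000 ways that won't work.";

// Cleaning as in extractSubstr: drop punctuation and underscores,
// collapse whitespace, lower-case.
var cleaned = quote.replace(/[^\w\s]|_/g, '')
                   .replace(/\s+/g, ' ')
                   .toLowerCase();

// Without cleaning, apostrophes and commas act as word boundaries,
// so "I've" splits into "i" and "ve", and "10,000" into "10" and "000".
var rawTokens = quote.toLowerCase().match(/\b[a-z\d]+\b/g) || [];
var cleanTokens = cleaned.match(/\b[a-z\d]+\b/g) || [];

console.log(rawTokens.length);   // 15
console.log(cleanTokens.length); // 12
console.log((quote.match(/\S+/g) || []).length); // 12, \S+ is unaffected by punctuation
```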
Finding Unique Words
You could also create a mapping of words to get unique counts.
function cleanString(str) {
return str.replace(/[^\w\s]|_/g, '')
.replace(/\s+/g, ' ')
.toLowerCase();
}
function extractSubstr(str, regexp) {
return cleanString(str).match(regexp) || [];
}
function getWordsByNonWhiteSpace(str) {
return extractSubstr(str, /\S+/g);
}
function getWordsByWordBoundaries(str) {
return extractSubstr(str, /\b[a-z\d]+\b/g);
}
function wordMap(str) {
return getWordsByWordBoundaries(str).reduce(function(map, word) {
map[word] = (map[word] || 0) + 1;
return map;
}, {});
}
function mapToTuples(map) {
return Object.keys(map).map(function(key) {
return [ key, map[key] ];
});
}
function mapToSortedTuples(map, sortFn, sortOrder) {
return mapToTuples(map).sort(function(a, b) {
return sortFn.call(undefined, a, b, sortOrder);
});
}
function countWords(str) {
return getWordsByWordBoundaries(str).length;
}
function wordFrequency(str) {
return mapToSortedTuples(wordMap(str), function(a, b, order) {
if (b[1] > a[1]) {
return order[1] * -1;
} else if (a[1] > b[1]) {
return order[1] * 1;
} else {
return order[0] * (a[0] < b[0] ? -1 : (a[0] > b[0] ? 1 : 0));
}
}, [1, -1]);
}
function printTuples(tuples) {
return tuples.map(function(tuple) {
return padStr(tuple[0], ' ', 12, 1) + ' -> ' + tuple[1];
}).join('\n');
}
function padStr(str, ch, width, dir) {
return (width <= str.length ? str : padStr(dir < 0 ? ch + str : str + ch, ch, width, dir)).substr(0, width);
}
function toTable(data, headers) {
return $('<table>').append($('<thead>').append($('<tr>').append(headers.map(function(header) {
return $('<th>').html(header);
})))).append($('<tbody>').append(data.map(function(row) {
return $('<tr>').append(row.map(function(cell) {
return $('<td>').html(cell);
}));
})));
}
function addRowsBefore(table, data) {
table.find('tbody').prepend(data.map(function(row) {
return $('<tr>').append(row.map(function(cell) {
return $('<td>').html(cell);
}));
}));
return table;
}
$(function() {
$('#countWordsBtn').on('click', function(e) {
var str = $('#wordsTxtAra').val();
var wordFreq = wordFrequency(str);
var wordCount = countWords(str);
var uniqueWords = wordFreq.length;
var summaryData = [
[ 'TOTAL', wordCount ],
[ 'UNIQUE', uniqueWords ]
];
var table = toTable(wordFreq, ['Word', 'Frequency']);
addRowsBefore(table, summaryData);
$('#wordFreq').html(table);
});
});
table {
border-collapse: collapse;
table-layout: fixed;
width: 200px;
font-family: monospace;
}
thead {
border-bottom: #000 3px double;
}
table, td, th {
border: #000 1px solid;
}
td, th {
padding: 2px;
width: 100px;
overflow: hidden;
}
textarea, input[type="button"], table {
margin: 4px;
padding: 2px;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<h1>Word Frequency</h1>
<textarea id="wordsTxtAra" cols="60" rows="8">Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.
Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battle-field of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this.
But, in a larger sense, we can not dedicate -- we can not consecrate -- we can not hallow -- this ground. The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us -- that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion -- that we here highly resolve that these dead shall not have died in vain -- that this nation, under God, shall have a new birth of freedom -- and that government of the people, by the people, for the people, shall not perish from the earth.</textarea><br />
<input type="button" id="countWordsBtn" value="Count Words" />
<div id="wordFreq"></div>
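The counting core of this snippet can also be tried without jQuery or a page. The following is a compact sketch under a few assumptions, not a drop-in replacement for the code above: it reuses the same cleaning and word-boundary pattern, swaps the plain object for a Map, and keeps the same ordering (highest count first, ties alphabetical). The sample sentence is arbitrary.

```javascript
// DOM-free sketch: returns [word, count] pairs, most frequent first.
function wordFrequency(text) {
  var words = text.replace(/[^\w\s]|_/g, '')
                  .toLowerCase()
                  .match(/\b[a-z\d]+\b/g) || [];
  var counts = new Map();
  words.forEach(function (w) {
    counts.set(w, (counts.get(w) || 0) + 1);
  });
  return Array.from(counts.entries()).sort(function (a, b) {
    // Higher count first; equal counts fall back to alphabetical order.
    return b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0);
  });
}

var freq = wordFrequency("The cat and the hat. The end!");
console.log(freq);        // [["the",3],["and",1],["cat",1],["end",1],["hat",1]]
console.log(freq.length); // 5 unique words
```

A Map is used here because a plain object inherits keys such as "constructor" from Object.prototype, so a text containing that word would not start counting from zero.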
Answered by Sean
I think this method is more than you want
var getWordCount = function(v){
var matches = v.match(/\S+/g) ;
return matches?matches.length:0;
}
Answered by iamwhitebox
String.prototype.match returns an array (or null when nothing matches), so we can then check its length.
I find this method to be the most descriptive:
var str = 'one two three four five';
(str.match(/\w+/g) || []).length; // 5; the || [] guards against a null result
Answered by Tim
The easiest way I've found so far is to use a regex with split.
var calculate = function() {
var string = document.getElementById('input').value;
var length = string.split(/[^\s]+/).length - 1;
document.getElementById('count').innerHTML = length;
};
<textarea id="input">My super text that does 7 words.</textarea>
<button onclick="calculate()">Calculate</button>
<span id="count">7</span> words
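The "- 1" in this answer comes from splitting on the words themselves: what remains are the gaps around them, and there is always one more gap than there are words. A DOM-free sketch of the same idea follows; the function name countBySplittingOnWords is invented for illustration.

```javascript
// Split on runs of non-whitespace (the words); the pieces left over are
// the whitespace gaps before, between and after them.
function countBySplittingOnWords(text) {
  return text.split(/[^\s]+/).length - 1;
}

console.log("a bc  d".split(/[^\s]+/)); // ["", " ", "  ", ""] : 4 gaps, 3 words
console.log(countBySplittingOnWords("My super text that does 7 words.")); // 7
console.log(countBySplittingOnWords(""));    // 0
console.log(countBySplittingOnWords("   ")); // 0
```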
Answered by neokio
The answer given by @7-isnotbad is extremely close, but doesn't count single-word lines. Here's the fix, which seems to account for every possible combination of words, spaces and newlines.
function countWords(s){
s = s.replace(/\n/g,' '); // newlines to space
s = s.replace(/(^\s*)|(\s*$)/gi,''); // remove spaces from start + end
s = s.replace(/[ ]{2,}/gi,' '); // 2 or more spaces to 1
return s.split(' ').length;
}
Answered by user3743140
Here's my approach, which simply splits a string by spaces, then loops over the array with a for loop and increments the count if array[i] matches a given regex pattern.
function wordCount(str) {
var stringArray = str.split(' ');
var count = 0;
for (var i = 0; i < stringArray.length; i++) {
var word = stringArray[i];
if (/[A-Za-z]/.test(word)) {
count++
}
}
return count
}
Invoked like so:
var str = "testing strings here's a string --.. ? // ... random characters ,,, end of string";
wordCount(str)
(Extra characters and spaces were added to show the accuracy of the function.)
The str above returns 10, which is correct!
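To check where the 10 comes from, the same test can be written as a filter that keeps the surviving tokens. This is only an illustrative sketch: the variable names are invented, and the sample string is written with single spaces as shown above.

```javascript
var sample = "testing strings here's a string --.. ? // ... random characters ,,, end of string";

// Keep only the pieces that contain at least one ASCII letter.
var kept = sample.split(' ').filter(function (token) {
  return /[A-Za-z]/.test(token);
});

console.log(kept.length);      // 10
console.log(kept.join(' | ')); // testing | strings | here's | a | string | random | characters | end | of | string

// Digits-only tokens are not counted by this test.
var withNumber = "2024 was great".split(' ').filter(function (token) {
  return /[A-Za-z]/.test(token);
});
console.log(withNumber.length); // 2
```

Because the test only requires one letter somewhere in the token, punctuation-only pieces are skipped, but so are purely numeric ones.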