Best way to count words in a string in Ruby?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/1416059/

Date: 2020-09-02 21:44:52  Source: igfitidea

Best way to count words in a string in Ruby?

ruby-on-rails ruby

Asked by Tom Lehman

Is there anything better than string.scan(/(\w|-)+/).size (the - is there so that, e.g., "one-way street" counts as 2 words instead of 3)?
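
A side note on the regex above: because it contains a capture group, scan returns the last capture of each match rather than the words themselves, although the count comes out the same. A minimal sketch, assuming a plain character class without a group is an acceptable substitute (the variable names are only for illustration):

```ruby
text = "one-way street"

# With a capture group, scan returns one sub-array per match,
# holding only the last thing the group captured.
grouped = text.scan(/(\w|-)+/)
p grouped        # [["y"], ["t"]]
p grouped.size   # 2

# With a character class and no group, scan returns the matched words.
words = text.scan(/[-\w]+/)
p words          # ["one-way", "street"]
p words.size     # 2
```

The count is identical either way; the second form is just easier to inspect when debugging.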

Answered by KitsuneYMG

string.split.size


Edited to explain multiple spaces

From the Ruby String Documentation page

split(pattern=$;, [limit]) → anArray

Divides str into substrings based on a delimiter, returning an array of these substrings.

If pattern is a String, then its contents are used as the delimiter when splitting str. If pattern is a single space, str is split on whitespace, with leading whitespace and runs of contiguous whitespace characters ignored.

If pattern is a Regexp, str is divided where the pattern matches. Whenever the pattern matches a zero-length string, str is split into individual characters. If pattern contains groups, the respective matches will be returned in the array as well.

If pattern is omitted, the value of $; is used. If $; is nil (which is the default), str is split on whitespace as if ' ' were specified.

If the limit parameter is omitted, trailing null fields are suppressed. If limit is a positive number, at most that number of fields will be returned (if limit is 1, the entire string is returned as the only entry in an array). If negative, there is no limit to the number of fields returned, and trailing null fields are not suppressed.

" now's  the time".split        #=> ["now's", "the", "time"]
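
The documented behaviour can be checked with a few more calls. This is a small sketch rather than anything from the docs themselves; the helper name word_count and the sample strings are made up for illustration:

```ruby
# Hypothetical helper: count words using the default whitespace split.
def word_count(str)
  str.split.size
end

sample = " now's  the time"

puts word_count(sample)            # 3

# A single-space String pattern behaves like the default:
# leading whitespace and runs of whitespace are ignored.
default_like = sample.split(' ')
p default_like                     # ["now's", "the", "time"]

# A Regexp matching one space does NOT collapse runs,
# so empty strings show up in the result.
regexp_split = sample.split(/ /)
p regexp_split                     # ["", "now's", "", "the", "time"]

# A positive limit caps the number of fields returned.
limited = "a b c d".split(' ', 2)
p limited                          # ["a", "b c d"]

# Trailing empty fields are dropped unless the limit is negative.
trimmed  = "a,b,,".split(',')
complete = "a,b,,".split(',', -1)
p trimmed                          # ["a", "b"]
p complete                         # ["a", "b", "", ""]
```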

While that is the current version of Ruby as of this edit, I learned on 1.7 (IIRC), where this also worked. I just tested it on 1.8.3.

Answered by Mohamad

I know this is an old question, but this might be useful to someone else looking for something more sophisticated than string.split. I wrote the words_counted gem to solve this particular problem, since defining words is pretty tricky.

The gem lets you define your own custom criteria, or use the out of the box regexp, which is pretty handy for most use cases. You can pre-filter words with a variety of options, including a string, lambda, array, or another regexp.

counter = WordsCounted::Counter.new("Hello, Renée! 123")
counter.word_count #=> 2
counter.words #=> ["Hello", "Renée"]

# filter the word "hello"
counter = WordsCounted::Counter.new("Hello, Renée!", reject: "Hello")
counter.word_count #=> 1
counter.words #=> ["Renée"]

# Count numbers only
counter = WordsCounted::Counter.new("Hello, Renée! 123", regexp: /[0-9]/)
counter.word_count #=> 1
counter.words #=> ["123"]

The gem provides a bunch more useful methods.

Answered by Mohamad

If the 'word' in this case can be described as an alphanumeric sequence which can include '-' then the following solution may be appropriate (assuming that everything that doesn't match the 'word' pattern is a separator):

>> 'one-way street'.split(/[^-a-zA-Z]/).size
=> 2
>> 'one-way street'.split(/[^-a-zA-Z]/).each { |m| puts m }
one-way
street
=> ["one-way", "street"]

However, there are some other symbols that can be included in the regex - for example, ' to support the words like "it's".
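
For instance, adding the apostrophe to the character class might look like the sketch below. The constant and method names are hypothetical, and the extra + and the reject(&:empty?) are additions of mine so that runs of separators and leading separators do not produce empty entries:

```ruby
# Anything that is not a letter, hyphen, or apostrophe is treated as a separator.
WORD_SEPARATOR = /[^-'a-zA-Z]+/

# Hypothetical helper: split on separators and drop any empty leftovers
# (a leading separator would otherwise yield an empty first element).
def count_words(str)
  str.split(WORD_SEPARATOR).reject(&:empty?).size
end

puts count_words("it's a one-way street")   # 4
puts count_words("  hello,   world!")       # 2
```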

Answered by abonn

This is pretty simplistic but does the job if you are typing words with spaces in between. It ends up counting numbers as well but I'm sure you could edit the code to not count numbers.

puts "enter a sentence to find its word length: "
word = gets
word = word.chomp
splits = word.split(" ")
target = splits.length.to_s


puts "your sentence is " + target + " words long"
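
One possible way to skip numbers, as mentioned above, is to reject tokens made up only of digits. This is just a sketch: the method name is made up, it treats only all-digit tokens as numbers (so "10kg" or "3.5" would still count), and String#match? assumes Ruby 2.4 or later:

```ruby
# Hypothetical helper: count whitespace-separated tokens,
# ignoring tokens that consist solely of digits.
def count_non_numeric_words(sentence)
  sentence.split.reject { |token| token.match?(/\A\d+\z/) }.size
end

puts count_non_numeric_words("I have 2 cats and 10 fish")   # 5
```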

Answered by coderGuy

The best way to do this is to use the split method. split divides a string into substrings based on a delimiter, returning an array of those substrings. split takes two parameters, namely pattern and limit. pattern is the delimiter on which the string is split into an array. limit specifies the maximum number of elements in the resulting array. For more details, refer to the Ruby documentation: Ruby String documentation

str = "This is a string"
str.split(' ').size
#output: 4

The above code splits the string wherever it finds whitespace, so the size of the resulting array gives the number of words in the string.

Answered by Hillel

The above solution is wrong. Consider the following:

"one-way  street"

You will get

["one-way","", "street"]

Use this instead:

'one-way street'.gsub(/[^-a-zA-Z]/, ' ').split.size
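
To see the difference side by side, here is a short sketch using the two-space input from above. I am assuming the regex split being criticised is the split(/[^-a-zA-Z]/) form; the variable names are only illustrative:

```ruby
text = "one-way  street"   # note the two spaces

# Splitting directly on the regex yields an empty string between the two spaces.
naive = text.split(/[^-a-zA-Z]/)
p naive        # ["one-way", "", "street"]
p naive.size   # 3

# Replacing every non-word character with a space first, then using the
# default whitespace split, collapses the run of spaces.
fixed = text.gsub(/[^-a-zA-Z]/, ' ').split
p fixed        # ["one-way", "street"]
p fixed.size   # 2
```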

Answered by Lri

This splits words only on ASCII whitespace chars:

p "  some word\nother\tword|word".strip.split(/\s+/).size #=> 4
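
If non-ASCII whitespace matters, one option is the POSIX bracket class [[:space:]], which in Ruby also matches Unicode whitespace such as the no-break space. A minimal sketch, assuming UTF-8 strings (the sample text and variable names are made up):

```ruby
# "some" and "word" are separated by a no-break space (U+00A0).
nbsp_text = "some\u00A0word other"

# \s only matches ASCII whitespace, so the no-break space is not a separator.
ascii_count = nbsp_text.strip.split(/\s+/).size
p ascii_count     # 2

# [[:space:]] also matches Unicode whitespace.
unicode_count = nbsp_text.strip.split(/[[:space:]]+/).size
p unicode_count   # 3
```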