Ruby: Extracting Words From String

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original: http://stackoverflow.com/questions/7622369/

Date: 2020-09-03 02:05:14  Source: igfitidea

ruby-on-rails ruby regex parsing

Asked by sybohy

I'm trying to parse words out of a string and put them into an array. I've tried the following:

@string1 = "oriented design, decomposition, encapsulation, and testing. Uses "
puts @string1.scan(/\s([^\,\.\s]*)/)

It seems to do the trick, but it's a bit shaky (for example, I should include more special characters). Is there a better way to do this in Ruby?

Optional: I have a CS course description. I intend to extract all the words from it into an array of strings, remove the most common English words from that array, and then use the remaining words as tags that users can search to find CS courses.
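
The tagging idea in the optional part could be sketched roughly as below. This is only an illustration: `STOP_WORDS` is a tiny made-up list standing in for a real list of common English words, and `extract_tags` is a hypothetical helper name, not something from the question.

```ruby
require 'set'

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = Set.new(%w[a an and the of to in uses])

# Lowercase the text, collect runs of word characters,
# drop the stop words, and remove duplicate tags.
def extract_tags(description)
  description.downcase
             .scan(/[[:word:]]+/)
             .reject { |word| STOP_WORDS.include?(word) }
             .uniq
end

description = "oriented design, decomposition, encapsulation, and testing. Uses "
tags = extract_tags(description)
puts tags.inspect
# => ["oriented", "design", "decomposition", "encapsulation", "testing"]
```

Using a `Set` keeps the stop-word lookup cheap even if the list grows to a few hundred entries.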

Answered by David Nehme

The split command.

   words = @string1.split(/\W+/)

This splits the string into an array, using a regular expression as the delimiter. \W matches any "non-word" character, and the "+" matches one or more of them, so a run of consecutive delimiters (such as ", ") is treated as a single separator.
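
As a quick check, here is a small sketch that runs this split on the string from the question (using a local variable `string1` instead of the instance variable). It also shows one caveat worth knowing: when the string starts with a delimiter, `split` keeps a leading empty string, whereas `scan(/\w+/)` collects the word runs directly.

```ruby
string1 = "oriented design, decomposition, encapsulation, and testing. Uses "

words = string1.split(/\W+/)
puts words.inspect
# => ["oriented", "design", "decomposition", "encapsulation", "and", "testing", "Uses"]

# A leading delimiter produces an empty first element with split...
with_leading = ", design".split(/\W+/)
puts with_leading.inspect   # => ["", "design"]

# ...while scan only returns the matched word runs.
scanned = ", design".scan(/\w+/)
puts scanned.inspect        # => ["design"]
```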

Answered by lazzy.developer

For me, the best way to split sentences is:

line.split(/[^[[:word:]]]+/)

It works perfectly even with multilingual words and punctuation marks:

line = 'English words, Polski Żurek!!! crème fraîche...'
line.split(/[^[[:word:]]]+/)
=> ["English", "words", "Polski", "Żurek", "crème", "fraîche"]
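
To see why the POSIX bracket class matters, this sketch compares it with `\W` on a made-up sample string (`text` is not from the answer). In Ruby, `\w` and `\W` only consider ASCII letters, digits, and underscore, while `[[:word:]]` is Unicode-aware for UTF-8 strings, so accented letters stay inside words.

```ruby
text = 'crème brûlée, naïve café!'

# \W treats every non-ASCII letter as a delimiter, so words get chopped up.
ascii_split = text.split(/\W+/)
puts ascii_split.inspect
# => ["cr", "me", "br", "l", "e", "na", "ve", "caf"]

# The negated POSIX class only splits on real non-word characters.
unicode_split = text.split(/[^[[:word:]]]+/)
puts unicode_split.inspect
# => ["crème", "brûlée", "naïve", "café"]

# scan with the positive class collects the same words directly.
unicode_scan = text.scan(/[[:word:]]+/)
puts unicode_scan.inspect
# => ["crème", "brûlée", "naïve", "café"]
```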

Answered by BF4

Well, you could split the string on spaces if that's your delimiter of interest:

@string1.split(' ')

Or split on non-word characters or word boundaries:

\W  # Any non-word character

\b  # Any word boundary (a zero-width position, not a character)

Or on whitespace:

\s  # Any whitespace character
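
The options above behave quite differently in practice. The following sketch runs each of them on a short made-up string (`text`, with two spaces after the period) so the results can be compared side by side.

```ruby
text = "design, testing.  Uses "

# split(' ') splits on runs of whitespace; punctuation stays attached.
by_space = text.split(' ')
puts by_space.inspect      # => ["design,", "testing.", "Uses"]

# split(/\s/) splits on each single whitespace character,
# so two spaces in a row leave an empty string in between.
by_whitespace = text.split(/\s/)
puts by_whitespace.inspect # => ["design,", "testing.", "", "Uses"]

# split(/\b/) cuts at word boundaries and keeps the separators as elements.
by_boundary = text.split(/\b/)
puts by_boundary.inspect   # => ["design", ", ", "testing", ".  ", "Uses", " "]

# split(/\W+/) drops punctuation and whitespace together.
by_nonword = text.split(/\W+/)
puts by_nonword.inspect    # => ["design", "testing", "Uses"]
```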

Hint: try testing each of these on http://rubular.com

Also note that Ruby 1.9 differs from 1.8 in some respects.

Answered by ayckoster

For Rails you can use something like this:

@string1.split(/\s/).delete_if(&:blank?)
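
`blank?` comes from ActiveSupport, so the line above only works where Rails (or ActiveSupport) is loaded. Outside Rails, a roughly equivalent sketch in plain Ruby could look like this; `string1` is a local stand-in for the question's `@string1`.

```ruby
string1 = "oriented design, decomposition, encapsulation, and testing. Uses "

# Same idea without ActiveSupport: drop the empty strings left by split(/\s/).
without_rails = string1.split(/\s/).reject(&:empty?)
puts without_rails.inspect
# => ["oriented", "design,", "decomposition,", "encapsulation,", "and", "testing.", "Uses"]

# With no argument, split already splits on runs of whitespace
# and produces no empty strings, so this gives the same result here.
shorthand = string1.split
puts shorthand.inspect
```

Note that both variants keep punctuation attached to the words, so they suit whitespace-delimited tokens rather than clean word extraction.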