Source: http://stackoverflow.com/questions/7782113/
Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute the content to the original authors (not to this page's maintainer): StackOverFlow
Count word occurrences in R
Asked by LNA
Is there a function for counting the number of times a particular keyword is contained in a dataset?
For example, if dataset <- c("corn", "cornmeal", "corn on the cob", "meal")
the count would be 3.
Answered by IRTFM
Let's assume for the moment that you want the number of elements containing "corn":
length(grep("corn", dataset))
[1] 3
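An equivalent count can be taken with grepl(), which returns one logical value per element instead of the matching indices. The sketch below is not from the original answer. It assumes the same dataset vector and uses fixed = TRUE so that "corn" is treated as a literal string rather than a regular expression.

```r
dataset <- c("corn", "cornmeal", "corn on the cob", "meal")

# grepl() returns TRUE/FALSE for each element; fixed = TRUE disables regex syntax
contains_corn <- grepl("corn", dataset, fixed = TRUE)

# TRUE counts as 1 when summed, so this is the number of matching elements
n_elements <- sum(contains_corn)
n_elements
# [1] 3
```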
Once you are more comfortable with the basics of R, you may want to look at the "tm" package.
EDIT: I realize that this time around you wanted any occurrence of "corn", but in the future you might want only the whole word "corn". Over on r-help, Bill Dunlap pointed out a more compact grep pattern for matching whole words:
grep("\\<corn\\>", dataset)
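To turn the whole-word pattern into a count, it can be wrapped in length() as before. This is a small illustration rather than part of the original answer. In R source code each backslash in the pattern is written twice, and the regex tokens \< and \> match the empty string at the start and end of a word.

```r
dataset <- c("corn", "cornmeal", "corn on the cob", "meal")

# Indices of elements that contain "corn" as a whole word
whole_word_idx <- grep("\\<corn\\>", dataset)
whole_word_idx
# [1] 1 3

# Number of such elements ("cornmeal" is no longer counted)
length(whole_word_idx)
# [1] 2
```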
Answered by petermeissner
Another convenient and intuitive way to do it is to use the str_count function of the stringr package:
library(stringr)
dataset <- c("corn", "cornmeal", "corn on the cob", "meal")
# for mere occurrences of the pattern:
str_count(dataset, "corn")
# [1] 1 1 1 0
# for occurrences of the word alone (backslashes must be doubled in an R string):
str_count(dataset, "\\bcorn\\b")
# [1] 1 0 1 0
# summing it up
sum(str_count(dataset, "corn"))
# [1] 3
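Note that str_count() counts occurrences, while length(grep(...)) counts elements. The two differ when an element contains the keyword more than once. The following base-R sketch illustrates the difference without stringr. The extra element "corn bread and corn" is a made-up example, not part of the original data.

```r
# Hypothetical data: the last element contains "corn" twice
dataset2 <- c("corn", "cornmeal", "corn on the cob", "meal", "corn bread and corn")

# Occurrences per element: gregexpr() finds every match, regmatches() extracts them
per_element <- lengths(regmatches(dataset2, gregexpr("corn", dataset2, fixed = TRUE)))
per_element
# [1] 1 1 1 0 2

# Total occurrences versus number of elements containing the keyword
total_occurrences <- sum(per_element)
total_occurrences
# [1] 5
n_matching_elements <- length(grep("corn", dataset2, fixed = TRUE))
n_matching_elements
# [1] 4
```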
Answered by Junaid
You can also do something like the following:
length(dataset[which(dataset=="corn")])
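Be aware that == tests for exact equality, so this approach counts only the elements that are exactly "corn" and ignores "cornmeal" and "corn on the cob". A shorter form of the same idea, offered as a sketch on the question's data:

```r
dataset <- c("corn", "cornmeal", "corn on the cob", "meal")

# The comparison yields a logical vector; summing it counts the exact matches
exact_matches <- sum(dataset == "corn")
exact_matches
# [1] 1
```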
Answered by Benbob
I'd just do it with string division like:
library(roperators)
dataset <- c("corn", "cornmeal", "corn on the cob", "meal")
# for each vector element:
dataset %s/% 'corn'
# for everything:
sum(dataset %s/% 'corn')