Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/7782113/

Time: 2020-09-09 01:14:31  Source: igfitidea

Count word occurrences in R

stringr

Asked by LNA

Is there a function for counting the number of times a particular keyword is contained in a dataset?

For example, if dataset <- c("corn", "cornmeal", "corn on the cob", "meal"), the count would be 3.

Answered by IRTFM

Let's assume for the moment that you wanted the number of elements containing "corn":

length(grep("corn", dataset))
[1] 3

Once you have a better grasp of the basics of R, you may want to look at the "tm" package.

EDIT: I realize that this time around you wanted any-"corn" but in the future you might want to get word-"corn". Over on r-help Bill Dunlap pointed out a more compact grep pattern for gathering whole words:

grep("\\<corn\\>", dataset)
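
The difference between the substring match and the whole-word match can be sketched in base R as follows. This is a minimal illustration using the example dataset from the question; the variable names n_any and n_word are made up for this sketch. Note that backslashes must be doubled inside an R string literal, so the word-boundary pattern is written "\\<corn\\>".

```r
dataset <- c("corn", "cornmeal", "corn on the cob", "meal")

# grepl() returns one TRUE/FALSE per element, so sum() counts the matches.
# Elements that contain "corn" anywhere (substring match): 3
n_any <- sum(grepl("corn", dataset))

# Elements that contain "corn" as a whole word ("cornmeal" is excluded): 2
n_word <- sum(grepl("\\<corn\\>", dataset))

print(c(any = n_any, word = n_word))
```

Both variants count elements, not occurrences: an element that contains "corn" twice still contributes 1.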

Answered by petermeissner

Another quite convenient and intuitive way to do it is to use the str_count function of the stringr package:

library(stringr)
dataset <- c("corn", "cornmeal", "corn on the cob", "meal")

# for mere occurrences of the pattern:
str_count(dataset, "corn")
# [1] 1 1 1 0

# for occurrences of the word alone (backslashes must be doubled in R strings):
str_count(dataset, "\\bcorn\\b")
# [1] 1 0 1 0

# summing it up
sum(str_count(dataset, "corn"))
# [1] 3
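
If installing stringr is not an option, per-element counts like those from str_count can be approximated in base R with gregexpr. The sketch below is only one possible approach: the helper name count_occurrences and the extra element "corn, corn, corn" are hypothetical additions for illustration and do not come from the answers.

```r
# Hypothetical helper: number of matches of `pattern` in each element of `x`.
count_occurrences <- function(x, pattern) {
  matches <- gregexpr(pattern, x)
  # gregexpr() reports -1 when there is no match, so count only positive positions
  vapply(matches, function(m) sum(m > 0), integer(1))
}

# Example dataset from the question plus one made-up element with repeats
dataset <- c("corn", "cornmeal", "corn on the cob", "meal", "corn, corn, corn")

per_element <- count_occurrences(dataset, "corn")
print(per_element)       # 1 1 1 0 3
print(sum(per_element))  # 6
```

Unlike length(grep(...)), this counts every occurrence, so an element that contains the keyword several times contributes more than 1 to the total.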

Answered by Junaid

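You can also do something like the following, which counts only the elements that are exactly equal to "corn" (1 for the example dataset, not 3):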

length(dataset[which(dataset=="corn")])

Answered by Benbob

I'd just do it with string division like:

library(roperators)

dataset <- c("corn", "cornmeal", "corn on the cob", "meal")

# for each vector element:
dataset %s/% 'corn'

# for everything:
sum(dataset %s/% 'corn')