string R: convert a string to a vector (tokenize) by splitting on " "

Question

Asked by screechOwl

I have a string :

string1 <- "This is my string"

I would like to convert it to a vector that looks like this:

vector1
"This"
"is"
"my"
"string"

How do I do this? I know I could use the tm package to convert to a TermDocumentMatrix and then convert that to a matrix, but it would alphabetize the words and I need them to stay in the same order.

Answer 1

Answered by Dason

You can use strsplit to accomplish this task.

string1 <- "This is my string"
strsplit(string1, " ")[[1]]
#[1] "This"   "is"     "my"     "string"
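
The `[[1]]` is there because strsplit returns a list with one element per input string. A minimal sketch of that behaviour, where the two-element vector `strings` is a made-up input for illustration and not part of the question:

```r
# Hypothetical input: two strings instead of one
strings <- c("This is my string", "Another one")

# fixed = TRUE treats " " as a literal space rather than a regular expression
parts <- strsplit(strings, " ", fixed = TRUE)

class(parts)    # a list, with one character vector per input string
length(parts)   # same length as the input

first_words  <- parts[[1]]   # words of the first string, in original order
second_words <- parts[[2]]   # words of the second string
```

With a single input string the list has exactly one element, so `[[1]]` yields the plain character vector the question asks for.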

Answer 2

Answered by Sacha Epskamp

Slightly different from Dason's answer: this splits on any amount of whitespace, including newlines:

string1 <- "This   is my
string"
# The backslash must be doubled inside an R string literal
strsplit(string1, "\\s+")[[1]]
#[1] "This"   "is"     "my"     "string"
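
One caveat with splitting on a whitespace pattern: if the string starts with whitespace, strsplit returns an empty string as the first element. A sketch of one way to handle that, assuming R 3.2.0 or later for `trimws()`; `string2` is a hypothetical padded input, not from the question:

```r
# Hypothetical input with leading and trailing whitespace
string2 <- "  This   is my\nstring "

# Splitting directly leaves an empty string at the front
raw <- strsplit(string2, "\\s+")[[1]]

# Trimming first removes the padding, so only the words remain
words <- strsplit(trimws(string2), "\\s+")[[1]]
words
```

Trailing whitespace does not produce an extra element, so only the leading side needs this care; trimming both sides is simply the shortest way to write it.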

Answer 3

Answered by Shiqing Fan

As a supplement, we can also use unlist() to produce a vector from the list that strsplit returns:

string1 <- "This is my string"
# strsplit returns a list; unlist() flattens it into a character vector
unlist(strsplit(string1, "\\s+"))
#[1] "This"   "is"     "my"     "string"

Answer 4

Answered by Rich Scriven

If you're simply extracting words by splitting on the spaces, here are a couple of nice alternatives.

string1 <- "This is my string"

scan(text = string1, what = "")
# [1] "This"   "is"     "my"     "string"

library(stringi)
stri_split_fixed(string1, " ")[[1]]
# [1] "This"   "is"     "my"     "string"
stri_extract_all_words(string1, simplify = TRUE)
#      [,1]   [,2] [,3] [,4]    
# [1,] "This" "is" "my" "string"
stri_split_boundaries(string1, simplify = TRUE)
#      [,1]    [,2]  [,3]  [,4]    
# [1,] "This " "is " "my " "string"

Answer 5

Answered by russellpierce

Try:

library(tm)
library("RWeka")
library(RWekajars)
# min = max = 1 requests unigrams, i.e. single words; string1 is the string from the question
NGramTokenizer(string1, Weka_control(min = 1, max = 1))

This is an over-engineered solution for your problem. strsplit using Sacha's approach is generally just fine.
