Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/1676990/

Time: 2020-09-09 00:33:26  Source: igfitidea

Split a string vector at whitespace

Tags: r, string, vector, split, whitespace

Asked by Zak

I have the following vector:

tmp3 <- c("1500 2", "1500 1", "1510 2", "1510 1", "1520 2", "1520 1", "1530 2", 
"1530 1", "1540 2", "1540 1")

I would like to just retain the second number in each of the atoms of this vector, so it would read:

c(2,1,2,1,2,1,2,1,2,1)

Answered by Shane

There's probably a better way, but here are two approaches with strsplit():

as.numeric(data.frame(strsplit(tmp3, " "))[2,])
as.numeric(lapply(strsplit(tmp3," "), function(x) x[2]))

The as.numeric() may not be necessary if you can use characters...
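
As a hedged sketch of the second approach above, the variant below swaps lapply() for vapply(), which is not what the answer itself uses. vapply() declares that every element yields exactly one string, so a wrong-sized result raises an error instead of silently producing a list. The names parts, second_chr and second_num are illustrative.

```r
tmp3 <- c("1500 2", "1500 1", "1510 2", "1510 1", "1520 2", "1520 1", "1530 2",
          "1530 1", "1540 2", "1540 1")

# Split each element on a single literal space; the result is a list of character vectors
parts <- strsplit(tmp3, " ", fixed = TRUE)

# Take the second piece of each element, insisting on exactly one string per element
second_chr <- vapply(parts, function(x) x[2], character(1))

# Convert only if numbers are actually needed
second_num <- as.numeric(second_chr)
```

If character values are enough, second_chr can be used directly and the last step skipped.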

Answered by Marek

One could use read.table on a textConnection:

X <- read.table(textConnection(tmp3))

then

> str(X)
'data.frame':   10 obs. of  2 variables:
 $ V1: int  1500 1500 1510 1510 1520 1520 1530 1530 1540 1540
 $ V2: int  2 1 2 1 2 1 2 1 2 1

so X$V2 is what you need.
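
A minimal sketch of the same idea, assuming an R version in which read.table() accepts a text argument, so the connection does not have to be created by hand. The column names value and index are made up for illustration and are not part of the answer.

```r
tmp3 <- c("1500 2", "1500 1", "1510 2", "1510 1", "1520 2", "1520 1", "1530 2",
          "1530 1", "1540 2", "1540 1")

# Each string is parsed as one row; whitespace is the default field separator
X <- read.table(text = tmp3, col.names = c("value", "index"))

# The second column holds the numbers the asker wants, already parsed as integers
second <- X$index
```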

Answered by SchaunW

It depends a little on how closely your actual data matches the example data you've given. If you're just trying to get everything after the space, you can use gsub:

gsub(".+\\s+", "", tmp3)
[1] "2" "1" "2" "1" "2" "1" "2" "1" "2" "1"

If you're trying to implement a rule more complicated than "take everything after the space", you'll need a more complicated regular expression.
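
For instance, a slightly stricter rule could be "keep only the second whitespace-separated field, even when more fields follow". The sketch below shows one possible pattern for that, using sub() with a capture group. The input vector x is hypothetical and not taken from the question.

```r
# Hypothetical input: some elements carry an extra trailing field
x <- c("1500 2 a", "1510 1 b", "1520 2")

# ^\S+   the first field
# \s+    the whitespace after it
# (\S+)  the second field, captured as group 1
# .*$    anything that follows, discarded
second <- sub("^\\S+\\s+(\\S+).*$", "\\1", x, perl = TRUE)
```

Elements that do not match the pattern are returned unchanged by sub(), so it may be worth checking the input first with grepl().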

Answered by ephpostfacto

What I think is the most elegant way to do this:

res <- sapply(strsplit(tmp3, " "), "[[", 2)

If you need it to be an integer:

storage.mode(res) <- "integer"

Answered by Matt Parker

substr(x = tmp3, start = 6, stop = 6)

So long as your strings are always the same length, this should do the trick.

(And, of course, you don't have to specify the argument names: substr(tmp3, 6, 6) works fine, too.)
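
If the strings differ in length but the wanted value is always the final single character, one possible adjustment is to compute the position with nchar(). This is an assumption about the data, not part of the answer, and the vector x below is hypothetical.

```r
# Hypothetical input with first fields of varying width
x <- c("1500 2", "90 1", "123456 2")

# Position of the last character in each string
n <- nchar(x)

# substr() is vectorised over start and stop, so each string uses its own position
last_chr <- substr(x, n, n)
```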

Answered by Paolo

This should do it:

library(plyr)
ldply(strsplit(tmp3, split = " "))[[2]]

If you need a numeric vector, use

as.numeric(ldply(strsplit(tmp3, split = " "))[[2]])

Answered by Rich Scriven

Another option is scan(). To get the second value, we can use a logical subset.

scan(text = tmp3)[c(FALSE, TRUE)]
#  [1] 2 1 2 1 2 1 2 1 2 1
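
The logical subset works because R recycles the index c(FALSE, TRUE) along the whole vector, so every second value is kept. A small sketch, assuming each string holds exactly two numbers; quiet = TRUE only suppresses scan()'s "Read N items" message, and the names values, first and second are illustrative.

```r
tmp3 <- c("1500 2", "1500 1", "1510 2", "1510 1", "1520 2", "1520 1", "1530 2",
          "1530 1", "1540 2", "1540 1")

# scan() reads all whitespace-separated numbers into one flat numeric vector
values <- scan(text = tmp3, quiet = TRUE)

# The two-element logical index is recycled: drop, keep, drop, keep, ...
second <- values[c(FALSE, TRUE)]

# The complementary index picks the first number of each pair
first <- values[c(TRUE, FALSE)]
```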

Answered by Valentin

Just to add two more options: stringr::str_split() or data.table::tstrsplit().

1) using stringr::str_split()

# data posted above by the asker
tmp3 <- c("1500 2", "1500 1", "1510 2", "1510 1", "1520 2", "1520 1", "1530 2", 
          "1530 1", "1540 2", "1540 1")

library(stringr)

as.integer(
  str_split(string = tmp3, 
            pattern = "[[:space:]]", 
            simplify = TRUE)[, 2] 
)
#>  [1] 2 1 2 1 2 1 2 1 2 1

simplify = TRUE tells str_split to return a matrix, so we can index the matrix for the desired column; hence the [, 2] part.

2) Using data.table::tstrsplit()

library(data.table)

as.data.table(tmp3)[, tstrsplit(tmp3, split = "[[:space:]]", type.convert = TRUE)][, V2]
#>  [1] 2 1 2 1 2 1 2 1 2 1

type.convert = TRUE is responsible for the conversion to integer here, but use it with care on other datasets. The [, V2] indexing serves the same purpose as [, 2] above: it selects the second column of the returned data.table object, which contains the values the asker wants, as integers.

sessionInfo()
#> R version 4.0.0 (2020-04-24)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18362)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_4.0.0  magrittr_1.5    tools_4.0.0     htmltools_0.4.0
#>  [5] yaml_2.2.1      Rcpp_1.0.4.6    stringi_1.4.6   rmarkdown_2.1  
#>  [9] highr_0.8       knitr_1.28      stringr_1.4.0   xfun_0.13      
#> [13] digest_0.6.25   rlang_0.4.6     evaluate_0.14

Created on 2020-05-06 by the reprex package (v0.3.0)

Answered by greenbooks

An easier way to split 1 column into 2 columns via data.table:

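library(data.table)
# Example data: ten strings such as "2-separate"
data_ex <- data.table(a = paste(sample(1:3, size = 10, replace = TRUE), "-separate", sep = ""))
# For each distinct value of a, split on "-" and keep the first piece, then the second
data_ex[, number := unlist(strsplit(x = a, split = "-"))[[1]], by = a]
data_ex[, word := unlist(strsplit(x = a, split = "-"))[[2]], by = a]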