string: How do I split a string into substrings of a given length?

Question

Asked by MadSeb

I have a string such as:

"aabbccccdd"

I want to break this string into a vector of substrings of length 2:

"aa" "bb" "cc" "cc" "dd"

Answer 1

Answered by GSee

Here is one way:

substring("aabbccccdd", seq(1, 9, 2), seq(2, 10, 2))
#[1] "aa" "bb" "cc" "cc" "dd"

Or, more generally:

text <- "aabbccccdd"
substring(text, seq(1, nchar(text)-1, 2), seq(2, nchar(text), 2))
#[1] "aa" "bb" "cc" "cc" "dd"
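
The general form above assumes chunks of length 2 and an even-length string. A minimal sketch of a parameterised variant follows; the function name `split_every` and its argument `n` are hypothetical and not part of the original answer. It relies on `substring` truncating any end position that runs past the end of the string, so the last chunk is simply shorter.

```r
# Hypothetical helper: split `text` into chunks of `n` characters.
split_every <- function(text, n = 2) {
  len <- nchar(text)
  # seq(1, 0, by = n) would raise an error, so handle the empty string first
  if (len == 0) return(character(0))
  starts <- seq(1, len, by = n)
  # An end position past the end of the string is truncated by substring()
  substring(text, starts, starts + n - 1)
}

split_every("aabbccccdd")   # "aa" "bb" "cc" "cc" "dd"
split_every("abcde", 2)     # "ab" "cd" "e"
```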

Edit: This is much, much faster:

sst <- strsplit(text, "")[[1]]
out <- paste0(sst[c(TRUE, FALSE)], sst[c(FALSE, TRUE)])

It first splits the string into individual characters. Then it pastes each odd-positioned character together with the even-positioned character that follows it.
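
The selection step works because R recycles a short logical index along the whole vector. The sketch below spells out the intermediate values; the names `odd` and `even` are introduced here only for illustration.

```r
sst <- strsplit("aabbccccdd", "")[[1]]   # one element per character

# c(TRUE, FALSE) is recycled along sst, keeping positions 1, 3, 5, ...
odd <- sst[c(TRUE, FALSE)]
# c(FALSE, TRUE) keeps positions 2, 4, 6, ...
even <- sst[c(FALSE, TRUE)]

# paste0 is vectorised, so element i of `odd` is joined to element i of `even`
out <- paste0(odd, even)
out   # "aa" "bb" "cc" "cc" "dd"
```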

Timings

text <- paste(rep(paste0(letters, letters), 1000), collapse="")
g1 <- function(text) {
    substring(text, seq(1, nchar(text)-1, 2), seq(2, nchar(text), 2))
}
g2 <- function(text) {
    sst <- strsplit(text, "")[[1]]
    paste0(sst[c(TRUE, FALSE)], sst[c(FALSE, TRUE)])
}
identical(g1(text), g2(text))
#[1] TRUE
library(rbenchmark)
benchmark(g1=g1(text), g2=g2(text))
#  test replications elapsed relative user.self sys.self user.child sys.child
#1   g1          100  95.451 79.87531    95.438        0          0         0
#2   g2          100   1.195  1.00000     1.196        0          0         0
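
If the rbenchmark package is not installed, a rough comparison can be made with base R's `system.time`. This is only a sketch: the repetition count of 3 is an arbitrary choice, and the timings will differ by machine and R version, so no figures are shown here.

```r
text <- paste(rep(paste0(letters, letters), 1000), collapse = "")

g1 <- function(text) {
  substring(text, seq(1, nchar(text) - 1, 2), seq(2, nchar(text), 2))
}
g2 <- function(text) {
  sst <- strsplit(text, "")[[1]]
  paste0(sst[c(TRUE, FALSE)], sst[c(FALSE, TRUE)])
}

# Check that both functions agree before timing them
stopifnot(identical(g1(text), g2(text)))

# Time a few repetitions of each; the count (3) is an arbitrary choice
t1 <- system.time(for (i in 1:3) g1(text))
t2 <- system.time(for (i in 1:3) g2(text))

# Elapsed wall-clock seconds for each function
elapsed <- c(g1 = t1[["elapsed"]], g2 = t2[["elapsed"]])
print(elapsed)
```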

Answer 2

Answered by Sven Hohenstein

There are two easy possibilities:

s <- "aabbccccdd"

gregexpr and regmatches:

regmatches(s, gregexpr(".{2}", s))[[1]]
# [1] "aa" "bb" "cc" "cc" "dd"

strsplit:

strsplit(s, "(?<=.{2})", perl = TRUE)[[1]]
# [1] "aa" "bb" "cc" "cc" "dd"
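
Note that the pattern `.{2}` only matches complete pairs, so the `regmatches` version drops a trailing single character. A possible variant, sketched below, uses the quantifier `{1,n}` so that the final, shorter piece is kept; the wrapper name `split_regex` and its argument `n` are hypothetical.

```r
# Hypothetical wrapper around the gregexpr/regmatches approach
split_regex <- function(s, n = 2) {
  # Greedy match of up to n characters, but at least one
  pattern <- sprintf(".{1,%d}", n)
  regmatches(s, gregexpr(pattern, s))[[1]]
}

split_regex("aabbccccdd")   # "aa" "bb" "cc" "cc" "dd"
split_regex("abc")          # "ab" "c"
```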

Answer 3

Answered by mindless.panda

string <- "aabbccccdd"
# total length of string
num.chars <- nchar(string)

# the indices where each substr will start
starts <- seq(1,num.chars, by=2)

# chop it up
sapply(starts, function(ii) {
  substr(string, ii, ii+1)
})

Which gives:

[1] "aa" "bb" "cc" "cc" "dd"

Answer 4

Answered by Matthew Lundberg

One can use a matrix to group the characters:

s2 <- function(x) {
  m <- matrix(strsplit(x, '')[[1]], nrow=2)
  apply(m, 2, paste, collapse='')
}

s2('aabbccddeeff')
## [1] "aa" "bb" "cc" "dd" "ee" "ff"

Unfortunately, this breaks for an input of odd string length, giving a warning:

s2('abc')
## [1] "ab" "ca"
## Warning message:
## In matrix(strsplit(x, "")[[1]], nrow = 2) :
##   data length [3] is not a sub-multiple or multiple of the number of rows [2]

More unfortunate is that g1 and g2 from @GSee silently return incorrect results for an input of odd string length:

g1('abc')
## [1] "ab"

g2('abc')
## [1] "ab" "cb"
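
The wrong result from g2 comes from `paste0` recycling the shorter vector of even-positioned characters. One possible fix, sketched here under the hypothetical name `g2_padded`, is to pad that vector with an empty string before pasting, so a trailing single character is kept as its own piece.

```r
# Hypothetical variant of g2 that tolerates odd-length input
g2_padded <- function(text) {
  sst <- strsplit(text, "")[[1]]
  odd <- sst[c(TRUE, FALSE)]
  even <- sst[c(FALSE, TRUE)]
  # For odd-length input `even` is one element short; pad it with ""
  even <- c(even, rep("", length(odd) - length(even)))
  paste0(odd, even)
}

g2_padded("abc")          # "ab" "c"
g2_padded("aabbccccdd")   # "aa" "bb" "cc" "cc" "dd"
```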

Here is a function in the spirit of s2 that takes a parameter for the number of characters in each group and leaves the last entry short if necessary:

s <- function(x, n) {
  sst <- strsplit(x, '')[[1]]
  m <- matrix('', nrow=n, ncol=(length(sst)+n-1)%/%n)
  m[seq_along(sst)] <- sst
  apply(m, 2, paste, collapse='')
}

s('hello world', 2)
## [1] "he" "ll" "o " "wo" "rl" "d" 
s('hello world', 3)
## [1] "hel" "lo " "wor" "ld"
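
To see why the last entry comes out short, it may help to look at the intermediate matrix. The sketch below repeats the body of the function step by step for the input 'abcde' with n = 2, an example chosen here for illustration.

```r
sst <- strsplit("abcde", "")[[1]]
n <- 2

# Enough columns to hold every character: (5 + 2 - 1) %/% 2 = 3
m <- matrix("", nrow = n, ncol = (length(sst) + n - 1) %/% n)

# Matrices are filled column by column, so the characters land in order
# and the unused cell in the last column stays ""
m[seq_along(sst)] <- sst
print(m)   # columns: ("a", "b"), ("c", "d"), ("e", "")

# Collapsing each column gives the pieces; "" contributes nothing
pieces <- apply(m, 2, paste, collapse = "")
pieces   # "ab" "cd" "e"
```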

(It is indeed slower than g2, but faster than g1 by about a factor of 7.)

Answer 5

Answered by den2042

Ugly but works

sequenceString <- "ATGAATAAAG"

J <- 3  # maximum sequence length in file
sequenceSmallVecStart <-
  substring(sequenceString, seq(1, nchar(sequenceString) - J + 1, J),
            seq(J, nchar(sequenceString), J))
sequenceSmallVecEnd <-
  substring(sequenceString, max(seq(J, nchar(sequenceString), J)) + 1)
sequenceSmallVec <-
  c(sequenceSmallVecStart, sequenceSmallVecEnd)
cat(sequenceSmallVec, sep = "\n")

This prints ATG, AAT, AAA and G, one per line.
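
One caveat with this approach: when the string length is an exact multiple of J, the remainder piece is an empty string, so the result ends with "". A minimal sketch of a workaround is to drop empty pieces with `nzchar`; the wrapper name `split_sequence` is hypothetical, and like the original code it still assumes the string has at least J characters.

```r
# Hypothetical wrapper around the code above; assumes nchar(sequenceString) >= J
split_sequence <- function(sequenceString, J) {
  len <- nchar(sequenceString)
  # Full-length pieces
  full <- substring(sequenceString, seq(1, len - J + 1, J), seq(J, len, J))
  # Whatever is left after the last full piece (may be "")
  rest <- substring(sequenceString, max(seq(J, len, J)) + 1)
  pieces <- c(full, rest)
  # Drop the empty remainder that appears when len is a multiple of J
  pieces[nzchar(pieces)]
}

split_sequence("ATGAATAAAG", 3)   # "ATG" "AAT" "AAA" "G"
split_sequence("ATGAAT", 3)       # "ATG" "AAT"
```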
