string: Extracting the numeric part of a string that mixes digits and characters in R

Question

Asked by user288609

I have a lot of strings, each of which tends to have the following format: Ab_Cd-001234.txt. I want to replace each one with 001234. How can I achieve this in R?

Answer 1

Accepted answer by agstudy

Using gsub or sub, you can do this:

gsub('.*-([0-9]+).*', '\\1', 'Ab_Cd-001234.txt')
[1] "001234"

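One thing to watch for: sub and gsub return a string unchanged when the pattern does not match, so a name without a -digits part would come back as-is. A minimal sketch of one way to return NA instead, using a hypothetical helper extract_id (not from any package):

```r
# Hypothetical helper (not part of any package): returns the run of digits
# after the hyphen, or NA when a string has no "-" followed by digits.
extract_id <- function(x) {
  has_id <- grepl("-[0-9]+", x)                # which strings contain an ID
  out <- sub(".*-([0-9]+).*", "\\1", x)        # non-matches pass through unchanged
  out[!has_id] <- NA_character_                # so replace those with NA
  out
}

extract_id(c("Ab_Cd-001234.txt", "readme.txt"))
```
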
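You can also use gregexpr with regmatches. Note that gregexpr returns a list of matches per string; use regexpr instead if you want a plain character vector: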

m <- gregexpr('[0-9]+', 'Ab_Cd-001234.txt')
regmatches('Ab_Cd-001234.txt', m)
[[1]]
[1] "001234"

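For comparison, a short sketch of regexpr versus gregexpr on two example strings (the second one is made up to contain an extra digit). Note that the bare [0-9]+ pattern picks up the first run of digits anywhere in the name, not necessarily the one after the hyphen:

```r
x <- c("Ab_Cd-001234.txt", "Ab2_Cd-005678.txt")

# regexpr: first match in each string; regmatches() returns a character
# vector (strings with no match at all would simply be dropped)
regmatches(x, regexpr("[0-9]+", x))

# gregexpr: every match in each string; regmatches() returns a list
regmatches(x, gregexpr("[0-9]+", x))
```
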
EDIT: Both methods are vectorized and work on a vector of strings:

x <- c('Ab_Cd-001234.txt', 'Ab_Cd-001234.txt')
sub('.*-([0-9]+).*', '\\1', x)
[1] "001234" "001234"

m <- gregexpr('[0-9]+', x)
regmatches(x, m)
[[1]]
[1] "001234"

[[2]]
[1] "001234"
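Because gregexpr returns every digit run, a name with digits in the prefix gives more than one element. A sketch, assuming the ID is always the last run of digits (the example strings are hypothetical), that reduces the list to one value per string with vapply:

```r
x <- c("Ab1_Cd-001234.txt", "Ab_Cd-005678.txt", "readme.txt")
m <- gregexpr("[0-9]+", x)

# Keep only the last run of digits in each string (NA when there is none),
# giving a plain character vector aligned with x
last_run <- vapply(regmatches(x, m),
                   function(v) if (length(v) > 0) v[length(v)] else NA_character_,
                   character(1))
last_run
```
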

Answer 2

Answer by Ben

The stringr package has lots of handy shortcuts for this kind of work:

# input data following @agstudy
data <- c('Ab_Cd-001234.txt', 'Ab_Cd-001234.txt')

# load library
library(stringr)

# prepare regular expression
regexp <- "[[:digit:]]+"

# process string
str_extract(data, regexp)

Which gives the desired result:

  [1] "001234" "001234"

To explain the regexp a little:

[[:digit:]] matches any single digit from 0 to 9

+ means the preceding item (in this case, a digit) is matched one or more times
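The same pattern works in base R, so here is a quick sketch with regexpr and regmatches (no stringr needed) showing what the + adds:

```r
s <- "Ab_Cd-001234.txt"

# Without "+": [[:digit:]] matches a single digit
regmatches(s, regexpr("[[:digit:]]", s))

# With "+": one or more consecutive digits
regmatches(s, regexpr("[[:digit:]]+", s))
```
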

This page is also very useful for this kind of string processing: http://en.wikibooks.org/wiki/R_Programming/Text_Processing

Answer 3

Answer by Tyler Rinker

You could use genXtract from the qdap package. It takes a left character string and a right character string and extracts the elements between them.

library(qdap)
genXtract("Ab_Cd-001234.txt", "-", ".txt")

Though I much prefer agstudy's answer.

EDIT: Extending the answer to a vector of strings, to match agstudy's:

x <- c('Ab_Cd-001234.txt','Ab_Cd-001234.txt')
genXtract(x, "-", ".txt")

# $`-  :  .txt1`
# [1] "001234"
# 
# $`-  :  .txt2`
# [1] "001234"
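If installing qdap is not an option, the same left/right idea can be sketched in base R with fixed-string searches. extract_between below is a hypothetical helper, and unlike genXtract it returns a plain character vector (NA where a boundary is missing) rather than a named list:

```r
# Hypothetical helper: text between the first `left` and the next `right`
# (both matched as fixed strings, not regular expressions)
extract_between <- function(x, left, right) {
  l <- regexpr(left, x, fixed = TRUE)        # start of `left`, -1 if absent
  rest <- substring(x, l + nchar(left))      # everything after `left`
  r <- regexpr(right, rest, fixed = TRUE)    # start of `right` in the remainder
  out <- substring(rest, 1, r - 1)
  out[l == -1 | r == -1] <- NA_character_    # a boundary is missing
  out
}

x <- c('Ab_Cd-001234.txt', 'Ab_Cd-001234.txt')
extract_between(x, "-", ".txt")
```
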

Answer 4

Answer by G. Grothendieck

gsub: Remove the prefix and suffix:

gsub(".*-|\\.txt$", "", x)
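The alternation | is what lets a single call strip both ends: .*- matches the prefix up to the hyphen and \\.txt$ matches the extension. Because sub replaces only the first match, it would leave the extension in place, as this sketch shows:

```r
x <- c("Ab_Cd-001234.txt", "Ab_Cd-005678.txt")

# sub() replaces only the first match, so only the prefix is removed
sub(".*-|\\.txt$", "", x)

# gsub() replaces every match, so both the prefix and the suffix go
gsub(".*-|\\.txt$", "", x)
```
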

tools package: Use file_path_sans_ext from tools to remove the extension, then use sub to remove the prefix:

library(tools)
sub(".*-", "", file_path_sans_ext(x))
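If the strings are full file paths (for example from list.files(full.names = TRUE)), adding basename keeps the directory part out of the result. A sketch with made-up paths:

```r
library(tools)

# Hypothetical paths that include a directory component
paths <- c("data/Ab_Cd-001234.txt", "data/Ef_Gh-000042.txt")

# basename() drops the directory, file_path_sans_ext() drops ".txt",
# and sub() drops everything up to and including the "-"
ids <- sub(".*-", "", file_path_sans_ext(basename(paths)))
ids
```
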

strapplyc: Extract the digits after the - and before the dot. See the gsubfn home page for more info:

library(gsubfn)
strapplyc(x, "-(\\d+)\\.", simplify = TRUE)
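gsubfn is a separate package; a base-R sketch of the same capture-group idea uses regexec, which reports the whole match followed by each parenthesized group:

```r
x <- c("Ab_Cd-001234.txt", "Ab_Cd-001234.txt")

# regexec() records the whole match plus each parenthesized group;
# regmatches() turns that into a list of c(full_match, group_1)
m <- regmatches(x, regexec("-([0-9]+)\\.", x))

# Take the first capture group from each element
# (a string with no match gives character(0), and character(0)[2] is NA)
ids <- vapply(m, function(v) v[2], character(1))
ids
```
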

Note that if a numeric result is desired, we could use strapply rather than strapplyc, like this:

strapply(x, "-(\\d+)\\.", as.numeric, simplify = TRUE)
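One consequence of converting to numeric: the leading zeros are dropped, so 001234 becomes 1234. A sketch using sub (so it runs without gsubfn) that converts and then restores the padding with sprintf, assuming the IDs are always six digits wide:

```r
x <- c("Ab_Cd-001234.txt", "Ab_Cd-000042.txt")
ids <- sub(".*-([0-9]+).*", "\\1", x)

n <- as.numeric(ids)      # leading zeros are gone: 1234 42
# Restore a fixed width of 6 (an assumption about the ID format)
sprintf("%06d", as.integer(n))
```
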
