Extracting unique numbers from string in R
Original URL: http://stackoverflow.com/questions/17009628/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Remi.b
I have a list of strings which contain random characters such as:
list=list()
list[1] = "djud7+dg[a]hs667"
list[2] = "7fd*hac11(5)"
list[3] = "2tu,g7gka5"
I'd like to know which numbers are present at least once (unique()) in this list. The solution for my example is:
solution: c(7,667,11,5,2)
If someone has a method that does not consider 11 as "eleven" but as "one and one", it would also be useful. The solution in this condition would be:
solution: c(7,6,1,5,2)
(I found this post on a related subject: Extracting numbers from vectors of strings)
Answer by Arun
For the second answer, you can use gsub to remove everything from the string that's not a digit, then split what remains into single characters:
unique(as.numeric(unlist(strsplit(gsub("[^0-9]", "", unlist(ll)), ""))))
# [1] 7 6 1 5 2
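Broken into steps, the one-liner above first strips every non-digit from each string and then splits what remains into one-character pieces. A small sketch, assuming ll is a list holding the three example strings:

```r
ll <- list("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")

# 1. drop every character that is not a digit: "7667" "7115" "275"
digits_only <- gsub("[^0-9]", "", unlist(ll))

# 2. split each digit string into one-character pieces
single_digits <- unlist(strsplit(digits_only, ""))

# 3. convert to numbers and keep the first occurrence of each
result <- unique(as.numeric(single_digits))
result
```

Because unique() keeps values in order of first appearance, the digits come out in the order they are first seen across the strings.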
For the first answer, use strsplit in a similar way, this time splitting at runs of non-digits:
unique(na.omit(as.numeric(unlist(strsplit(unlist(ll), "[^0-9]+")))))
# [1] 7 667 11 5 2
PS: don't name your variable list (there's a built-in function list). I've named your data ll.
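The na.omit() call in the first answer is needed because a string that starts with a non-digit (such as "djud7+dg[a]hs667") yields a leading empty string, which as.numeric() turns into NA. A sketch of the same idea that filters out empty pieces with nzchar() before converting, assuming ll holds the three example strings:

```r
ll <- list("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")

# split at runs of non-digits; the first string yields "" "7" "667"
parts <- unlist(strsplit(unlist(ll), "[^0-9]+"))

# keep only non-empty pieces, then convert and de-duplicate
result <- unique(as.numeric(parts[nzchar(parts)]))
result
```

Filtering before conversion avoids creating NA values at all, so there is nothing left for na.omit() to remove.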
Answer by A5C1D2H2I1M1N2O1R2T1
Here is yet another answer, this one using gregexpr to find the numbers and regmatches to extract them:
l <- c("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")
temp1 <- gregexpr("[0-9]", l) # Individual digits
temp2 <- gregexpr("[0-9]+", l) # Numbers with any number of digits
as.numeric(unique(unlist(regmatches(l, temp1))))
# [1] 7 6 1 5 2
as.numeric(unique(unlist(regmatches(l, temp2))))
# [1] 7 667 11 5 2
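Because regmatches() returns one character vector per input string, the same calls also show which numbers occur in which string, before everything is flattened with unlist(). A sketch, reusing the vector l from above:

```r
l <- c("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")

# one character vector of matches per input string
per_string <- regmatches(l, gregexpr("[0-9]+", l))

# convert each element of the list to numbers separately
per_string_num <- lapply(per_string, as.numeric)
per_string_num
```

Applying unlist() and unique() to per_string_num gives the same overall result as the answer's second call.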
Answer by altabq
A solution using stringi
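library(stringi)
# extract the numbers (stri_extract_all_regex expects a character vector, so unlist the list first):
nums <- stri_extract_all_regex(unlist(list), "[0-9]+")
# make a vector and keep the unique numbers:
nums <- unlist(nums)
nums <- unique(nums)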
And that's your first solution (as character strings; use as.numeric(nums) to get numbers).
For the second solution I would use substr:
# first character (digit) of each extracted number, kept once each
nums_first <- unique(substr(nums, 1, 1))
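substr(x, 1, 1) keeps only the first digit of each extracted number, so a number such as "12" would contribute 1 but not 2. It happens to give c(7,6,1,5,2) for this example. If every digit should count, one base-R option is to split each extracted number into characters instead. A sketch, with nums hard-coded here to the unique numbers the stringi step produces for the example:

```r
# unique numbers extracted above, as character strings
nums <- c("7", "667", "11", "5", "2")

# split each number into its digits, then keep each digit once
all_digits <- unique(unlist(strsplit(nums, "")))
as.numeric(all_digits)
```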
Answer by sgibb
You could use ?strsplit (as suggested in @Arun's answer to Extracting numbers from vectors (of strings)):
l <- c("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")
## split string at non-digits
s <- strsplit(l, "[^[:digit:]]")
## convert strings to numeric ("" become NA)
solution <- as.numeric(unlist(s))
## remove NA and duplicates
solution <- unique(solution[!is.na(solution)])
# [1] 7 667 11 5 2
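If you need both variants more than once, the steps above can be wrapped in a small function. unique_numbers() and its single_digits argument are hypothetical names chosen for this sketch, not part of base R or any package:

```r
# Hypothetical helper (not from any package) wrapping the strsplit() approach.
# single_digits = FALSE: whole numbers; TRUE: each digit counted on its own.
unique_numbers <- function(x, single_digits = FALSE) {
  x <- unlist(x)                            # accept a list or a character vector
  if (single_digits) {
    x <- gsub("[^[:digit:]]", "", x)        # keep only the digits...
    pieces <- strsplit(x, "")               # ...then split them one by one
  } else {
    pieces <- strsplit(x, "[^[:digit:]]")   # split at every non-digit
  }
  n <- as.numeric(unlist(pieces))           # "" pieces become NA
  unique(n[!is.na(n)])                      # drop NA, keep first occurrences
}

l <- c("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")
unique_numbers(l)
unique_numbers(l, single_digits = TRUE)
```

For the example strings the two calls return c(7, 667, 11, 5, 2) and c(7, 6, 1, 5, 2), matching the two solutions asked for in the question.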
Answer by Joe
A stringr solution with str_match_all and the %>% pipe. For the first solution:
library(stringr)
str_match_all(ll, "[0-9]+") %>% unlist %>% unique %>% as.numeric
Second solution:
str_match_all(ll, "[0-9]") %>% unlist %>% unique %>% as.numeric
(Note: I've also called the list ll.)
Answer by asb
Use strsplit with a pattern that matches anything except the digits 0-9, i.e. [^0-9].
For the example you have provided, do this:
tmp <- sapply(list, function (k) strsplit(k, "[^0-9]"))
Then simply take the union of all 'sets' in the list, like so:
tmp <- Reduce(union, tmp)
Then you only have to remove the empty string.
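Removing the empty string and converting to numbers completes this approach. A sketch that also calls strsplit() directly on a character vector instead of through sapply(), since strsplit() already returns one element per input string:

```r
l <- c("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")

# one character vector of pieces per string (includes "" pieces)
tmp <- strsplit(l, "[^0-9]")

# union of all pieces, in order of first appearance
tmp <- Reduce(union, tmp)

# drop the empty string and convert what is left to numbers
result <- as.numeric(setdiff(tmp, ""))
result
```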
Answer by Rory Nolan
Check out the str_extract_numbers() function from the strex package.
pacman::p_load(strex)
list=list()
list[1] = "djud7+dg[a]hs667"
list[2] = "7fd*hac11(5)"
list[3] = "2tu,g7gka5"
charvec <- unlist(list)
print(charvec)
#> [1] "djud7+dg[a]hs667" "7fd*hac11(5)" "2tu,g7gka5"
str_extract_numbers(charvec)
#> [[1]]
#> [1] 7 667
#>
#> [[2]]
#> [1] 7 11 5
#>
#> [[3]]
#> [1] 2 7 5
unique(unlist(str_extract_numbers(charvec)))
#> [1] 7 667 11 5 2
Created on 2018-09-03 by the reprex package (v0.2.0).