string: Extract unique numbers from strings in R

Question

Asked by Remi.b

I have a list of strings which contain random characters such as:

list=list()
list[1] = "djud7+dg[a]hs667"
list[2] = "7fd*hac11(5)"
list[3] = "2tu,g7gka5"

I'd like to know which numbers are present at least once in this list (that is, the unique() numbers). The solution for my example is:

solution: c(7,667,11,5,2)

If someone has a method that does not consider 11 as "eleven" but as "one and one", it would also be useful. The solution in this condition would be:

solution: c(7,6,1,5,2)

(I found this post on a related subject: Extracting numbers from vectors of strings)

Answer 1

Answered by Arun

For the second answer, you can use gsub to remove everything from the string that's not a digit, then split the string into single characters as follows:

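# ll holds the question's data (renamed from list; see the PS below)
ll <- list("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")
unique(as.numeric(unlist(strsplit(gsub("[^0-9]", "", unlist(ll)), ""))))
# [1] 7 6 1 5 2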

For the first answer, similarly using strsplit:

unique(na.omit(as.numeric(unlist(strsplit(unlist(ll), "[^0-9]+")))))
# [1]   7 667  11   5   2

PS: don't name your variable list (as there's a built-in function list). I've named your data ll.
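
To see what each stage of the two one-liners above does, the nested calls can be unrolled into intermediate variables. This is only a sketch of the same base R calls; the names x, digits_only, chars, pieces, and nums are illustrative and do not come from the original answer.

```r
ll <- list("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")
x <- unlist(ll)                           # character vector of length 3

# Individual digits: strip every non-digit, then split into single characters
digits_only <- gsub("[^0-9]", "", x)      # "7667" "7115" "275"
chars <- unlist(strsplit(digits_only, ""))
single_digits <- unique(as.numeric(chars))

# Whole numbers: split on runs of non-digits; a leading "" converts to NA
pieces <- unlist(strsplit(x, "[^0-9]+"))
nums <- as.numeric(pieces)
whole_numbers <- unique(nums[!is.na(nums)])

single_digits   # 7 6 1 5 2
whole_numbers   # 7 667 11 5 2
```

Subsetting with !is.na() instead of na.omit() returns a plain numeric vector without the extra attributes that na.omit() attaches.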

Answer 2

Answered by A5C1D2H2I1M1N2O1R2T1

Here is yet another answer, this one using gregexpr to find the numbers and regmatches to extract them:

l <- c("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")

temp1 <- gregexpr("[0-9]", l)   # Individual digits
temp2 <- gregexpr("[0-9]+", l)  # Numbers with any number of digits

as.numeric(unique(unlist(regmatches(l, temp1))))
# [1] 7 6 1 5 2
as.numeric(unique(unlist(regmatches(l, temp2))))
# [1]   7 667  11   5   2
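
If both variants are needed repeatedly, the gregexpr/regmatches pair could be wrapped in a small function. The sketch below is an illustration only: the function name extract_unique_numbers and its whole argument are hypothetical and not part of the original answer.

```r
# Hypothetical helper: unique numbers in a character vector,
# in order of first appearance.
#   whole = TRUE  -> runs of digits are read as one number ("11" -> 11)
#   whole = FALSE -> every digit is read on its own ("11" -> 1)
extract_unique_numbers <- function(x, whole = TRUE) {
  pattern <- if (whole) "[0-9]+" else "[0-9]"
  matches <- regmatches(x, gregexpr(pattern, x))
  # Convert before de-duplicating so that "07" and "7" count as the same number
  unique(as.numeric(unlist(matches)))
}

l <- c("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")
extract_unique_numbers(l)                  # 7 667 11 5 2
extract_unique_numbers(l, whole = FALSE)   # 7 6 1 5 2
```

Unlike the original lines, this version calls as.numeric before unique; the result is the same for the example data and differs only when numbers carry leading zeros.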

Answer 3

Answered by altabq

A solution using stringi:

library(stringi)

# Extract the numbers (unlist() turns the question's list into a character vector):
nums <- stri_extract_all_regex(unlist(list), "[0-9]+")

# Flatten to a vector and keep the unique numbers:
nums <- unlist(nums)
nums <- unique(nums)

That gives your first solution, as character strings; wrap the result in as.numeric() if you need numbers.

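For the second solution I would use substr to take the first digit of each number. This matches the expected result for the example, but in general it misses digits that appear only in later positions: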

nums_first <- sapply(nums, function(x) unique(substr(x,1,1)))
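
Note that substr(x, 1, 1) keeps only the first character of each number. A sketch of a variant that covers every digit is to split each extracted number into its characters instead. It uses base R only, and nums is re-created here with the values from the first solution so that the block is self-contained.

```r
nums <- c("7", "667", "11", "5", "2")    # unique numbers from the first solution

# Split every number into single characters and de-duplicate
all_digits <- unique(unlist(strsplit(nums, "")))
all_digits                               # "7" "6" "1" "5" "2"

# A case where taking only the first character falls short:
substr("12", 1, 1)                       # "1"
unlist(strsplit("12", ""))               # "1" "2"
```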

Answer 4

Answered by sgibb

You could use ?strsplit (as suggested in @Arun's answer to Extracting numbers from vectors (of strings)):

l <- c("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")

## split string at non-digits
s <- strsplit(l, "[^[:digit:]]")

## convert strings to numeric ("" become NA)
solution <- as.numeric(unlist(s))

## remove NA and duplicates
solution <- unique(solution[!is.na(solution)])
# [1]   7 667  11   5   2

Answer 5

Answered by Joe

A stringr solution with str_match_all and the pipe operator. For the first solution:

library(stringr)
str_match_all(ll, "[0-9]+") %>% unlist %>% unique %>% as.numeric

Second solution:

str_match_all(ll, "[0-9]") %>% unlist %>% unique %>% as.numeric

(Note: I've also called the list ll)

Answer 6

Answered by asb

Use strsplit with a pattern that matches anything other than the digits 0-9.

For the example you have provided, do this:

tmp <- sapply(list, function (k) strsplit(k, "[^0-9]"))

Then simply take the union of all the "sets" in the list, like so:

tmp <- Reduce(union, tmp)

Then you only have to remove the empty string.
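
The three steps can be put together as follows. This is a minimal sketch: the variable is called strings rather than list to avoid masking the built-in function, lapply with [[1]] replaces sapply so that each element is a plain character vector, and the final as.numeric conversion is an addition that the answer does not spell out.

```r
strings <- list("djud7+dg[a]hs667", "7fd*hac11(5)", "2tu,g7gka5")

# Split each string at every non-digit character
tmp <- lapply(strings, function(k) strsplit(k, "[^0-9]")[[1]])

# Union of all pieces across the strings (keeps first-seen order)
tmp <- Reduce(union, tmp)

# Drop the empty string left over from splitting, then convert to numbers
result <- as.numeric(tmp[tmp != ""])
result   # 7 667 11 5 2
```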

Answer 7

Answered by Rory Nolan

Check out the str_extract_numbers() function from the strex package.

pacman::p_load(strex)
list=list()
list[1] = "djud7+dg[a]hs667"
list[2] = "7fd*hac11(5)"
list[3] = "2tu,g7gka5"
charvec <- unlist(list)
print(charvec)
#> [1] "djud7+dg[a]hs667" "7fd*hac11(5)"     "2tu,g7gka5"
str_extract_numbers(charvec)
#> [[1]]
#> [1]   7 667
#> 
#> [[2]]
#> [1]  7 11  5
#> 
#> [[3]]
#> [1] 2 7 5
unique(unlist(str_extract_numbers(charvec)))
#> [1]   7 667  11   5   2

Created on 2018-09-03 by the reprex package (v0.2.0).
