Source: http://stackoverflow.com/questions/13043928/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
Selecting rows where a column has a string like 'hsa..' (partial string match)
Asked by Asda
I have a 371MB text file containing micro RNA data. Essentially, I would like to only select those rows that have information about human microRNA.
I have read in the file using read.table. Usually, I'd accomplish what I want with sqldf, if it had a 'like' syntax (select * from <> where miRNA like 'hsa'). Unfortunately, sqldf does not seem to support that syntax.
How can I do this in R? I have looked around stackoverflow and do not see an example of how I can do a partial string match. I even installed the stringr package - but it does not quite have what I need.
What I would like to do is something like this, where all rows whose miRNA value matches hsa-* are selected:
selectedRows <- conservedData[, conservedData$miRNA %like% "hsa-"]
which of course, is not correct syntax.
Can somebody please help me with this? Thanks a lot for reading.
Asda
Answered by A5C1D2H2I1M1N2O1R2T1
I notice that you mention a function %like% in your current approach. I don't know if that's a reference to the %like% from "data.table", but if it is, you can definitely use it as follows.
Note that the object does not have to be a data.table (but also remember that subsetting approaches for data.frames and data.tables are not identical):
library(data.table)
mtcars[rownames(mtcars) %like% "Merc", ]
iris[iris$Species %like% "osa", ]
If that is what you had, then perhaps you had just mixed up row and column positions for subsetting data.
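As a minimal sketch of that row/column mix-up: the logical condition belongs before the comma (the row position), not after it. The small conservedData frame below is a hypothetical stand-in for the asker's data, and its values are made up for illustration.

```r
library(data.table)  # provides %like%

# Hypothetical stand-in for the asker's data (values are invented)
conservedData <- data.frame(
  miRNA = c("hsa-miR-21", "mmu-miR-21", "hsa-let-7a"),
  score = c(0.9, 0.8, 0.7),
  stringsAsFactors = FALSE
)

# Condition in the row position (before the comma) keeps matching rows
selectedRows <- conservedData[conservedData$miRNA %like% "hsa-", ]
```

Placing the same condition after the comma asks R to select columns with a logical vector, which is not what is wanted here.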
If you don't want to load a package, you can try using grep() to search for the string you're matching. Here's an example with the mtcars dataset, where we are matching all rows where the row name includes "Merc":
mtcars[grep("Merc", rownames(mtcars)), ]
#              mpg cyl  disp  hp drat   wt qsec vs am gear carb
# Merc 240D   24.4   4 146.7  62 3.69 3.19 20.0  1  0    4    2
# Merc 230    22.8   4 140.8  95 3.92 3.15 22.9  1  0    4    2
# Merc 280    19.2   6 167.6 123 3.92 3.44 18.3  1  0    4    4
# Merc 280C   17.8   6 167.6 123 3.92 3.44 18.9  1  0    4    4
# Merc 450SE  16.4   8 275.8 180 3.07 4.07 17.4  0  0    3    3
# Merc 450SL  17.3   8 275.8 180 3.07 3.73 17.6  0  0    3    3
# Merc 450SLC 15.2   8 275.8 180 3.07 3.78 18.0  0  0    3    3
And another example, using the iris dataset and searching for the string "osa":
irisSubset <- iris[grep("osa", iris$Species), ]
head(irisSubset)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa
For your problem try:
selectedRows <- conservedData[grep("hsa-", conservedData$miRNA), ]
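One thing that may matter here: the pattern "hsa-" matches anywhere in the string, not only at the start. If only values that begin with "hsa-" are wanted, a sketch along these lines could help. The conservedData frame is a hypothetical stand-in with invented values, and it assumes miRNA is a character column rather than a factor.

```r
# Hypothetical stand-in for the asker's data (values are invented)
conservedData <- data.frame(
  miRNA = c("hsa-miR-21", "mmu-miR-21", "hsa-let-7a", "x-hsa-1"),
  stringsAsFactors = FALSE
)

# Unanchored pattern: also matches "x-hsa-1"
anywhere <- conservedData[grepl("hsa-", conservedData$miRNA), , drop = FALSE]

# "^" anchors the regular expression to the start of the string
anchored <- conservedData[grepl("^hsa-", conservedData$miRNA), , drop = FALSE]

# startsWith() (base R >= 3.3.0) does a literal prefix test, no regex
prefixed <- conservedData[startsWith(conservedData$miRNA, "hsa-"), , drop = FALSE]
```

grepl() returns a logical vector the same length as the column, so it can also be combined with other conditions using & and |, which grep()'s integer indices do not allow as directly.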
Answered by Sam Firke
Try str_detect() from the stringr package, which detects the presence or absence of a pattern in a string.
Here is an approach that also incorporates the %>% pipe and filter() from the dplyr package:
library(stringr)
library(dplyr)
CO2 %>%
filter(str_detect(Treatment, "non"))
Plant Type Treatment conc uptake
1 Qn1 Quebec nonchilled 95 16.0
2 Qn1 Quebec nonchilled 175 30.4
3 Qn1 Quebec nonchilled 250 34.8
4 Qn1 Quebec nonchilled 350 37.2
5 Qn1 Quebec nonchilled 500 35.3
...
This filters the sample CO2 data set (which comes with R) for rows where the Treatment variable contains the substring "non". You can adjust whether str_detect() finds fixed matches or uses a regex; see the documentation for the stringr package.
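To illustrate the fixed-versus-regex choice, here is a small sketch using stringr's fixed() and regex() pattern modifiers. The vector x is made up for the example.

```r
library(stringr)

# Invented example values
x <- c("hsa-miR-21", "HSA-miR-21", "hsa.miR")

# Default: the pattern is a regular expression, so "." matches any character
asRegex <- str_detect(x, "hsa.")

# fixed(): the pattern is compared literally, so "." only matches a dot
asFixed <- str_detect(x, fixed("hsa."))

# regex() with options: anchored to the start and case-insensitive
anchoredNoCase <- str_detect(x, regex("^hsa-", ignore_case = TRUE))
```

Any of these logical vectors can be passed to filter() in the same way as in the CO2 example.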
Answered by user1609452
LIKE should work in sqlite:
require(sqldf)
df <- data.frame(name = c('bob','robert','peter'),id=c(1,2,3))
sqldf("select * from df where name LIKE '%er%'")
name id
1 robert 2
2 peter 3
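Applied to the question, the query might look like the sketch below. The conservedData frame is a hypothetical stand-in with invented values. In SQL LIKE patterns, % matches any run of characters, so 'hsa-%' means "starts with hsa-", and a pattern with no wildcard (such as 'hsa') only matches the exact string, which may be why the original attempt appeared not to work.

```r
library(sqldf)

# Hypothetical stand-in for the asker's data (values are invented)
conservedData <- data.frame(
  miRNA = c("hsa-miR-21", "mmu-miR-21", "hsa-let-7a"),
  score = c(0.9, 0.8, 0.7),
  stringsAsFactors = FALSE
)

# '%' is the SQL wildcard; 'hsa-%' keeps values that begin with "hsa-"
humanRows <- sqldf("select * from conservedData where miRNA like 'hsa-%'")
```

Note that SQLite's LIKE is case-insensitive for ASCII letters by default, which differs from grep() and str_detect().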