Original source: http://stackoverflow.com/questions/40032674/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Filter & Subset if a String Contains Certain Characters (in R)
Asked by ask39
I currently wish to divide a data frame into subsets for training/testing. The data frame has columns that contain different items, and some contain sub-items (Aisle01, Aisle02, etc.). I am getting tripped up when filtering on a partial string in multiple columns.
Data sample:
Column1 Column2 Column3
Wall01 Wall04 45.6
Wall04 Aisle02 65.7
Aisle06 Wall01 45.0
Aisle01 Wall01 33.3
Wall01 Wall04 21.1
If my data frame (x) contains two columns that each contain multiple versions of "Aisle", I wish to keep every row in which either column contains "Aisle". I am wondering whether the line below is somewhat on the right track.
filter(x, column1 & column2 == grep(x$column1 & x$column2, "Aisle"))
Desired result:
Column1 Column2 Column3
Wall04 Aisle02 65.7
Aisle06 Wall01 45.0
Aisle01 Wall01 33.3
Thank you in advance.
Answered by Barker
The easiest solution I can see would be this:
x <- x[grepl("Aisle", x[["column1"]]) | grepl("Aisle", x[["column2"]]), ]
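To check this against the sample data from the question, the line above can be run on a small reproducible data frame. This is only a sketch: the lowercase column names (column1, column2, column3) and the character storage of the first two columns are assumptions made to match the code, since the sample itself shows capitalised headers.

```r
# Sample data from the question. Lowercase column names and
# stringsAsFactors = FALSE are assumptions made for this sketch.
x <- data.frame(
  column1 = c("Wall01", "Wall04", "Aisle06", "Aisle01", "Wall01"),
  column2 = c("Wall04", "Aisle02", "Wall01", "Wall01", "Wall04"),
  column3 = c(45.6, 65.7, 45.0, 33.3, 21.1),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per row: TRUE when either column contains "Aisle".
keep <- grepl("Aisle", x[["column1"]]) | grepl("Aisle", x[["column2"]])

# Keep only the matching rows (rows 2, 3 and 4 of the sample).
result <- x[keep, ]
print(result)
```

The rows that remain are the three listed under "Desired result" in the question.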
Using grepl instead of grep produces a logical vector, so you can use the | operator to combine the two conditions and select your rows. Also, I just wanted to quickly go over a few places in your code that may be giving you trouble.
The x$column1 & x$column2 at the beginning of your grep statement means that R will try to run the & operation pairwise on each of the entries in column1 and column2. Since these are characters and not logicals, this will produce some weird results.
In grep, the pattern you are trying to match comes before the string you are trying to match it against, so it should be grep("Aisle", columnValue), not the other way around. Running ?functionName will show you the documentation for a function, so you don't have to work that out from memory.
filter, as provided by base R (stats::filter), is a function for time series (ts) objects, not data frames; dplyr's filter does work on data frames, but only if dplyr is loaded. I am surprised you didn't get an error by using it in this way.
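The difference between grep and grepl, and the argument order described above, can be seen on a small vector. This is an illustrative sketch; the vector name values is hypothetical and not part of the original question.

```r
# Hypothetical example vector, not taken from the question.
values <- c("Wall01", "Aisle02", "Wall04", "Aisle06")

# Pattern first, then the strings to search.
# grep returns the positions of the matching elements.
idx <- grep("Aisle", values)
print(idx)   # 2 4

# grepl returns one TRUE/FALSE per element, so several
# conditions can be combined with | or & before subsetting.
hit <- grepl("Aisle", values)
print(hit)   # FALSE TRUE FALSE TRUE
```

Because grepl always returns a vector as long as its input, its results from two columns line up row by row, which is what makes the | combination in the answer work.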
Best of luck. Comment if you want anything clarified.