Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/40032674/

Date: 2020-09-08 16:34:01  Source: igfitidea

Filter & Subset if a String Contains Certain Characters (in R)

Tags: r, string, filter, grep, subset

Asked by ask39

I want to split a data frame into subsets for training and testing. Some columns in the data frame contain different items, and some contain sub-items (Aisle01, Aisle02, etc.). I am getting tripped up when filtering on a partial string across multiple columns.

Data sample:

Column1   Column2  Column3

Wall01    Wall04   45.6
Wall04    Aisle02  65.7
Aisle06   Wall01   45.0
Aisle01   Wall01   33.3
Wall01    Wall04   21.1

If my data frame (x) has two columns that each contain multiple versions of "Aisle", I want to keep only the rows in which either column contains "Aisle". Is the line below somewhat on the right track?

filter(x, column1 & column2 == grep(x$column1 & x$column2, "Aisle"))

Desired result:

Column1  Column2  Column3

Wall04   Aisle02  65.7
Aisle06  Wall01   45.0
Aisle01  Wall01   33.3

Thank you in advance.

Answered by Barker

The easiest solution I can see would be this:

x <- x[grepl("Aisle", x[["column1"]]) | grepl("Aisle", x[["column2"]]), ]

Using grepl instead of grep produces a logical vector, so you can use the | operator to select your rows. I also want to quickly go over a few places in your code that may be giving you trouble.
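
As a minimal sketch, the selection can be tried on the sample data from the question. The data frame below is rebuilt by hand, and the lowercase column names column1, column2 and column3 are an assumption chosen to match the code above (the sample table capitalises them). The names keep, aisle_rows and other_rows are also made up for illustration.

```r
# Sample data from the question; lowercase column names are an assumption
x <- data.frame(
  column1 = c("Wall01", "Wall04", "Aisle06", "Aisle01", "Wall01"),
  column2 = c("Wall04", "Aisle02", "Wall01", "Wall01", "Wall04"),
  column3 = c(45.6, 65.7, 45.0, 33.3, 21.1),
  stringsAsFactors = FALSE
)

# TRUE for each row where either column contains "Aisle"
keep <- grepl("Aisle", x[["column1"]]) | grepl("Aisle", x[["column2"]])

aisle_rows <- x[keep, ]   # rows where at least one column matches
other_rows <- x[!keep, ]  # the remaining rows
print(aisle_rows)
```

Keeping the logical vector in its own variable makes it easy to take the complement with !keep, which may be handy if both groups are needed.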

  1. The x$column1 & x$column2 at the beginning of your grep call means that R will try to apply the & operation pairwise to the entries of column1 and column2. Since these are characters and not logicals, this will not do what you want: & is only defined for numeric, logical or complex values.

  2. In grep, the pattern you are trying to match comes before the string you are matching it against, so it should be grep("Aisle", columnValue), not the other way around. Running ?functionName shows the documentation for a function, so you don't have to work this out from memory.

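  3. In base R, filter (stats::filter) applies linear filtering to time series (ts) objects; it does not select rows of a data frame. A filter function that works on data frames is provided by the dplyr package, which has to be loaded first. I am surprised you didn't get an error by using it in this way.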
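
The first two points can be checked directly in base R. This is a small sketch on a made-up vector (vals and the other variable names are hypothetical, not from the question): grep returns the positions of the matches, grepl returns one TRUE/FALSE per element, and in both the pattern is the first argument.

```r
# Hypothetical example vector, not taken from the question's data frame
vals <- c("Wall01", "Aisle02", "Aisle06")

idx  <- grep("Aisle", vals)    # integer positions of the matching elements
hits <- grepl("Aisle", vals)   # logical vector, one entry per element

print(idx)
print(hits)

# Logical vectors combine element-wise with | and &
either <- grepl("Aisle", vals) | grepl("Wall", vals)
print(either)

# Applying & to character vectors directly raises an error,
# which is caught here so the script keeps running
res <- tryCatch(vals & vals, error = function(e) "error")
print(res)
```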

Best of luck. Comment if you want anything clarified.
