Find string in data.frame

Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/39450003/

r string dataframe

Asked by Jonas Lindeløv

How do I search for a string in a data.frame? As a minimal example, how do I find the locations (columns and rows) of 'horse' in this data.frame?

> df = data.frame(animal=c('goat','horse','horse','two', 'five'), level=c('five','one','three',30,'horse'), length=c(10, 20, 30, 'horse', 'eight'))
> df
  animal level length
1   goat  five     10
2  horse   one     20
3  horse three     30
4    two    30  horse
5   five horse  eight

... so rows 4 and 5 have the wrong order. Any output that lets me identify that 'horse' has shifted to the level column in row 5 and to the length column in row 4 is good. Maybe:

> magic_function(df, 'horse')
col       row
'animal', 2
'animal', 3
'length', 4
'level',  5

Here's what I want to use this for: I have a very large data frame (around 60 columns, 20,000 rows) in which some columns are messed up for some rows. It's too large to eyeball in order to identify the different ways the order can be wrong, so searching would be nice. I will use this info to move data to the correct columns for these rows.

Answered by thothal

What about:

which(df == "horse", arr.ind = TRUE)
#      row col
# [1,]   2   1
# [2,]   3   1
# [3,]   5   2
# [4,]   4   3
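
The index matrix above can be turned into the column-name output the question asks for. This is a minimal sketch, assuming the example data from the question; the names `hits` and `result` are illustrative and not part of the original answer.

```r
# Example data from the question, with every value written as a string
# (R would coerce the mixed vectors to character anyway).
df <- data.frame(animal = c("goat", "horse", "horse", "two", "five"),
                 level  = c("five", "one", "three", "30", "horse"),
                 length = c("10", "20", "30", "horse", "eight"))

# Matrix of (row, col) positions where the cell equals "horse"
hits <- which(df == "horse", arr.ind = TRUE)

# Swap the column indices for column names; unname() drops any
# row labels carried over from the comparison matrix.
result <- data.frame(col = colnames(df)[hits[, "col"]],
                     row = unname(hits[, "row"]),
                     row.names = NULL)
print(result)
```

Note that `==` only finds cells that equal the search string exactly; cells that merely contain it are not reported.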

Answered by 989

Another way around:

l <- sapply(colnames(df), function(x) grep("horse", df[,x]))

$animal
[1] 2 3

$level
[1] 5

$length
[1] 4

If you want the output to be a matrix:

sapply(l,'[',1:max(lengths(l)))

     animal level length
[1,]      2     5      4
[2,]      3    NA     NA
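
The per-column list can also be reshaped into one row per match. The following is a sketch under the assumption that the question's example data is used; it swaps `sapply` for `lapply` so the result is always a list, and uses base R's `stack()`. The names `long` and `partial` are illustrative.

```r
df <- data.frame(animal = c("goat", "horse", "horse", "two", "five"),
                 level  = c("five", "one", "three", "30", "horse"),
                 length = c("10", "20", "30", "horse", "eight"))

# One integer vector of matching row numbers per column
l <- lapply(df, function(column) grep("horse", column))

# stack() turns the named list into a two-column data frame
long <- stack(l)
names(long) <- c("row", "col")
print(long)

# grep() matches a regular expression anywhere in the string,
# so "seahorse" also matches; anchor the pattern for exact matches.
partial <- grep("horse", c("horse", "seahorse"))
exact   <- grep("^horse$", c("horse", "seahorse"))
```

Whether partial matching is wanted depends on the data: it helps when a misplaced value has extra characters around it, but it can report cells that only happen to contain the search term.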

Answered by piyuw

Another way to do it is the following:

library(data.table)
library(zoo)
library(dplyr)
library(timeDate)
library(reshape2)
# the data frame is named tbl_Account

First, transpose it:

temp = t(tbl_Account)

Then, put it into a list:

temp = list(temp)

This essentially puts every single observation in the data frame into one long vector of values (the transposed matrix, which grep treats as a flat vector), allowing you to search the whole data frame in one go.

Then do the search:

# grep() is case-sensitive: "Horse" will not match "horse" unless ignore.case = TRUE is set
temp[[1]][grep("Horse", temp[[1]])]  # returns the matching values
grep("Horse", temp[[1]])             # returns the positions of the matches in the flattened, transposed matrix
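
The positions returned by grep refer to the flattened, transposed matrix, so they still have to be mapped back to rows and columns. Below is a minimal sketch of one way to do that with base R's `arrayInd()`, assuming the question's example data rather than `tbl_Account`; the names `tm`, `pos`, `ind` and `result` are illustrative.

```r
df <- data.frame(animal = c("goat", "horse", "horse", "two", "five"),
                 level  = c("five", "one", "three", "30", "horse"),
                 length = c("10", "20", "30", "horse", "eight"))

tm  <- t(df)                 # character matrix: one row per df column
pos <- grep("horse", tm)     # positions in the flattened matrix

# Convert flat positions to (row, col) of the transposed matrix.
# A row of tm is a column of df, and a column of tm is a row of df.
ind <- arrayInd(pos, dim(tm))
result <- data.frame(col = rownames(tm)[ind[, 1]],
                     row = ind[, 2])
print(result)
```

Because the transposed matrix is traversed one original row at a time, the matches come out ordered by row, which is the layout sketched in the question.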

Hope this helps :)

Answered by Ronak Shah

We can get the linear indices where the value is equal to horse. R numbers the cells column by column, so integer-dividing the index minus one by the number of rows (nrow) and adding one gives the column index, and the remainder plus one gives the row index.

We use colnames to get column names instead of indices.

idx <- which(df == "horse")  # column-major linear indices: 2 3 10 14
data.frame(col = colnames(df)[(idx - 1) %/% nrow(df) + 1],
           row = (idx - 1) %% nrow(df) + 1)

#      col row
# 1 animal   2
# 2 animal   3
# 3  level   5
# 4 length   4