Find string in data.frame
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/39450003/
Asked by Jonas Lindeløv
How do I search for a string in a data.frame? As a minimal example, how do I find the locations (columns and rows) of 'horse' in this data.frame?
> df = data.frame(animal=c('goat','horse','horse','two', 'five'), level=c('five','one','three',30,'horse'), length=c(10, 20, 30, 'horse', 'eight'))
> df
  animal level length
1   goat  five     10
2  horse   one     20
3  horse three     30
4    two    30  horse
5   five horse  eight
... so rows 4 and 5 have the wrong order. Any output that would allow me to identify that 'horse' has shifted to the level column in row 5 and to the length column in row 4 is good. Maybe:
> magic_function(df, 'horse')
col row
'animal', 2
'animal', 3
'length', 4
'level', 5
Here's what I want to use this for: I have a very large data frame (around 60 columns, 20,000 rows) in which some columns are messed up for some rows. It's too large to eyeball in order to identify the different ways the order can be wrong, so searching would be nice. I will use this info to move data to the correct columns for these rows.
Answered by thothal
What about:
which(df == "horse", arr.ind = TRUE)
# row col
# [1,] 2 1
# [2,] 3 1
# [3,] 5 2
# [4,] 4 3
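The matrix above holds numeric column indices, while the question asks for column names. A minimal sketch of one way to bridge that gap, assuming the example data frame from the question (the names hits and result are illustrative, not part of the original answer):

```r
# Example data frame from the question
df <- data.frame(
  animal = c("goat", "horse", "horse", "two", "five"),
  level  = c("five", "one", "three", 30, "horse"),
  length = c(10, 20, 30, "horse", "eight")
)

# Matrix of (row, col) positions whose cell equals "horse" exactly
hits <- which(df == "horse", arr.ind = TRUE)

# Swap the numeric column index for the column name
result <- data.frame(
  col = colnames(df)[hits[, "col"]],
  row = unname(hits[, "row"])
)
result
```

Because == tests for exact equality, a cell such as "seahorse" would not be reported here.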
Answered by 989
Another way around:
# For each column, grep() returns the row numbers whose value contains "horse"
l <- sapply(colnames(df), function(x) grep("horse", df[,x]))
l
$animal
[1] 2 3
$level
[1] 5
$length
[1] 4
If you want the output to be a matrix:
sapply(l,'[',1:max(lengths(l)))
animal level length
[1,] 2 5 4
[2,] 3 NA NA
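If a long col/row table is preferred over a padded matrix, the per-column hits can be unrolled. This is only a sketch under a few assumptions: lapply() is swapped in for sapply() so the result is always a list, fixed = TRUE is added so the pattern is treated as a literal string rather than a regular expression, and the names hits and long are made up for illustration.

```r
# Example data frame from the question
df <- data.frame(
  animal = c("goat", "horse", "horse", "two", "five"),
  level  = c("five", "one", "three", 30, "horse"),
  length = c(10, 20, 30, "horse", "eight")
)

# One integer vector of matching row numbers per column
hits <- lapply(df, function(column) grep("horse", column, fixed = TRUE))

# Repeat each column name once per hit and flatten the row numbers
long <- data.frame(
  col = rep(names(hits), lengths(hits)),
  row = as.integer(unlist(hits, use.names = FALSE))
)
long
```

Note that grep() matches substrings, so a value such as "seahorse" would also be reported; use == instead when only exact matches are wanted.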
Answered by piyuw
Another way to do it is the following:
library(data.table)
library(zoo)
library(dplyr)
library(timeDate)
library(reshape2)
Assume the data frame is named tbl_Account.
First, transpose it:
temp = t(tbl_Account)
Then, put it into a list:
temp = list(temp)
This essentially flattens every single observation in the data frame into one long character vector (the transposed matrix, stored as the list's only element), allowing you to search the whole data frame in one go.
Then do the searching:
temp[[1]][grep("Horse",temp[[1]])] # returns the matching values themselves
grep("Horse", temp[[1]]) # returns the positions of the matches in the transposed matrix (grep is case-sensitive)
hope this helps :)
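The positions returned by grep() on a transposed matrix are linear indices, not row and column numbers. A minimal sketch of turning them back into locations, assuming the example data frame from the question and base R's arrayInd(); the list() wrapper is skipped here because grep() can search the matrix directly, and the names m, pos, idx and found are illustrative:

```r
# Example data frame from the question
df <- data.frame(
  animal = c("goat", "horse", "horse", "two", "five"),
  level  = c("five", "one", "three", 30, "horse"),
  length = c(10, 20, 30, "horse", "eight")
)

# Transposing gives a character matrix with one row per original column
m <- t(df)

# Linear positions of the matches inside the transposed matrix
pos <- grep("horse", m, fixed = TRUE)

# Convert linear positions to (row of m, column of m) pairs
idx <- arrayInd(pos, dim(m))

# Rows of m are the original columns; columns of m are the original rows
found <- data.frame(col = rownames(m)[idx[, 1]], row = idx[, 2])
found
```

Because the matrix is transposed, the hits come back ordered by original row number.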
Answered by Ronak Shah
We can get the linear (column-major) indices where the value is equal to horse. Integer-dividing the zero-based index by the number of rows (nrow) gives the column index, and the remainder of that division gives the row index.
We use colnames to get column names instead of indices.
# Linear positions of the matches, counted down each column in turn
idx <- which(df == "horse")
data.frame(col = colnames(df)[(idx - 1) %/% nrow(df) + 1],
           row = (idx - 1) %% nrow(df) + 1)
#     col row
#1 animal   2
#2 animal   3
#3  level   5
#4 length   4
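For reference, a column-major linear index i in a table with n rows corresponds to row (i - 1) %% n + 1 and column (i - 1) %/% n + 1. The sketch below checks that mapping against base R's arrayInd() on the example data from the question; the names idx, n, manual and builtin are illustrative only.

```r
# Example data frame from the question
df <- data.frame(
  animal = c("goat", "horse", "horse", "two", "five"),
  level  = c("five", "one", "three", 30, "horse"),
  length = c(10, 20, 30, "horse", "eight")
)

idx <- which(df == "horse")   # linear, column-major positions of the matches
n <- nrow(df)

# Row and column recovered by hand from the linear index
manual <- cbind(row = (idx - 1) %% n + 1, col = (idx - 1) %/% n + 1)

# The same conversion done by base R
builtin <- arrayInd(idx, dim(df))

# Stops with an error if the two conversions disagree
stopifnot(all(manual == builtin))
```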