Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/6602881/

Date: 2020-09-11 01:41:14  Source: igfitidea

Text file to list in R

Tags: list, r, text, statistics

Asked by Stephen Turner

I have a large text file with a variable number of fields in each row. The first entry in each row corresponds to a biological pathway, and each subsequent entry corresponds to a gene in that pathway. The first few lines might look like this:

path1   gene1 gene2
path2   gene3 gene4 gene5 gene6
path3   gene7 gene8 gene9

I need to read this file into R as a list, with each element being a character vector, and the name of each element in the list being the first element on the line, for example:

> pathways <- list(
+     path1=c("gene1","gene2"), 
+     path2=c("gene3","gene4","gene5","gene6"),
+     path3=c("gene7","gene8","gene9")
+ )
> 
> str(pathways)
List of 3
 $ path1: chr [1:2] "gene1" "gene2"
 $ path2: chr [1:4] "gene3" "gene4" "gene5" "gene6"
 $ path3: chr [1:3] "gene7" "gene8" "gene9"
> 
> str(pathways$path1)
 chr [1:2] "gene1" "gene2"
> 
> print(pathways)
$path1
[1] "gene1" "gene2"

$path2
[1] "gene3" "gene4" "gene5" "gene6"

$path3
[1] "gene7" "gene8" "gene9"

...but I need to do this automatically for thousands of lines. I saw a similar question posted here previously, but I couldn't figure out how to do this from that thread.

Thanks in advance.

Answered by Joshua Ulrich

Here's one way to do it:

# Read in the data
x <- scan("data.txt", what="", sep="\n")
# Separate elements by one or more whitespace characters
y <- strsplit(x, "[[:space:]]+")
# Extract the first vector element and set it as the list element name
names(y) <- sapply(y, `[[`, 1)
#names(y) <- sapply(y, function(x) x[[1]]) # same as above
# Remove the first vector element from each list element
y <- lapply(y, `[`, -1)
#y <- lapply(y, function(x) x[-1]) # same as above
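
To check these steps without a file on disk, the sample rows from the question can be fed to scan() through its text argument. This is only a sketch: sample_text and expected are illustrative names holding data copied from the question, and quiet = TRUE is added here just to suppress scan()'s item count.

```r
# Sample rows copied from the question (sample_text is an illustrative name)
sample_text <- "path1   gene1 gene2
path2   gene3 gene4 gene5 gene6
path3   gene7 gene8 gene9"

# Same steps as above, reading from a string instead of "data.txt"
x <- scan(text = sample_text, what = "", sep = "\n", quiet = TRUE)
y <- strsplit(x, "[[:space:]]+")
names(y) <- sapply(y, `[[`, 1)
y <- lapply(y, `[`, -1)

# The list the question asks for, written out by hand for comparison
expected <- list(
  path1 = c("gene1", "gene2"),
  path2 = c("gene3", "gene4", "gene5", "gene6"),
  path3 = c("gene7", "gene8", "gene9")
)
identical(y, expected)
```

Comparing against a hand-written list with identical() is a cheap way to confirm the structure before running the same code on thousands of lines.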

Answered by Gavin Simpson

One solution is to read the data in via read.table(), using the fill = TRUE argument to pad the rows that have fewer entries, then convert the resulting data frame to a list and clean up the empty elements.

First, read your snippet of data in:

con <- textConnection("path1   gene1 gene2
path2   gene3 gene4 gene5 gene6
path3   gene7 gene8 gene9
")
dat <- read.table(con, fill = TRUE, stringsAsFactors = FALSE)
close(con)

Next, we save the first column to use later as the names of the list, and then drop it:

nams <- dat[, 1]
dat <- dat[, -1]

Convert the data frame to a list. Here I just split the data frame on the indices 1,2,...,n where n is the number of rows:

ldat <- split(dat, seq_len(nrow(dat)))

Clean up the empty cells:

ldat <- lapply(ldat, function(x) x[x != ""])

Finally, apply the names:

names(ldat) <- nams

Giving:

> ldat
$path1
[1] "gene1" "gene2"

$path2
[1] "gene3" "gene4" "gene5" "gene6"

$path3
[1] "gene7" "gene8" "gene9"
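
One caveat for a file with thousands of lines: read.table() decides how many columns to create by looking only at the first five lines of input, unless a longer col.names is supplied. If the longest row appears later in the file, the read.table() approach above may need the column count stated explicitly. The sketch below assumes that count.fields() is used to find the widest row; the temporary file, its made-up contents, and the names tmp and n_col are illustrative only.

```r
# Made-up data: the widest row (path6) comes after the first five lines
lines <- c("path1 gene1 gene2",
           "path2 gene3",
           "path3 gene4",
           "path4 gene5",
           "path5 gene6",
           "path6 gene7 gene8 gene9 gene10")
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# Count the fields on every line and size the data frame for the widest row
n_col <- max(count.fields(tmp, sep = ""))
dat <- read.table(tmp, fill = TRUE, stringsAsFactors = FALSE,
                  col.names = paste0("V", seq_len(n_col)))

# Same steps as above: keep the names, drop the first column, split, clean up
nams <- dat[, 1]
dat <- dat[, -1]
ldat <- split(dat, seq_len(nrow(dat)))
ldat <- lapply(ldat, function(x) x[x != ""])
names(ldat) <- nams
unlink(tmp)
```

This reads the file twice (once to count fields, once to parse it), which is usually an acceptable cost for getting the column count right.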

Answered by JAShapiro

A quick solution based on the linked page...

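# Split every line on runs of whitespace
inlist <- strsplit(readLines("file.txt"), "[[:space:]]+")
# Everything except the first field is a gene
pathways <- lapply(inlist, tail, n = -1)
# The first field is the pathway name; sapply() returns a character vector rather than a list
names(pathways) <- sapply(inlist, head, n = 1)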
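
For repeated use, the same idea can be wrapped in a small function. The following is a sketch under a few assumptions that go beyond the question: the helper name read_pathways() is hypothetical, blank lines are skipped, leading and trailing whitespace is trimmed, and a pathway with no genes becomes an empty character vector.

```r
# Hypothetical helper: read a pathway file into a named list of character vectors
read_pathways <- function(path) {
  lines <- trimws(readLines(path))   # drop leading/trailing whitespace
  lines <- lines[nzchar(lines)]      # skip blank lines (an assumption, see above)
  fields <- strsplit(lines, "[[:space:]]+")
  pathways <- lapply(fields, function(v) v[-1])
  names(pathways) <- vapply(fields, function(v) v[1], character(1))
  pathways
}

# Usage on a temporary file with a blank line, a tab, and a gene-less pathway
tmp <- tempfile(fileext = ".txt")
writeLines(c("path1   gene1 gene2", "", "  path2\tgene3", "path3"), tmp)
pathways <- read_pathways(tmp)
str(pathways)
unlink(tmp)
```

vapply() with character(1) is used for the names so that a malformed line raises an error instead of silently producing a list.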

Answered by Karsten W.

One more solution:

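sl <- c("path1 gene1 gene2", "path2 gene1 gene2 gene3") # as returned by readLines()
# Add the pathway described by line s to the accumulated list l
f <- function(l, s) {
  v <- strsplit(s, "[[:space:]]+")[[1]]  # split on runs of whitespace, not single spaces
  l[[v[1]]] <- v[-1]                     # v[2:length(v)] would misbehave on a line with no genes
  return(l)
}
# Fold f over the lines, starting from an empty list
res <- Reduce(f, sl, list())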