Importing large tab-delimited .txt file into Python

Disclaimer: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA: cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/16989647/

Time: 2020-08-19 00:08:56  Source: igfitidea

python arrays list csv tab-delimited

Question by user2464402

I have a tab-delimited .txt file that I'm trying to import into a matrix array in Python, keeping the same layout as the text file, which looks like this:

123088 266 248 244 266 244 277
123425 275 244 241 289 248 231
123540 156 654 189 354 156 987

Note that there are many more rows like these (roughly 200) that I want to read into Python, keeping the same row and column layout in the resulting matrix.

The current code that I have for this is:

import csv

d = {}
with open('file name', newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter='\t')
    for row in csv_reader:
        # Key is the first column (a string); value is the rest of the row
        d[row[0]] = row[1:]

This partly does what I need, but it isn't my end goal. I want code where I can type print(d[0,3]) and have it spit out 248.

Answer by jsucsy

Try this:

d = []
with open(sourcefile) as source:  # sourcefile: path to your .txt file
    for line in source:
        # Drop the trailing newline, then split on tabs
        fields = line.rstrip('\n').split('\t')
        d.append(fields)

print(d[0][1]) will print 266.

print(d[0][2]) (remember that indexing is 0-based) will print 248.
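
To see the 0-based indexing without a file on disk, here is a minimal sketch of the same split-on-tabs approach run over the first two sample rows. It assumes an in-memory io.StringIO buffer named sample as a stand-in for the real file; that name is illustrative only.

```python
import io

# In-memory stand-in for the tab-delimited .txt file (illustration only)
sample = io.StringIO(
    "123088\t266\t248\t244\t266\t244\t277\n"
    "123425\t275\t244\t241\t289\t248\t231\n"
)

d = []
for line in sample:
    # Strip the newline, then split on tabs; the fields stay strings
    d.append(line.rstrip("\n").split("\t"))

print(d[0][1])  # 266
print(d[0][2])  # 248 (third column of the first row)
print(d[1][0])  # 123425
```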

To output the data in the same format as your input:

for line in d:
    print("\t".join(line))
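
If you want to write the rows back out instead of printing them, the csv module's writer can produce the same tab-delimited layout. This is a minimal sketch that writes into an in-memory buffer (out, an illustrative name) rather than a real file; lineterminator="\n" is passed because csv.writer otherwise ends rows with "\r\n".

```python
import csv
import io

d = [
    ["123088", "266", "248", "244", "266", "244", "277"],
    ["123425", "275", "244", "241", "289", "248", "231"],
]

out = io.StringIO()
# Tab delimiter and plain "\n" row endings to match the input file
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerows(d)

text = out.getvalue()
print(text, end="")
```

With a real file you would open it with open(path, "w", newline="") and pass that to csv.writer instead of the buffer.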

Answer by Jeff Tratner

First, you are loading the data into a dictionary, which won't give you the list of lists you want.
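
To make the difference concrete, here is a minimal sketch (using an in-memory sample string, rows, in place of the real file) of what the question's dictionary approach produces. The keys are the first-column strings, so d[0,3] looks up the tuple key (0, 3), which does not exist, while a list of lists supports positional indexing. The names as_dict and as_list are illustrative.

```python
import csv
import io

rows = "123088\t266\t248\t244\t266\t244\t277\n123425\t275\t244\t241\t289\t248\t231\n"

# The question's approach: the first column becomes the dictionary key
as_dict = {}
for row in csv.reader(io.StringIO(rows), delimiter="\t"):
    as_dict[row[0]] = row[1:]

print(list(as_dict))         # ['123088', '123425']
print(as_dict["123088"][1])  # 248 (the key column was dropped from the value)

try:
    as_dict[0, 3]  # looks up the tuple (0, 3), which is not a key
except KeyError as exc:
    print("KeyError:", exc)

# A list of lists keeps row and column positions instead
as_list = list(csv.reader(io.StringIO(rows), delimiter="\t"))
print(as_list[0][2])         # 248
```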

It's dead simple to use the csv module to generate a list of lists like this:

import csv
with open(path, newline='') as f:  # path: location of your .txt file
    reader = csv.reader(f, delimiter="\t")
    d = list(reader)
print(d[0][2])  # 248

That would give you a list of lists of strings, so if you wanted to get numbers, you'd have to convert to int.
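
A minimal sketch of that conversion, reading the first two sample rows from an in-memory string (rows, an illustrative stand-in for the file). The if row guard is an added assumption: csv.reader yields an empty list for a blank line, which would otherwise produce an empty row.

```python
import csv
import io

rows = "123088\t266\t248\t244\t266\t244\t277\n123425\t275\t244\t241\t289\t248\t231\n"

reader = csv.reader(io.StringIO(rows), delimiter="\t")
# Convert every field to int, skipping blank lines (yielded as [])
d = [[int(field) for field in row] for row in reader if row]

print(d[0][2])      # 248
print(d[0][2] + 1)  # 249, arithmetic works now that the values are ints
```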

That said, if you have a large array (or are doing any kind of numeric calculation), you should consider using something like NumPy or pandas. If you want to use NumPy, you could do:

import numpy as np
# dtype=int keeps whole numbers; the default dtype is float (248.0)
d = np.loadtxt(path, delimiter="\t", dtype=int)
print(d[0, 2])  # 248

As a bonus, NumPy arrays let you do fast vector and matrix operations. (Also note that d[0][2] works with the NumPy array too.)
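
Here is a minimal sketch of both points, assuming NumPy is installed. It reads the first two sample rows from an in-memory io.StringIO buffer (standing in for the real file), passes dtype=int because np.loadtxt defaults to float, and then sums each data column in one vectorized call. The buffer and the column-sum example are illustrative additions, not part of the original answer.

```python
import io

import numpy as np

rows = "123088\t266\t248\t244\t266\t244\t277\n123425\t275\t244\t241\t289\t248\t231\n"

# dtype=int keeps whole numbers; loadtxt's default dtype is float
d = np.loadtxt(io.StringIO(rows), delimiter="\t", dtype=int)

print(d.shape)   # (2, 7)
print(d[0, 2])   # 248
print(d[0][2])   # 248, chained indexing works too

# Vectorized operation: sum every column except the ID column at once
col_sums = d[:, 1:].sum(axis=0)
print(col_sums)  # [541 492 485 555 492 508]
```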

Answer by K Butler

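Not sure how to make print(d[0,3]) output 248, but the code below builds a list of lists, listoflists, for which print(listoflists[0][2]) outputs 248 (indexing is 0-based, so [0][3] would give 244). This is my first StackOverflow answer; note that the last two lines of the code block are a single statement split across two lines, which Python allows because the break falls inside the brackets.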

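import csv

Text_Input = r"<.txt file>"  # placeholder: path to your .txt file
listoflists = []

with open(Text_Input, newline='') as txtfile:
    # The default delimiter is a comma, so each tab-separated line arrives
    # as a single field, row[0]; str.split() then breaks it on whitespace.
    reader = csv.reader(txtfile)

    for row in reader:
        if not row:  # csv yields [] for blank lines; skip them
            continue
        listoflists.append([int(value) for value in
                            row[0].split()])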
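
Since this approach ends up splitting each line on whitespace anyway, the csv module is not strictly required. A minimal sketch of the same idea reading lines directly, shown on an in-memory sample (including one blank line) instead of a real file; with a real file you would iterate over open(path) instead. The sample buffer is an illustrative assumption.

```python
import io

sample = io.StringIO(
    "123088\t266\t248\t244\t266\t244\t277\n"
    "\n"
    "123425\t275\t244\t241\t289\t248\t231\n"
)

# str.split() with no argument splits on any run of whitespace (tabs
# included); lines that are empty after strip() are skipped
listoflists = [[int(value) for value in line.split()]
               for line in sample if line.strip()]

print(listoflists[0][2])  # 248
print(len(listoflists))   # 2
```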