How to read a text file into a list or an array with Python
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not to the translator): Stack Overflow.
Original question: http://stackoverflow.com/questions/14676265/
Asked by user2037744
I am trying to read the lines of a text file into a list or array in Python. I just need to be able to individually access any item in the list or array after it is created.
The text file is formatted as follows:
0,0,200,0,53,1,0,255,...,0.
Where the ... appears above, the actual text file has hundreds or thousands more items.
I'm using the following code to try to read the file into a list:
text_file = open("filename.dat", "r")
lines = text_file.readlines()
print lines
print len(lines)
text_file.close()
The output I get is:
['0,0,200,0,53,1,0,255,...,0.']
1
Apparently it is reading the entire file into a list of just one item, rather than a list of individual items. What am I doing wrong?
Answered by Achrome
You will have to split your string into a list of values using split(). So:
lines = text_file.read().split(',')
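As a minimal sketch of this approach, assuming the file looks like the sample in the question: the sample string, the io.StringIO stand-in for the real file, and the conversion to float are illustrative assumptions, not part of the original answer.

```python
import io

# Stand-in for open("filename.dat"); the contents are a made-up sample.
text_file = io.StringIO("0,0,200,0,53,1,0,255,0.\n")

# Read everything, drop the trailing newline, and split on commas.
# Each element is still a string at this point.
items = text_file.read().strip().split(',')

# Optionally convert to numbers; float() accepts both '200' and '0.'.
values = [float(item) for item in items]

print(items[2])
print(values[2])
```

Once the list exists, any item can be accessed individually by index, which is what the question asks for.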
Answered by Thiru
You can also use NumPy's loadtxt, like this:
from numpy import loadtxt
lines = loadtxt("filename.dat", comments="#", delimiter=",", unpack=False)
Answered by gboffi
So you want to create a list of lists... We need to start with an empty list:
list_of_lists = []
Next, we read the file content line by line:
with open('data') as f:
    for line in f:
        inner_list = [elt.strip() for elt in line.split(',')]
        # alternatively, if you need to use the file content as numbers:
        # inner_list = [int(elt.strip()) for elt in line.split(',')]
        list_of_lists.append(inner_list)
A common use case is columnar data, but our units of storage are the rows of the file, which we have read one by one, so you may want to transpose your list of lists. This can be done with the following idiom:
by_cols = zip(*list_of_lists)
Another common use is to give a name to each column:
col_names = ('apples sold', 'pears sold', 'apples revenue', 'pears revenue')
by_names = {}
for i, col_name in enumerate(col_names):
    by_names[col_name] = by_cols[i]
so that you can operate on homogeneous data items:
mean_apple_prices = [money/fruits for money, fruits in
                     zip(by_names['apples revenue'], by_names['apples sold'])]
Most of what I've written can be sped up using the csv module from the standard library. Another option is the third-party module pandas, which lets you automate most aspects of a typical data analysis (but has a number of dependencies).
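For illustration, here is a rough sketch of the same steps done with the csv module. The sample numbers and the io.StringIO stand-in for a real file are made up for the example, and the fields are assumed to be integers.

```python
import csv
import io

# Made-up sample data standing in for a real file opened with newline=''.
data = io.StringIO("1,2,10,30\n3,4,30,60\n")

# csv.reader does the splitting; convert each field to int.
list_of_lists = [[int(field) for field in row] for row in csv.reader(data)]

# Transpose rows into columns; list() keeps the result indexable in Python 3.
by_cols = list(zip(*list_of_lists))

col_names = ('apples sold', 'pears sold', 'apples revenue', 'pears revenue')
by_names = dict(zip(col_names, by_cols))

mean_apple_prices = [money / fruits for money, fruits in
                     zip(by_names['apples revenue'], by_names['apples sold'])]
print(mean_apple_prices)
```

Building the dictionary with dict(zip(...)) is just a shorter spelling of the enumerate loop; either works.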
Update: While in Python 2 zip(*list_of_lists) returns a different (transposed) list of tuples, in Python 3 the situation has changed and zip(*list_of_lists) returns a zip object that is not subscriptable.
If you need indexed access you can use
by_cols = list(zip(*list_of_lists))
which gives you an indexable list of columns (each column is a tuple) in both versions of Python.
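A small sketch of what that looks like in Python 3, using made-up rows. The second form is an optional variation, assuming you want mutable inner lists rather than tuples.

```python
rows = [['a', 'b', 'c'], ['d', 'e', 'f']]

# list() materializes the zip object so that it can be indexed.
by_cols = list(zip(*rows))
print(by_cols[1])  # the second column, as a tuple

# If the columns themselves must be lists, convert each one.
by_cols_as_lists = [list(col) for col in zip(*rows)]
print(by_cols_as_lists[1])
```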
On the other hand, if you don't need indexed access and what you want is just to build a dictionary indexed by column names, a zip object is just fine...
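file = open('some_data.csv')
# get_names is not defined in this answer; it stands for a helper
# that turns the header line into a sequence of column names
names = get_names(next(file))
columns = zip(*((x.strip() for x in line.split(',')) for line in file))
d = {}
for name, column in zip(names, columns):
    d[name] = column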
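The answer does not show get_names, so here is one possible sketch of it, assuming the first line of the file is a comma-separated header. The helper body, the sample data, and the io.StringIO stand-in for the file are all hypothetical.

```python
import io


def get_names(header_line):
    """Hypothetical helper: split a comma-separated header into names."""
    return [name.strip() for name in header_line.split(',')]


# Made-up sample standing in for open('some_data.csv').
file = io.StringIO("apples sold, pears sold\n1, 2\n3, 4\n")

names = get_names(next(file))  # consumes the header line
columns = zip(*((x.strip() for x in line.split(',')) for line in file))

# Pair each name with its column; the values are tuples of strings.
d = dict(zip(names, columns))
print(d)
```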
Answered by Blairg23
This question is asking how to read the comma-separated value contents from a file into an iterable list:
0,0,200,0,53,1,0,255,...,0.
0,0,200,0,53,1,0,255,...,0.
The easiest way to do this is with the csv module as follows:
import csv
with open('filename.dat', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
Now you can easily iterate over spamreader like this (still inside the with block, while the file is open):
    for row in spamreader:
        print(', '.join(row))
See the documentation for more examples.
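If the rows are needed after the file has been closed, one option is to materialize the reader into a list first. This is only a sketch: the temporary file and its contents are made up so that the example runs on its own.

```python
import csv
import os
import tempfile

# Write a small made-up sample file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.dat', delete=False,
                                 newline='') as tmp:
    tmp.write("0,0,200,0,53\n1,2,3,4,5\n")
    path = tmp.name

# list() reads every row while the file is still open.
with open(path, newline='') as csvfile:
    rows = list(csv.reader(csvfile, delimiter=','))

os.remove(path)  # clean up the sample file

# rows is a plain list of lists of strings, usable after the file is closed.
print(rows[0][2])
```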

