Python loop through a text file reading data
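Notice: this page is based on a popular Stack Overflow question and is published under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow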
Original URL: http://stackoverflow.com/questions/17436709/
Question by jealopez
I am new to Python, and although this may be a trivial question, I have spent the whole day trying to solve it in different ways. I have a file containing data that looks like this:
```
<string>
<integer>
<N1>
<N2>
data
data
...
<string>
<integer>
<N3>
<N4>
data
data
...
```
And that pattern repeats a number of times. I need to read the "data" for each set. The first set (between the first and second "<string>") contains N1 X points, N2 Y points and N1*N2 Z points. If I had only one set of data, I already know what to do: read all the data, read the values N1 and N2, then slice it into X, Y and Z, reshape it and use it. But if my file contains more than one set of data, how do I read only from one string until the next one, then repeat the same operation for the next set, and so on until I reach the end of the file? I tried defining a function like:
```python
def dat_fun():
    with open("inpfile.txt", "r") as ifile:
        for line in ifile:
            if isinstance('line', str) or (not line):
                break
        for line in ifile:
            yield line
```
But it is not working: I get arrays with no data in them. Any comments will be appreciated. Thanks!
Answer by Martijn Pieters
All lines are instances of str, so you break out on the first line. Remove that test, and test for an empty line by stripping away whitespace first:
```python
def dat_fun():
    with open("inpfile.txt", "r") as ifile:
        for line in ifile:
            if not line.strip():
                break
            yield line
```
Really, I don't think you need to break at an empty line at all: the for loop ends on its own at the end of the file.
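To see why no explicit end-of-file test is needed, here is a small sketch that uses `io.StringIO` as an in-memory stand-in for `inpfile.txt`. This is an assumption made so the example runs without a file on disk. Iterating yields each line with its trailing newline and simply stops at the end. Only `readline()` signals end of file, and it does so by returning an empty string.

```python
import io

# io.StringIO stands in for open("inpfile.txt"); it is iterated line by line the same way.
fake_file = io.StringIO("name\n3\n\nlast line")

# Iterating yields every line with its newline and stops by itself at end of file.
lines = list(fake_file)
print(lines)  # ['name\n', '3\n', '\n', 'last line']

# readline() instead signals end of file by returning an empty string.
fake_file.seek(0)
for _ in range(4):
    fake_file.readline()
eof_marker = fake_file.readline()
print(repr(eof_marker))  # ''
```

A blank line in the middle of the file reads as `'\n'`, not `''`. That is why an empty result from `readline()` can be trusted as end of file.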
If your lines contain other sorts of data (numbers, for example), you'd need to do the conversion yourself, because every line is read in as a string.
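As a minimal sketch of that conversion, assume each data line holds a single number. The question doesn't spell out the exact format, so treat this as an assumption. The helper `read_floats` and the in-memory sample are hypothetical and only for illustration:

```python
import io
from typing import Iterator, List

def read_floats(lines: Iterator[str], count: int) -> List[float]:
    """Read the next `count` lines and convert each one to a float.

    Hypothetical helper: assumes one number per line. float() ignores
    surrounding whitespace, so the trailing newline need not be stripped.
    """
    return [float(next(lines)) for _ in range(count)]

# A count line followed by that many values, as a stand-in for real data.
data = io.StringIO("2\n1.5\n-3e2\n")
n_points = int(next(data))          # int() also tolerates the trailing newline
print(read_floats(data, n_points))  # [1.5, -300.0]
```

If the file runs out of lines early, `next()` raises `StopIteration`. If a line isn't a valid number, `float()` raises `ValueError`. Either way a malformed file fails loudly instead of silently producing empty arrays.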
Answer by Rushy Panchal
```python
def dat_fun():
    with open("inpfile.txt", "r") as ifile:
        for line in ifile:
            if isinstance('line', str) or (not line): # 'line' is always a str, and so is the line itself
                break
        for line in ifile:
            yield line
```
Change this to:
```python
def dat_fun():
    with open("inpfile.txt", "r") as ifile:
        for line in ifile:
            if not line:
                break
            yield line
```
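One subtlety in the version above is that `if not line` can never be true inside the loop. File iteration never produces an empty string: every line except possibly the last ends with `'\n'`, and the last line is non-empty. So the check behaves exactly like having no check at all. The quick comparison below uses a hypothetical `lines_until_blank` helper and `io.StringIO` as a stand-in for the file. It shows the difference between that test and the `line.strip()` test from Martijn Pieters' answer:

```python
import io

def lines_until_blank(f, strip_first):
    """Yield lines until the 'empty line' test succeeds (hypothetical helper)."""
    for line in f:
        # With strip_first, a line holding only '\n' counts as empty;
        # without it, only '' would, and iteration never yields ''.
        is_empty = not line.strip() if strip_first else not line
        if is_empty:
            break
        yield line

sample = "a\nb\n\nc\n"
print(list(lines_until_blank(io.StringIO(sample), strip_first=False)))  # ['a\n', 'b\n', '\n', 'c\n']
print(list(lines_until_blank(io.StringIO(sample), strip_first=True)))   # ['a\n', 'b\n']
```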
Answer by Rob Watts
With structured data like this, I'd suggest just reading what you need. For example:
```python
with open("inpfile.txt", "r") as ifile:
    first_string = ifile.readline().strip() # Is this the name of the data set?
    first_integer = int(ifile.readline()) # You haven't told us what this is, either
    n_one = int(ifile.readline())
    n_two = int(ifile.readline())
    x_vals = []
    y_vals = []
    z_vals = []
    for index in range(n_one):
        x_vals.append(ifile.readline().strip())
    for index in range(n_two):
        y_vals.append(ifile.readline().strip())
    for index in range(n_one*n_two):
        z_vals.append(ifile.readline().strip())
```
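The code above keeps X, Y and Z as lists of strings, and the question also mentions reshaping Z. Here is a minimal pure-Python sketch of that step. It assumes Z is stored row by row, with `n_two` values per row, for `n_one` rows. The question doesn't say which ordering the file uses, so swap the two counts if it is the other way round. The function name `reshape_z` is hypothetical.

```python
from typing import List

def reshape_z(z_vals: List[str], n_one: int, n_two: int) -> List[List[float]]:
    """Turn a flat list of n_one*n_two Z strings into n_one rows of n_two floats.

    Assumes row-major order: the first n_two values form row 0, and so on.
    """
    if len(z_vals) != n_one * n_two:
        raise ValueError(f"expected {n_one * n_two} Z values, got {len(z_vals)}")
    floats = [float(z) for z in z_vals]
    return [floats[row * n_two:(row + 1) * n_two] for row in range(n_one)]

print(reshape_z(["1", "2", "3", "4", "5", "6"], 2, 3))  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```

If NumPy is available, `numpy.array(z_vals, dtype=float).reshape(n_one, n_two)` gives the same row-major layout as an array.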
You can turn this into a dataset generating function by adding a loop and yielding the values:
```python
# yield is only valid inside a function, so the loop is wrapped in one
# (the name read_datasets is illustrative).
def read_datasets(filename):
    with open(filename, "r") as ifile:
        while True:
            first_string = ifile.readline().strip() # Is this the name of the data set?
            if first_string == '':
                break
            first_integer = int(ifile.readline()) # You haven't told us what this is, either
            n_one = int(ifile.readline())
            n_two = int(ifile.readline())
            x_vals = []
            y_vals = []
            z_vals = []
            for index in range(n_one):
                x_vals.append(ifile.readline().strip())
            for index in range(n_two):
                y_vals.append(ifile.readline().strip())
            for index in range(n_one*n_two):
                z_vals.append(ifile.readline().strip())
            yield (x_vals, y_vals, z_vals) # and the first string and integer if you need those

# Usage: for x_vals, y_vals, z_vals in read_datasets("inpfile.txt"): ...
```
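To try the same read-what-you-need loop without a file on disk, here is a sketch written to accept any open text file. Several details are assumptions and not part of the answer: the name `iter_datasets`, the float conversion, skipping blank lines between sets, and the two-set in-memory sample.

```python
import io
from typing import Iterator, List, TextIO, Tuple

Dataset = Tuple[str, int, List[float], List[float], List[float]]

def iter_datasets(ifile: TextIO) -> Iterator[Dataset]:
    """Yield (name, integer, x, y, z) for each data set until end of file."""
    while True:
        name = ifile.readline()
        if name == "":          # readline() returns '' only at end of file
            return
        name = name.strip()
        if not name:            # assumption: tolerate blank lines between sets
            continue
        first_integer = int(ifile.readline())
        n_one = int(ifile.readline())
        n_two = int(ifile.readline())
        x_vals = [float(ifile.readline()) for _ in range(n_one)]
        y_vals = [float(ifile.readline()) for _ in range(n_two)]
        z_vals = [float(ifile.readline()) for _ in range(n_one * n_two)]
        yield name, first_integer, x_vals, y_vals, z_vals

sample = io.StringIO(
    "first\n7\n2\n1\n"      # name, integer, N1=2, N2=1
    "0.0\n1.0\n"            # 2 X values
    "5.0\n"                 # 1 Y value
    "10\n20\n"              # 2*1 Z values
    "second\n8\n1\n2\n"     # name, integer, N1=1, N2=2
    "3.0\n"                 # 1 X value
    "4.0\n6.0\n"            # 2 Y values
    "30\n40\n"              # 1*2 Z values
)
for name, integer, x, y, z in iter_datasets(sample):
    print(name, integer, x, y, z)
# first 7 [0.0, 1.0] [5.0] [10.0, 20.0]
# second 8 [3.0] [4.0, 6.0] [30.0, 40.0]
```

The loop checks `readline() == ''` before stripping, so it tells a true end of file apart from a blank line, which reads as `'\n'`. With a real file, call it as `with open("inpfile.txt") as f:` and then loop over `iter_datasets(f)`.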