Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/18304835/

Time: 2020-08-19 10:26:40  Source: igfitidea

Parsing a text file into a list in python

Tags: python, string, list, lines

Asked by Rachel Rose

I'm completely new to Python, and I'm trying to read in a txt file that contains a combination of words and numbers. I can read in the txt file just fine, but I'm struggling to get the string into a format I can work with.

import matplotlib.pyplot as plt
import numpy as np
from numpy import loadtxt

f= open("/Users/Jennifer/Desktop/test.txt", "r")

lines=f.readlines()

Data = []

list=lines[3]
i=4
while i<12:
        list=list.append(line[i])
        i=i+1

print list

f.close()

I want a list that contains all the elements in lines 3-12 (starting from 0), which is all numbers. When I do print lines[1], I get the data from that line. When I do print lines, or print lines[3:12], I get each character preceded by \x00. For example, the word "Plate" becomes: ['\x00P\x00l\x00a\x00t\x00e. Using lines = [line.strip() for line in f] gets the same result. When I try to put individual lines together in the while loop above, I get the error "AttributeError: 'str' object has no attribute 'append'."

How can I get a selection of lines from a txt file into a list? Thank you so much!!!

Edit: The txt file looks like this:

BLOCKS= 1 Plate: Phosphate Noisiness Assay 2000x 1.3 PlateFormat Endpoint Absorbance Raw FALSE 1 1 650 1 12 96 1 8
Temperature(?C) 1 2 3 4 5 6 7 8 9 10 11 12
21.4 0.4977 0.5074 0.5183 0.5128 0.5021 0.5114 0.4993 0.5308 0.4837 0.5286 0.5231 0.5227
0.488 0.4742 0.5011 0.4868 0.4976 0.4845 0.4848 0.5179 0.4772 0.5363 0.5109 0.5197
0.4882 0.4913 0.4941 0.5188 0.4766 0.4914 0.495 0.5172 0.4826 0.5039 0.504 0.5451
0.4771 0.4875 0.523 0.4851 0.4757 0.4767 0.4918 0.5212 0.4742 0.5153 0.5027 0.5235
0.4474 0.4841 0.5193 0.4755 0.4649 0.4883 0.5165 0.5223 0.4799 0.5269 0.5091 0.5191
0.4721 0.4794 0.501 0.4467 0.4785 0.4792 0.4894 0.511 0.4778 0.5223 0.4888 0.5273
0.4122 0.4454 0.314 0.2747 0.4621 0.4416 0.3716 0.2534 0.4497 0.5778 0.2319 0.1038
0.4479 0.5368 0.3046 0.3115 0.4745 0.5116 0.3689 0.3915 0.4803 0.5209 0.1981 0.1062

~End Original Filename: 2013-08-06 Phosphate Noisiness; Date Last Saved: 8/6/2013 7:00:55 PM

Update: I used this code:

f= open("/Users/Jennifer/Desktop/test.txt", "r")
file_list = f.readlines()

first_twelve = file_list[3:11]

data = [x.replace('\t',' ') for x in first_twelve]
data = [x.replace('\x00','') for x in data]
data = [x.replace(' \r\n','') for x in data]

print data

to get this result: [' 21.4 0.4977 0.5074 0.5183 0.5128 0.5021 0.5114 0.4993 0.5308 0.4837 0.5286 0.5231 0.5227 ', ' 0.488 0.4742 0.5011 0.4868 0.4976 0.4845 0.4848 0.5179 0.4772 0.5363 0.5109 0.5197 ', ' 0.4882 0.4913 0.4941 0.5188 0.4766 0.4914 0.495 0.5172 0.4826 0.5039 0.504 0.5451 ', ' 0.4771 0.4875 0.523 0.4851 0.4757 0.4767 0.4918 0.5212 0.4742 0.5153 0.5027 0.5235 ', ' 0.4474 0.4841 0.5193 0.4755 0.4649 0.4883 0.5165 0.5223 0.4799 0.5269 0.5091 0.5191 ', ' 0.4721 0.4794 0.501 0.4467 0.4785 0.4792 0.4894 0.511 0.4778 0.5223 0.4888 0.5273 ', ' 0.4122 0.4454 0.314 0.2747 0.4621 0.4416 0.3716 0.2534 0.4497 0.5778 0.2319 0.1038 ', ' 0.4479 0.5368 0.3046 0.3115 0.4745 0.5116 0.3689 0.3915 0.4803 0.5209 0.1981 0.1062 ']

Which is (correct me if I'm wrong, very new to Python!) a list of lists, which I should be able to work with. Thank you so much to everyone who responded!!!

Answered by dawg

You have the line list=lines[3] in your source code.

Two issues here.

  1. Don't use list as a variable name. You silently overwrote the built-in list constructor when you did that.
  2. When you take one item from a list, as in lines[3], you only have that object -- in this case a string. When you try to append to it you can't -- it isn't a list.
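
Both points can be reproduced in a few lines. The following is a minimal sketch written for Python 3; the sample strings and the names single and selected are made up for illustration and do not come from the original file.

```python
# Stand-in for the result of f.readlines(); the contents are invented.
lines = ["header\n", "units\n", "note\n", "21.4 0.4977\n", "0.488 0.4742\n"]

# Indexing returns one element: here a str, which has no append method.
single = lines[3]
print(type(single).__name__)       # str
print(hasattr(single, "append"))   # False

# Start from an empty list and append to it. append returns None,
# so its result must not be assigned back to the variable.
selected = []
for i in range(3, len(lines)):
    selected.append(lines[i])

# Because no variable is named `list`, the built-in is still usable.
copied = list(selected)
```

In practice the loop is equivalent to the slice lines[3:], which is usually the simpler choice.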

You can demonstrate your bug easily in the console:

>>> li=['1']
>>> li.append('2')
>>> li
['1', '2']
>>> st='1'
>>> st.append('2')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'append'

Other comments, in general, on your code.

Assume you have a text file called '/tmp/test.txt' that contains this text:

Line 1
Line 2
...
Line 19

Reading the contents of that file is as simple as this:

with open('/tmp/test.txt', 'r') as fin:
    lines=fin.readlines()

If you want a subset of the lines, you can use a slice:

subset=lines[3:12]
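
As a quick check of what that slice selects, here is a small sketch in Python 3. The list below is an invented stand-in for fin.readlines(), numbered from 0 so that each string shows its own index.

```python
# Invented stand-in for fin.readlines(): "Line 0" through "Line 19".
lines = ["Line %d\n" % n for n in range(20)]

subset = lines[3:12]         # indices 3, 4, ..., 11; the stop index 12 is excluded
print(len(subset))           # 9
print(subset[0].strip())     # Line 3
print(subset[-1].strip())    # Line 11

# Slicing past the end does not raise; it returns whatever is available.
short = lines[18:25]
print(len(short))            # 2
```

So a slice written as [3:11] yields eight lines and [3:12] yields nine, which matters when counting the data rows in the file.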

If you want to process each line in some way, such as stripping the line ending, use the file object as an iterator:

with open('/tmp/test.txt', 'r') as fin:
    lines=[]
    for line in fin:
        lines.append(line.strip()) 

For your specific problem of having NULs in the data, perhaps you are reading a binary file masquerading as text? You need to post an example of the file.
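
One common cause of a NUL next to every character is a UTF-16 encoded text file being read as if it used a single-byte encoding. Whether that applies to the file in the question is an assumption; the source does not confirm it. The sketch below (Python 3) writes a small UTF-16 file to a temporary directory, with invented contents, and shows that decoding it with the right encoding leaves no NULs to strip.

```python
import os
import tempfile

# Create a small UTF-16 file to stand in for the exported data
# (hypothetical contents; the real file's encoding is an assumption).
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-16", newline="") as fout:
    fout.write("Plate\t1\r\n0.4977\t0.5074\r\n")

# The raw bytes contain the interleaved NULs.
with open(path, "rb") as fin:
    raw = fin.read()
print(b"\x00" in raw)   # True

# Decoding as UTF-16 yields clean text.
with open(path, "r", encoding="utf-16") as fin:
    lines = [line.strip() for line in fin]
print(lines)            # ['Plate\t1', '0.4977\t0.5074']
```

The "utf-16" codec relies on a byte order mark at the start of the file. If a file has none, "utf-16-le" or "utf-16-be" would have to be chosen explicitly.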

Edit

Your file contains Unicode characters (right after 'Temperature'), which may be some of the odd characters you are seeing. If you are only interested in the lines with numbers, you can ignore them.

You do not YET have a list of lists, but it is easy to get one:

data=[]                               # will hold the lines of the file
with open(ur_file,'rU') as fin:       
    for line in fin:                  # for each line of the file
        line=line.strip()             # remove CR/LF
        if line:                      # skip blank lines
            data.append(line)

print data                            # list of STRINGS separated by spaces
matrix=[map(float,line.split()) for line in data[3:10]]  # convert the strings..
print matrix                          # NOW you have a list of list of floats...
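
The code above is written for Python 2, where print is a statement and map returns a list. In Python 3, map returns an iterator, so a list comprehension is the more direct way to get a list of lists of floats. The following is a sketch under that assumption; the sample text is an abbreviated, invented stand-in for the file, and to_floats is a hypothetical helper name.

```python
# Abbreviated stand-in for the file contents (two header lines, a blank
# line, then numeric rows); not the real data file.
sample = """Header line
Temperature 1 2 3

21.4 0.4977 0.5074 0.5183
0.488 0.4742 0.5011
"""

# Keep the non-blank lines, stripped of surrounding whitespace.
data = [line.strip() for line in sample.splitlines() if line.strip()]

def to_floats(line):
    """Convert a whitespace-separated line into a list of floats."""
    return [float(token) for token in line.split()]

# Skip the two header lines, then convert each remaining row.
matrix = [to_floats(line) for line in data[2:]]
print(matrix)   # [[21.4, 0.4977, 0.5074, 0.5183], [0.488, 0.4742, 0.5011]]
```

The slice bounds depend on how many header lines the real file has, so data[2:] here is specific to the sample.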

Answered by chapter3

The tweak below might help you get rid of the \x00 characters embedded in your data:

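f = open("/Users/Jennifer/Desktop/test.txt", "r")

lines = f.readlines()
f.close()
lines = [x.replace('\x00','') for x in lines]  # strip the embedded NUL characters

l = []                  # create the list once, before the loop
for i in range(3,12):
    l.append(lines[i])  # collect the lines at indices 3..11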

I am not sure if your data has other delimiters (say comma or space) to separate the numbers. If so, a simple split will help to convert the line into a list:

line = '123.00,456.00,789.00'

l = line.split(',')  # l becomes ['123.00', '456.00', '789.00']
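
When the separator is whitespace rather than a comma, calling split with no argument behaves differently from split(' '). A short sketch in Python 3, using an invented line with a tab, a double space and a trailing line ending:

```python
line = " 21.4\t0.4977  0.5074 \r\n"

by_space = line.split(" ")   # splits at every single space, keeps empty strings
by_any = line.split()        # splits on runs of any whitespace, no empty strings

print(by_any)                # ['21.4', '0.4977', '0.5074']
values = [float(x) for x in by_any]
```

With split() there is no need to replace tabs or strip the line ending first, and no empty strings have to be filtered out afterwards.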

Edit

Continuing from Rachel's updated code:

f= open("/Users/Jennifer/Desktop/test.txt", "r")
file_list = f.readlines()

first_twelve = file_list[3:11]

data = [x.replace('\t',' ') for x in first_twelve]
data = [x.replace('\x00','') for x in data]
data = [x.replace(' \r\n','') for x in data]

items = []
for dataline in data:
    items += dataline.split(' ')
items = [float(x) for x in items if len(x) > 0]  # remove dummy items left in the list

print items

Answered by Peter Foti

When you write the code lines = f.readlines(), a list of lines is returned to you. When you then say lines[3], you get a single line (the one at index 3, which is the fourth line), and that line is a string. That's why you end up with individual characters.

All you need to do is say

files = open("Your File.txt")

file_list =  files.readlines()

first_twelve = file_list[0:12] #returns a list with the first 12 lines

Once you've got the first_twelve list you can do whatever you want with it.

To print each line you would do:

for each_line in first_twelve:
    print each_line

That should work for you.

Answered by AnupamChugh

Using readlines() is memory-inefficient: it reads the whole file into memory. Instead, do this:

[i.split() for i in open('filename.txt')]
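
Note that the comprehension above still builds a list holding every line of the file. When only a range of lines is needed, itertools.islice can stop reading once that range has been consumed. A minimal sketch in Python 3, using io.StringIO with invented rows as a stand-in for an open file:

```python
import io
from itertools import islice

# Invented stand-in for an open text file with 20 lines.
fin = io.StringIO("".join("row %d\n" % n for n in range(20)))

# islice(fin, 3, 12) yields the lines at indices 3..11 and then stops.
rows = [line.split() for line in islice(fin, 3, 12)]
print(len(rows))   # 9
print(rows[0])     # ['row', '3']
```

With a real file, the same call works on the object returned by open, ideally inside a with block so the file is closed afterwards.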