Read the previous line of a file in Python

Question

Asked by Lim H.

I need to get the value of the previous line in a file and compare it with the current line as I iterate through the file. The file is huge, so I can't read it all at once or randomly access a line number with linecache, because that library still reads the whole file into memory anyway.

EDIT: Sorry, I forgot to mention that I have to read the file backwards.

EDIT2

I have tried the following:

import linecache

with open("filename", "r") as f:
    for line in reversed(f.readlines()):  # doesn't work: readlines() loads every line into memory
        ...  # compare line with the previous one

line = linecache.getline("filename", num_line)  # doesn't work either: linecache caches the whole file

Answer 1

Accepted answer by Stephan

Just save the previous line before you move on to the next one:

prevLine = ""
with open("filename") as f:
    for line in f:
        # compare line with prevLine here
        prevLine = line

This keeps the previous line in prevLine while you loop.
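The loop above only stores the previous line; the comparison itself is left as a placeholder. A minimal sketch of filling it in, assuming the comparison you want is a simple equality check (lines_equal_to_previous is a hypothetical helper name, and "equal" stands in for whatever test you actually need). It starts from None rather than "" so the first line is never compared against a placeholder, and it accepts any iterable of lines, so an open file is still consumed one line at a time:

```python
from typing import Iterable, Iterator, Optional, Tuple


def lines_equal_to_previous(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for each line identical to the line before it.

    Hypothetical helper: equality stands in for your real comparison.
    Only one previous line is held in memory at any time.
    """
    prev_line: Optional[str] = None  # None means "no previous line yet"
    for number, line in enumerate(lines, start=1):
        if prev_line is not None and line == prev_line:
            yield number, line
        prev_line = line


if __name__ == "__main__":
    import io

    sample = io.StringIO("a\na\nb\nb\nb\nc\n")
    print(list(lines_equal_to_previous(sample)))
    # prints [(2, 'a\n'), (4, 'b\n'), (5, 'b\n')]
```

With a real file you would pass the file object directly, e.g. inside `with open("filename") as f:`.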

EDIT: apparently the OP needs to read this file backwards.

Aaand after about an hour of research, I failed several times to do it within the memory constraints.

Here you go, Lim. That guy knows what he's doing; here is his best idea:

General approach #2: Read the entire file, store the positions of lines
With this approach, you also read through the entire file once, but instead of storing the entire file (all the text) in memory, you only store the byte positions inside the file where each line starts. You can store these positions in a data structure similar to the one that stores the lines in the first approach.
Whenever you want to read line X, you re-read that line from the file, starting at the position you stored for the start of that line.
Pros: almost as easy to implement as the first approach.
Cons: can take a while to read large files.
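A minimal Python 3 sketch of approach #2, assuming UTF-8 text; line_offsets and read_lines_backwards are hypothetical names. One forward pass records the byte offset where each line starts, then the lines are re-read last to first with seek(). The file is opened in binary mode so that len(line) counts bytes, which is what seek() expects. The offsets go into an array("q") of 64-bit integers, which is much smaller than the text itself but still grows with the number of lines:

```python
from array import array
from typing import Iterator


def line_offsets(path: str) -> array:
    """First pass: record the byte offset at which each line starts."""
    offsets = array("q")  # compact 64-bit ints instead of a list of int objects
    pos = 0
    with open(path, "rb") as f:
        for raw_line in f:
            offsets.append(pos)
            pos += len(raw_line)  # in binary mode len() is the size in bytes
    return offsets


def read_lines_backwards(path: str, encoding: str = "utf-8") -> Iterator[str]:
    """Second pass: seek to each stored offset, last line first."""
    offsets = line_offsets(path)
    with open(path, "rb") as f:
        for start in reversed(offsets):
            f.seek(start)
            yield f.readline().decode(encoding)  # keeps the line's trailing "\n"
```

Pairing each line with the one read just before it then works like the forward loops in the other answers: set previous = None, loop over read_lines_backwards("filename"), compare, then assign previous = line. Note that when reading backwards, the "previous" line in reading order is the next line in the file.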

Answer 2

Answer by mgilson

I'd write a simple generator for the task:

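def pairwise(fname):
    """Yield (previous_line, current_line) pairs, reading one line at a time."""
    with open(fname) as fin:
        prev = next(fin, None)  # None if the file is empty
        if prev is None:
            return
        for line in fin:
            yield prev, line
            prev = line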

Or you can use the pairwise recipe from the itertools documentation:

import itertools

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)  # itertools.izip in Python 2; zip is already lazy in Python 3
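Both generators above walk the file forwards, while the OP has to read it backwards. A different technique from approach #2 (no offset index at all) is to read fixed-size blocks starting at the end of the file, split each block on b"\n", and carry the possibly incomplete first piece over to the next, earlier block. This is a sketch under stated assumptions: UTF-8 text with "\n" line endings, and reverse_lines and block_size are hypothetical names. Yielded lines have their newline stripped, and memory use stays around one block plus the longest line. Its output can be fed straight into the pairwise recipe (Python 3.10+ also ships itertools.pairwise, which does the same thing):

```python
import itertools
import os
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def reverse_lines(path: str, block_size: int = 64 * 1024,
                  encoding: str = "utf-8") -> Iterator[str]:
    """Yield the file's lines from last to first, without their trailing newline."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        if pos == 0:
            return  # empty file: no lines
        f.seek(pos - 1)
        if f.read(1) == b"\n":
            pos -= 1  # ignore the final newline so no empty "last line" is produced
        tail = b""  # end of a line whose beginning lies in an earlier block
        while pos > 0:
            read_size = min(block_size, pos)
            pos -= read_size
            f.seek(pos)
            pieces = (f.read(read_size) + tail).split(b"\n")
            tail = pieces[0]  # may be incomplete; finish it with the next block
            for piece in reversed(pieces[1:]):
                yield piece.decode(encoding)  # complete line, safe to decode
        yield tail.decode(encoding)  # the file's first line


def pairwise(iterable: Iterable[T]) -> Iterator[Tuple[T, T]]:
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)
```

For example, iterating over pairwise(reverse_lines("filename")) gives (later, earlier) pairs, i.e. each line together with the line above it in the file. Splitting raw bytes on b"\n" is safe for UTF-8 because the newline byte never occurs inside a multi-byte character.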

Answer 3

Answer by Diana

@Lim, here's how I would write it (in reply to the comments):

def do_stuff_with_two_lines(previous_line, current_line):
    print("--------------")
    print(previous_line)
    print(current_line)

with open('my_file.txt', 'r') as my_file:
    # Read the first line up front so the loop always has a "previous" line
    current_line = my_file.readline()

    for line in my_file:
        previous_line = current_line
        current_line = line

        do_stuff_with_two_lines(previous_line, current_line)