Disclaimer: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must likewise comply with CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/525272/

Time: 2020-11-03 20:17:23  Source: igfitidea

Python truncate lines as they are read

Tags: python, file-io

Question by Ryan White

I have an application that reads lines from a file and runs its magic on each line as it is read. Once the line is read and properly processed, I would like to delete the line from the file. A backup of the removed line is already being kept. I would like to do something like this:

file = open('myfile.txt', 'r+')
for line in file:
    processLine(line)
    file.truncate(line)  # wishful: truncate() really takes a byte size, not a line

This seems like a simple problem, but I would like to do it right rather than a whole lot of complicated seek() and tell() calls.

Maybe all I really want to do is remove a particular line from a file.

After spending far too long on this problem, I decided that everyone was probably right and this is just not a good way to do things. It just seemed like such an elegant solution. What I was looking for was something akin to a FIFO that would just let me pop lines out of a file.

Answer by jfs

Remove all lines after you're done with them:

with open('myfile.txt', 'r+') as file:
    for line in file:
        processLine(line)
    file.truncate(0)

Remove each line independently:

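with open('myfile.txt') as f:
    lines = f.readlines()

try:
    for line in lines[::-1]:  # process lines in reverse order (iterates over a copy)
        processLine(line)
        del lines[-1]  # remove the [last] line once it has been processed
finally:
    # write back the unprocessed lines, even if processLine() raised
    with open('myfile.txt', 'w') as f:
        f.writelines(lines)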

You can keep only those lines that raise exceptions:

import fileinput
import sys

for line in fileinput.input(['myfile.txt'], inplace=True):
    try:
        processLine(line)
    except Exception:
        sys.stdout.write(line)  # in in-place mode, stdout is redirected to 'myfile.txt'

In general, as other people have already said, what you are trying to do is a bad idea.

Answer by nosklo

You can't. It is just not possible with actual text file implementations on current filesystems.

Text files are sequential, because the lines in a text file can be of any length. Deleting a particular line would mean rewriting the entire file from that point on.

Suppose you have a file with the following 4 lines:

'line1\nline2reallybig\nline3\nlast line'

To delete the second line, you'd have to move the third and fourth lines to new positions on disk. The only way would be to store the third and fourth lines somewhere, truncate the file where the second line starts, and rewrite the missing lines.
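The store/truncate/rewrite procedure described above can be sketched as follows. This is a minimal sketch, assuming everything after the deleted line fits in memory; delete_line is a hypothetical helper name, and the file is opened in binary mode so that tell() returns plain byte offsets.

```python
def delete_line(path, line_number):
    """Delete the 0-based line `line_number` from the file at `path` (hypothetical helper).

    Everything after the deleted line is read into memory, the file is
    truncated where the deleted line started, and the tail is written back.
    """
    with open(path, 'r+b') as f:
        # Skip the lines before the target to find where it starts.
        for _ in range(line_number):
            if not f.readline():
                raise IndexError('file has fewer lines than line_number')
        start = f.tell()
        if not f.readline():  # read past the line being deleted
            raise IndexError('file has fewer lines than line_number')
        tail = f.read()       # the lines that have to move
        f.seek(start)
        f.truncate()          # cut the file at the start of the deleted line
        f.write(tail)         # rewrite the remaining lines
```

Note that the cost is proportional to the size of everything after the deleted line, which is exactly why deleting lines from the front of a large file is expensive.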

If every line in the text file has the same known size, you can truncate the file at any line boundary using .truncate(line_size * line_number), but even then you'd have to rewrite everything after the line.
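With fixed-size lines, the byte offset of line n is simply line_size * n, so jumping to a line or cutting off everything from a given line onward takes one call. A minimal sketch, assuming every line (including its trailing newline) occupies exactly line_size bytes; read_record and keep_first_records are hypothetical helper names.

```python
def read_record(path, line_size, line_number):
    """Return 0-based line `line_number` of a file whose lines are all `line_size` bytes."""
    with open(path, 'rb') as f:
        f.seek(line_size * line_number)  # jump straight to the line's byte offset
        return f.read(line_size)


def keep_first_records(path, line_size, keep):
    """Truncate the file so that only its first `keep` lines remain."""
    with open(path, 'r+b') as f:
        f.truncate(line_size * keep)  # everything from line `keep` onward is cut off
```

This only makes removing lines from the end cheap; removing the first line would still require shifting every later record forward.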

Answer by sykora

You're better off keeping an index into the file so that you can start where you stopped last time, without destroying part of the file. Something like this would work:

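with open('myfile.txt') as file:
    index = 0  # defined up front in case the failure happens before the first line
    try:
        for index, line in enumerate(file):
            processLine(line)
    except Exception:
        # Failed; start from this line number next time.
        print(index)
        raise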
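To make the "start where you stopped" part automatic, the position can be persisted between runs. A minimal sketch, assuming a hypothetical sidecar checkpoint file that stores the byte offset just past the last successfully processed line; process_from_checkpoint is a hypothetical name, and the data file itself is never modified.

```python
import os


def process_from_checkpoint(path, checkpoint_path, process_line):
    """Process lines of `path`, resuming after the last fully processed line.

    The byte offset just past the last processed line is stored in
    `checkpoint_path` (a hypothetical sidecar file).
    """
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as cp:
            offset = int(cp.read() or 0)
    with open(path, 'rb') as f:
        f.seek(offset)
        # Using readline() rather than iterating a text-mode file keeps f.tell() usable.
        for line in iter(f.readline, b''):
            process_line(line.decode('utf-8'))
            # Record progress only after the line was handled successfully.
            with open(checkpoint_path, 'w') as cp:
                cp.write(str(f.tell()))
```

If process_line raises, the checkpoint still points at the start of the failing line, so the next run retries that line first.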

Answer by Imran

Truncating the file as you read it seems a bit extreme. What if your script has a bug that doesn't cause an error? In that case you'll want to restart at the beginning of your file.

How about having your script print the line number it breaks on and having it take a line number as a parameter so you can tell it which line to start processing from?
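That suggestion might look like the sketch below. It assumes a hypothetical --start-line option (1-based) and uses itertools.islice to skip the lines already handled; process_lines and main are illustrative names, and the sys.stdout.write callback merely stands in for the real processLine.

```python
import argparse
import itertools
import sys


def process_lines(path, start_line, process_line):
    """Call process_line on every line of `path`, starting at 1-based `start_line`.

    If process_line raises, the failing line number is reported on stderr
    so the run can be restarted from that line.
    """
    with open(path) as f:
        numbered = enumerate(f, start=1)
        for lineno, line in itertools.islice(numbered, start_line - 1, None):
            try:
                process_line(line)
            except Exception:
                print('failed at line %d; rerun with --start-line %d' % (lineno, lineno),
                      file=sys.stderr)
                raise


def main(argv=None):
    parser = argparse.ArgumentParser(description='Process a file line by line.')
    parser.add_argument('path')
    parser.add_argument('--start-line', type=int, default=1,
                        help='1-based line number to start processing from')
    args = parser.parse_args(argv)
    # Placeholder: echo each line instead of the real processLine().
    process_lines(args.path, args.start_line, sys.stdout.write)
```

Wire main() up as the script's entry point; the input file is never modified, so a silent bug can be fixed and the run simply repeated from line 1.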

Answer by zoul

First of all, calling the operation truncate is probably not the best pick. If I understand the problem correctly, you want to delete everything up to the current position in the file. (I would expect truncate to cut everything from the current position up to the end of the file. This is how the standard Python truncate method works, at least if I Googled correctly.)
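A quick check of that behavior, using an in-memory io.BytesIO (its truncate() follows the same io.IOBase contract as a regular file opened in binary mode): called without an argument, truncate() resizes the stream to the current position, so the head is kept and the tail is discarded.

```python
import io

buf = io.BytesIO(b'line1\nline2\nline3\n')
buf.readline()        # the position is now just past 'line1\n'
buf.truncate()        # no argument: resize the stream to the current position
kept = buf.getvalue()
print(kept)           # b'line1\n': the head survives, the tail is cut off
```

Removing the already-read head instead would mean copying the remaining bytes to the front and then truncating, i.e. rewriting the rest of the file.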

Second, I am not sure it is wise to modify the file while iterating over it with the for loop. Wouldn't it be better to count the lines processed and delete them after the main loop has finished, whether or not an exception occurred? Python's fileinput module supports in-place filtering, which means it should be fairly simple to drop the processed lines afterwards.

P.S. I don't know Python, take this with a grain of salt.
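The approach suggested above (count the processed lines, then drop them after the loop, exception or not) can be sketched with fileinput's in-place mode. This is a minimal sketch; process_then_drop is a hypothetical name, and processing is assumed to stop at the first exception.

```python
import fileinput


def process_then_drop(path, process_line):
    """Process lines of `path`; afterwards delete the ones that were processed.

    Processing stops at the first exception. Either way, a second pass
    removes the successfully processed lines from the file.
    """
    processed = 0
    try:
        with open(path) as f:
            for line in f:
                process_line(line)
                processed += 1
    finally:
        # In in-place mode, print() output is redirected into the rewritten file.
        with fileinput.input([path], inplace=True) as lines:
            for line in lines:
                if lines.filelineno() > processed:
                    print(line, end='')
```

The file is never modified while it is being iterated for processing; the rewrite happens in a separate pass.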

Answer by gvrocha

A related post has what seems to be a good strategy for this; see "How can I run the first process from a list of processes stored in a file and immediately delete the first line as if the file was a queue and I called "pop"?"

I have used it as follows:

import os

with open(tasklist_filename) as tasklist_file:  # 'rw' is not a valid mode; reading is enough
    first_line = tasklist_file.readline()
temp = os.system("sed -i -e '1d' " + tasklist_filename)  # remove the first line from the task file

I'm not sure it works on Windows. I tried it on a Mac and it did the trick.

Answer by Matt Kraemer

This is what I use for file-based queues. It returns the first line and rewrites the file with the remaining lines. When the file is empty, it returns None:

def pop_a_text_line(filename):
    with open(filename,'r') as f:
        S = f.readlines()
    if len(S) > 0:
        pop = S[0]
        with open(filename,'w') as f:
            f.writelines(S[1:])
    else:
        pop = None
    return pop
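One weakness of rewriting the queue file in 'w' mode is that a crash between the truncation and the write loses the remaining entries. A variant (not part of the original answer) writes the remainder to a temporary file in the same directory and swaps it in with os.replace(), which the Python docs describe as atomic on POSIX when both paths are on the same filesystem. pop_a_text_line_atomic is a hypothetical name.

```python
import os
import tempfile


def pop_a_text_line_atomic(filename):
    """Return the first line of `filename` and remove it, or None if the file is empty."""
    with open(filename) as f:
        lines = f.readlines()
    if not lines:
        return None
    # Write the remaining lines to a temp file next to the original...
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(filename)))
    try:
        with os.fdopen(fd, 'w') as tmp:
            tmp.writelines(lines[1:])
        # ...then swap it in, so the queue file is never left half-written.
        os.replace(tmp_path, filename)
    except BaseException:
        os.remove(tmp_path)
        raise
    return lines[0]
```

Two caveats: mkstemp creates the temporary file readable and writable only by its owner, so the queue file's permissions change after the first pop unless you copy the original mode; and, like the original, this is not safe against two processes popping at the same time without extra locking.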