Delete a specific line from a file using Python

Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/4710067/

Date: 2020-08-18 17:08:08  Source: igfitidea

using Python for deleting a specific line in a file

Tags: python, file, input

Asked by SourD

Let's say I have a text file full of nicknames. How can I delete a specific nickname from this file, using Python?

Accepted answer by houbysoft

First, open the file and get all your lines from the file. Then reopen the file in write mode and write your lines back, except for the line you want to delete:

with open("yourfile.txt", "r") as f:
    lines = f.readlines()
with open("yourfile.txt", "w") as f:
    for line in lines:
        if line.strip("\n") != "nickname_to_delete":
            f.write(line)

You need to strip("\n") the newline character in the comparison because, if your file doesn't end with a newline character, the very last line won't either.

Answer by Nikhil

Take the contents of the file and split it by newline into a list (str.split returns a list, not a tuple). Then delete the entry at the line number you want to remove, join the remaining lines, and overwrite the file.

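The split-and-join idea can be sketched roughly as follows. This is only a minimal sketch, assuming the file fits comfortably in memory; the function name delete_line_by_number and its zero-based index parameter are hypothetical and not part of the original answer.

```python
def delete_line_by_number(path, index):
    """Remove the line at zero-based `index` from the text file at `path`."""
    with open(path, "r") as f:
        lines = f.read().split("\n")   # str.split returns a list of lines
    del lines[index]                   # drop the unwanted line
    with open(path, "w") as f:
        f.write("\n".join(lines))      # overwrite the file with what is left
```

Because the trailing empty string produced by a final newline is kept in the list, a file that ended with a newline still ends with one afterwards.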

Answer by Hugh Bothwell

In general, you can't; you have to write the whole file again (at least from the point of change to the end).

In some specific cases you can do better than this:

if all your data elements are the same length and in no specific order, and you know the offset of the one you want to get rid of, you could copy the last item over the one to be deleted and truncate the file before the last item;

or you could just overwrite the data chunk with a 'this is bad data, skip it' value or keep a 'this item has been deleted' flag in your saved data elements such that you can mark it deleted without otherwise modifying the file.

This is probably overkill for short documents (anything under 100 KB?).

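The first special case (fixed-length records in no particular order) might look something like the sketch below. It assumes a binary file made of equally sized records; RECORD_SIZE, delete_record and the record layout are hypothetical choices for illustration only.

```python
import os

RECORD_SIZE = 16  # assumed fixed record length in bytes (hypothetical)

def delete_record(path, index, record_size=RECORD_SIZE):
    """Delete record `index` by copying the last record over it, then truncating."""
    count = os.path.getsize(path) // record_size
    if not 0 <= index < count:
        raise IndexError("record index out of range")
    last_offset = (count - 1) * record_size
    with open(path, "r+b") as f:
        if index != count - 1:
            f.seek(last_offset)
            last = f.read(record_size)    # read the final record
            f.seek(index * record_size)
            f.write(last)                 # overwrite the record being deleted
        f.truncate(last_offset)           # cut the file just before the last record
```

Only two records are touched regardless of file size, but the order of the remaining records is not preserved.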

Answer by Kingz

The issue with reading all the lines in a first pass and making changes (deleting specific lines) in a second pass is that if your files are huge, you will run out of RAM. Instead, a better approach is to read the lines one by one and write them into a separate file, skipping the ones you don't need. I have run this approach with files as big as 12-50 GB, and the RAM usage remains almost constant. Only CPU cycles show processing in progress.

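One way to sketch this streaming approach is shown below. It is an illustration under assumptions, not the answerer's actual code: the function name delete_matching_line is hypothetical, and swapping the filtered copy over the original with os.replace is an added step the answer does not mention.

```python
import os
import tempfile

def delete_matching_line(path, target):
    """Stream `path` line by line, dropping every line equal to `target`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Put the temporary file next to the original so that os.replace
    # does not have to cross filesystems.
    with open(path, "r") as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as dst:
        for line in src:                   # only one line is held in memory
            if line.rstrip("\n") != target:
                dst.write(line)
    os.replace(dst.name, path)             # swap the filtered copy into place
```

Note that the replacement file gets the temporary file's permissions, which may differ from those of the original.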

Answer by Barnabe

The best and fastest option, rather than storing everything in a list and re-opening the file to write it, is in my opinion to re-write the file elsewhere.

with open("yourfile.txt", "r") as infile, open("newfile.txt", "w") as outfile:
    for line in infile:
        # keep every line except the one to delete
        if line.strip("\n") != "nickname_to_delete":
            outfile.write(line)

That's it! A single loop does the same job, and because it never holds the whole file in memory it should be faster on large files.

Answer by Deep

I liked the fileinput approach as explained in this answer: Deleting a line from a text file (python)

Say for example I have a file which has empty lines in it and I want to remove empty lines, here's how I solved it:

import fileinput
import sys

# inplace=True redirects standard output into the file being read
for line in fileinput.input('file1.txt', inplace=True):
    if len(line) > 1:  # skip lines that contain only the newline
        sys.stdout.write(line)

Note: The empty lines in my case had length 1 (just the newline character).

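The same fileinput technique could be adapted to the original question of removing one nickname. The following is a minimal sketch; the function name remove_nickname is hypothetical, and it assumes one nickname per line.

```python
import fileinput

def remove_nickname(path, nickname):
    """Rewrite `path` in place, skipping lines equal to `nickname`."""
    with fileinput.input(files=path, inplace=True) as f:
        for line in f:
            if line.rstrip("\n") != nickname:
                # With inplace=True, printed text goes into the file.
                print(line, end="")
```

Standard output is restored when the with block exits, so ordinary print calls work normally afterwards.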

Answer by Lother

A solution to this problem using only a single open:

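with open("target.txt", "r+") as f:
    d = f.readlines()
    f.seek(0)  # go back to the start of the file
    for i in d:
        # strip the newline so the comparison can actually match
        if i.strip("\n") != "line you want to remove...":
            f.write(i)
    f.truncate()  # drop whatever remains after the last write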

This solution opens the file in read/write mode ("r+"), uses seek to reset the file pointer, and then calls truncate to remove everything after the last write.

Answer by andrii1986

You have probably already got a correct answer, but here is mine. Instead of using a list to collect the unfiltered data (which is what the readlines() method does), I use two files. One holds the main data, and the second is used for filtering when you delete a specific string. Here is the code:

# Copy the main file into a scratch "filter" file
with open('data_base.txt') as main_file:         # your main database file
    contents = main_file.read()
with open('filter_base.txt', 'w') as filter_file:
    filter_file.write(contents)

# Rewrite the main file from the copy, skipping the unwanted lines
with open('filter_base.txt') as filter_file, open('data_base.txt', 'w') as main_file:
    for line in filter_file:
        if 'your data to delete' not in line:    # remove a specific string
            main_file.write(line)                # write back everything except the deleted lines

Hope you will find this useful! :)

Answer by Ren

If you use Linux, you can try the following approach.
Suppose you have a text file named animal.txt:

$ cat animal.txt  
dog
pig
cat 
monkey         
elephant  

Delete the line containing dog (here, the first line):

>>> import subprocess
>>> subprocess.call(['sed','-i','/.*dog.*/d','animal.txt']) 

Then:

$ cat animal.txt
pig
cat
monkey
elephant

Answer by A Malik

I think if you read the file into a list, you can then iterate over the list to look for the nickname you want to get rid of. You can do this efficiently without creating additional files, but you'll have to write the result back to the source file.

Here's how I might do this:

import os, csv  # and other imports you need
nicknames_to_delete = ['Nick', 'Stephen', 'Mark']

I'm assuming nicknames.csv contains data like:

Nick
Maria
James
Chris
Mario
Stephen
Isabella
Ahmed
Julia
Mark
...

Then load the file into the list:

with open("nicknames.csv") as sourceFile:
    nicknames = sourceFile.read().splitlines()

Next, iterate over the list of names to delete and remove each match:

for nick in nicknames_to_delete:
    if nick in nicknames:
        nicknames.remove(nick)  # removes the first occurrence
    else:
        print(nick + " is not found in the file")

Lastly, write the result back to the file:

# "w" truncates the file; newline="" lets the csv module control line endings
with open("nicknames.csv", "w", newline="") as nicknamesFile:
    nicknamesWriter = csv.writer(nicknamesFile)
    for name in nicknames:
        nicknamesWriter.writerow([name])  # one nickname per row
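
As a more compact variant of the same read, filter, write-back idea, the names to delete can be kept in a set and the file filtered in one pass. This is only a sketch under the assumption of one nickname per line; the function name remove_nicknames is hypothetical, and it writes plain lines rather than going through the csv module.

```python
def remove_nicknames(path, to_delete):
    """Remove every line of `path` whose text is in `to_delete`."""
    unwanted = set(to_delete)              # set lookup is cheap per line
    with open(path) as f:
        kept = [line for line in f if line.rstrip("\n") not in unwanted]
    with open(path, "w") as f:             # "w" truncates before writing
        f.writelines(kept)
```

Unlike removing one match at a time, this drops every occurrence of each listed nickname.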