Inline CSV File Editing with Python

Notice: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/16020858/

python csv file-io

Asked by Evil Closet Monkey

Can I modify a CSV file inline using Python's CSV library, or similar technique?

Currently I am processing a file and updating the first column (a name field) to change the formatting. A simplified version of my code looks like this:

import csv

# newline='' lets the csv module handle line endings itself.
with open('tmpEmployeeDatabase-out.csv', 'w', newline='') as csvOutput:
    writer = csv.writer(csvOutput, delimiter=',', quotechar='"')

    with open('tmpEmployeeDatabase.csv', 'r', newline='') as csvFile:
        reader = csv.reader(csvFile, delimiter=',', quotechar='"')

        for row in reader:
            # Title-case the first column (the name field).
            row[0] = row[0].title()
            writer.writerow(row)

This approach works, but I am curious whether I can do an inline edit so that I'm not duplicating the file.

I've tried the following, but this appends the new records to the end of the file instead of replacing them.

with open('tmpEmployeeDatabase.csv', 'r+') as csvFile:
    reader = csv.reader(csvFile, delimiter=',', quotechar='"')
    writer = csv.writer(csvFile, delimiter=',', quotechar='"')

    for row in reader:
        row[0] = row[0].title()  # first column (the name field)
        writer.writerow(row)

Accepted answer by Martijn Pieters

No, you should not attempt to write to the file you are currently reading from. You can do it if you keep seeking back after reading a row, but it is not advisable, especially if you are writing back more data than you read.
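
To illustrate why writing back more data than you read is a problem, here is a minimal sketch. The helper name overwrite_first_row and the sample data are hypothetical, not from the original post. It overwrites the first row of a small file in place with a longer row, and the extra bytes clobber the row that follows:

```python
import os
import tempfile

def overwrite_first_row(path, new_row):
    """Write new_row (bytes) at offset 0; nothing after it is shifted."""
    with open(path, 'r+b') as f:
        f.seek(0)
        f.write(new_row)
    with open(path, 'rb') as f:
        return f.read()

def demo():
    # Build a throwaway two-row CSV file (hypothetical sample data).
    fd, path = tempfile.mkstemp(suffix='.csv')
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(b'ab,1\ncd,2\n')
        # The replacement row is 4 bytes longer than the original first row.
        return overwrite_first_row(path, b'abcdef,1\n')
    finally:
        os.remove(path)

print(demo())  # b'abcdef,1\n\n' : the second row "cd,2" has been overwritten
```

The file keeps its original length, so the longer first row simply eats into the bytes of the second one.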

The canonical method is to write to a new, temporary file and move that into place over the old file you read from.

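from tempfile import NamedTemporaryFile
import shutil
import csv

filename = 'tmpEmployeeDatabase.csv'
# On Python 3 the csv module needs text-mode files opened with newline=''.
tempfile = NamedTemporaryFile(mode='w', newline='', delete=False)

with open(filename, 'r', newline='') as csvFile, tempfile:
    reader = csv.reader(csvFile, delimiter=',', quotechar='"')
    writer = csv.writer(tempfile, delimiter=',', quotechar='"')

    for row in reader:
        # Title-case the first column (the name field).
        row[0] = row[0].title()
        writer.writerow(row)

# Both files are closed here; move the rewritten copy over the original.
shutil.move(tempfile.name, filename)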

I've made use of the tempfile and shutil libraries here to make the task easier.
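
As a variation on the same idea, the pattern can be wrapped in a reusable function. This is only a sketch under some assumptions: the function name title_case_first_column is hypothetical, the temporary file is created next to the original so the final step is a plain rename, and os.replace is swapped in for shutil.move.

```python
import csv
import os
import tempfile

def title_case_first_column(path):
    """Rewrite the CSV at path with its first column title-cased."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory as the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', newline='') as tmp, \
                open(path, 'r', newline='') as src:
            writer = csv.writer(tmp)
            for row in csv.reader(src):
                if row:  # blank lines come back as empty lists
                    row[0] = row[0].title()
                writer.writerow(row)
        # Swap the finished copy into place over the original.
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the original untouched and clean up the partial copy.
        os.unlink(tmp_path)
        raise
```

Calling title_case_first_column('tmpEmployeeDatabase.csv') would then update the file from the question. Note that mkstemp creates the file with owner-only permissions, so the replaced file may not keep the original's permissions.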

Answer by tylerl

There is no underlying system call for inserting data into a file. You can overwrite, you can append, and you can replace. But inserting data into the middle means reading and rewriting the entire file from the point you made your edit down to the end.

As such, there are two ways to do this: (a) slurp the entire file into memory, make your edits there, and then dump the result back to disk, or (b) open a temporary output file, write your results to it while you read the input file, and then replace the old file with the new one once you reach the end. One method uses more RAM, the other uses more disk space.
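
Option (a) might look like the following minimal sketch. The function name title_case_first_column_in_memory is hypothetical, and the code assumes the whole file fits comfortably in memory.

```python
import csv

def title_case_first_column_in_memory(path):
    # Read every row into a list before touching the file again.
    with open(path, 'r', newline='') as f:
        rows = list(csv.reader(f))

    for row in rows:
        if row:  # blank lines come back as empty lists
            row[0] = row[0].title()

    # Opening with 'w' truncates the file, so the old content is replaced.
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerows(rows)
```

Unlike option (b), a crash partway through the final write can leave a truncated file behind, since the original has already been emptied at that point.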
