How to delete rows from a CSV file in Python
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Original: http://stackoverflow.com/questions/16271331/
Asked by justin
I'm trying to compare two csv files (fileA and fileB), and remove any rows from fileA that are not found in fileB. I want to be able to do this without creating a third file. I thought I could do this using the csv writer module but now I'm second guessing myself.
Currently, I'm using the following code to record my comparison data from file B:
import csv

removal_list = set()
with open('fileB', 'rb') as file_b:
    reader1 = csv.reader(file_b)
    next(reader1)  # skip header
    for row in reader1:
        removal_list.add((row[0], row[2]))
This is where I'm stuck and do not know how to delete the rows:
with open('fileA', 'ab') as file_a:
    with open('fileB', 'rb') as file_b:
        writer = csv.writer(file_a)
        reader2 = csv.reader(file_b)
        next(reader2)
        for row in reader2:
            if (row[0], row[2]) not in removal_list:
                # If row was not present in file B, delete it from file A.
                # stuck here: writer.<HowDoIRemoveRow>(row)
Accepted answer by jamylak
This solution uses fileinput with inplace=True, which writes to a temporary file and then automatically renames it at the end to your file name. You can't remove rows from a file, but you can rewrite it with only the ones you want.
From the fileinput documentation: if the keyword argument inplace=1 is passed to fileinput.input() or to the FileInput constructor, the file is moved to a backup file and standard output is directed to the input file (if a file of the same name as the backup file already exists, it will be replaced silently). This makes it possible to write a filter that rewrites its input file in place.
fileA
h1,h2,h3
a,b,c
d,e,f
g,h,i
j,k,l
fileB
h1,h2,h3
a,b,c
1,2,3
g,h,i
4,5,6
# Python 2 syntax (print statement, 'rb' file mode)
import fileinput, sys, csv

with open('fileB', 'rb') as file_b:
    r = csv.reader(file_b)
    next(r)  # skip header
    seen = {(row[0], row[2]) for row in r}

f = fileinput.input('fileA', inplace=True)  # sys.stdout is redirected to the file
print next(f),  # write header as first line

w = csv.writer(sys.stdout)
for row in csv.reader(f):
    if (row[0], row[2]) in seen:  # write it if it's in B
        w.writerow(row)
fileA (after running the code)
h1,h2,h3
a,b,c
g,h,i
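The snippet above is written for Python 2. A rough Python 3 sketch of the same fileinput idea might look like the following. The function name filter_in_place and its parameters are illustrative and not part of the original answer, and the sketch assumes both files have a header row and at least three columns. The csv.writer is created only after the first line has been read, because fileinput redirects sys.stdout at the moment it opens the file.

```python
import csv
import fileinput
import sys


def filter_in_place(path_a, path_b):
    """Keep only the rows of path_a whose (col 0, col 2) key appears in path_b.

    Hypothetical helper for illustration; assumes both files have a header row.
    """
    with open(path_b, newline='') as file_b:
        reader_b = csv.reader(file_b)
        next(reader_b)  # skip header
        seen = {(row[0], row[2]) for row in reader_b}

    # inplace=True: the original is moved to a backup file and sys.stdout is
    # redirected to a new file with the original name while we iterate.
    with fileinput.input(path_a, inplace=True) as f:
        reader_a = csv.reader(f)
        header = next(reader_a)  # reading the first line triggers the redirect
        writer = csv.writer(sys.stdout, lineterminator='\n')
        writer.writerow(header)
        for row in reader_a:
            if (row[0], row[2]) in seen:  # write it only if it is also in B
                writer.writerow(row)
```

Called as filter_in_place('fileA', 'fileB') on the sample files above, this should leave fileA with the header followed by the a,b,c and g,h,i rows.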
Answer by Lennart Regebro
CSV is not a database format. It is read and written as a whole. You can't remove rows in the middle. So the only way to do this without creating a third file is to read in the file completely in memory and then write it out, without the offending rows.
But in general it's better to use a third file.
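One way to follow the "third file" advice while still ending up with a single fileA is to write the kept rows to a temporary file and then replace the original with it. The Python 3 sketch below is only an illustration under stated assumptions: the helper name filter_via_temp_file and the key_columns parameter are made up for this example, both files are assumed to have a header row, and the temporary file is created next to fileA so the final rename stays on the same filesystem.

```python
import csv
import os
import tempfile


def filter_via_temp_file(path_a, path_b, key_columns=(0, 2)):
    """Rewrite path_a, keeping only rows whose key columns also occur in path_b.

    Hypothetical helper for illustration; assumes both files have a header row.
    """
    with open(path_b, newline='') as file_b:
        reader_b = csv.reader(file_b)
        next(reader_b)  # skip header
        keys_in_b = {tuple(row[i] for i in key_columns) for row in reader_b}

    # Create the temporary "third file" in the same directory as path_a.
    directory = os.path.dirname(os.path.abspath(path_a))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', newline='') as tmp, \
                open(path_a, newline='') as file_a:
            reader_a = csv.reader(file_a)
            writer = csv.writer(tmp)
            writer.writerow(next(reader_a))  # copy the header row
            for row in reader_a:
                if tuple(row[i] for i in key_columns) in keys_in_b:
                    writer.writerow(row)
        # Both files are closed here; swap the filtered copy into place.
        os.replace(tmp_path, path_a)
    except BaseException:
        # Filtering failed before the swap: remove the partial temp file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because the original is only replaced after the new file has been written completely, an error partway through leaves fileA untouched.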
Answer by David Cain
As Lennart described, you can't modify a CSV file in-place as you iterate over it.
If you're really opposed to creating a third file, you might want to look into using a string buffer with StringIO, the idea being that you build up the new desired contents of file A in memory. At the end of your script, you can write the contents of the buffer over file A.
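import csv
from cStringIO import StringIO  # Python 2 module

# removal_list is the set of (row[0], row[2]) keys built from fileB in the question
new_a_buf = StringIO()
writer = csv.writer(new_a_buf)
with open('fileA', 'rb') as file_a:
    reader2 = csv.reader(file_a)
    writer.writerow(next(reader2))  # copy the header row
    for row in reader2:
        # keep only the rows of fileA whose key also appears in fileB
        if (row[0], row[2]) in removal_list:
            writer.writerow(row)

# At this point, the new contents (new_a_buf) exist in memory
with open('fileA', 'wb') as file_a:
    file_a.write(new_a_buf.getvalue())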
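The cStringIO module exists only in Python 2. As a hedged sketch, the same in-memory buffer idea could be written for Python 3 with io.StringIO as below. The function name filter_in_memory is illustrative rather than part of the original answer, and the code assumes that both files have a header row and that fileA fits comfortably in memory.

```python
import csv
import io


def filter_in_memory(path_a, path_b):
    """Rebuild path_a in a string buffer, keeping rows whose key is in path_b.

    Hypothetical helper for illustration; assumes both files have a header row.
    """
    with open(path_b, newline='') as file_b:
        reader_b = csv.reader(file_b)
        next(reader_b)  # skip header
        keys_in_b = {(row[0], row[2]) for row in reader_b}

    # newline='' keeps the csv module's own line endings untouched.
    new_a_buf = io.StringIO(newline='')
    writer = csv.writer(new_a_buf)
    with open(path_a, newline='') as file_a:
        reader_a = csv.reader(file_a)
        writer.writerow(next(reader_a))  # copy the header row
        for row in reader_a:
            if (row[0], row[2]) in keys_in_b:
                writer.writerow(row)

    # The new contents exist only in memory until this point;
    # now overwrite path_a with the buffer.
    with open(path_a, 'w', newline='') as file_a:
        file_a.write(new_a_buf.getvalue())
```

Unlike writing to a separate file first, this overwrites fileA directly, so an interruption during the final write could leave the file truncated.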

