Python Looping through CSV files and their columns
Note: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/45947887/
Asked by humbleCoder
I've seen this done in other questions asked here, but I'm still a little confused. I've been learning Python 3 for the last few days and figured I'd start working on a project to really get my hands dirty. I need to loop through a number of CSV files and make edits to them. I'm having trouble with going to a specific column, and also with for loops in Python in general. I'm used to the convention (int i = 0; i < expression; i++), but in Python it's a little different. Here's my code so far, and I'll explain where my issue is.
    import os
    import csv

    pathName = os.getcwd()
    numFiles = []
    fileNames = os.listdir(pathName)
    for fileNames in fileNames:
        if fileNames.endswith(".csv"):
            numFiles.append(fileNames)

    for i in numFiles:
        file = open(os.path.join(pathName, i), "rU")
        reader = csv.reader(file, delimiter=',')
        for column in reader:
            print(column[4])
My issue is with these lines:
    for column in reader:
        print(column[4])
The docs say that column is the loop variable and reader is what I'm looping through. But when I use index 4, I get this error:
IndexError: list index out of range
What does this mean? If I write 0 instead of 4, it prints out all of the values in column 0 of each CSV file. I basically need it to go through the first row of each CSV file, find a specific value, and then go through that entire column. Thanks in advance!
Answered by Bigbob556677
It could be that at least one row in your .csv file has fewer than 5 columns.
Python uses zero-based indexing, which means it starts counting at 0, so the first column would be column[0], the second would be column[1], and column[4] would be the fifth.
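As a minimal sketch of the point above, the snippet below reads an in-memory CSV (the sample data is made up for illustration, so no files are needed). Each row has only three fields, so the valid indexes are 0 to 2, and asking for index 4 raises the same IndexError as in the question.

```python
import csv
import io

# Hypothetical sample data: three columns per row, so valid indexes are 0, 1, 2.
sample = io.StringIO("name,age,city\nAda,36,London\n")

rows = list(csv.reader(sample))
first_row = rows[0]

print(first_row[0])    # first column of the first row
print(len(first_row))  # number of columns in this row

try:
    first_row[4]       # a fifth column does not exist in this row
    error_message = None
except IndexError as ex:
    error_message = str(ex)

print(error_message)
```

Checking len(row) before indexing is one simple way to avoid the exception when rows may be shorter than expected.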
Also, you may want to change your
    for column in reader:
to
    for row in reader:
because, to my understanding, reader iterates through the rows, not the columns: each iteration gives you one row as a list of strings.
This code loops through each row and then through each column in that row, allowing you to view the contents of each cell.
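    for i in numFiles:
        # newline="" is what the csv module recommends; the old "rU" mode is rejected by Python 3.11+
        with open(os.path.join(pathName, i), newline="") as file:
            reader = csv.reader(file, delimiter=',')
            for row in reader:
                for column in row:
                    print(column)
                    if column == "SPECIFIC VALUE":
                        pass  # do stuff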
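The question asks how to find a specific value in the first row and then walk that entire column. One possible sketch is below. The helper name column_values and the sample data are hypothetical, and it assumes the first row of each file is a header row.

```python
import csv
import io


def column_values(csv_file, header_name):
    """Return every value under header_name, or None if the header is absent."""
    reader = csv.reader(csv_file)
    header = next(reader, None)        # first row; None for an empty file
    if header is None or header_name not in header:
        return None
    index = header.index(header_name)  # position of the wanted column
    # Skip rows that are too short instead of raising IndexError.
    return [row[index] for row in reader if len(row) > index]


# Hypothetical sample data standing in for one of the CSV files.
sample = io.StringIO("id,name,score\n1,Ada,90\n2,Bob\n3,Cy,75\n")
scores = column_values(sample, "score")
print(scores)
```

For a real file, the same function could be called on a file object opened with open(path, newline="").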
Answered by Doron Cohen
Welcome to Python! I suggest printing some debugging messages.
You could add this to your printing loop:
    for row in reader:
        try:
            print(row[4])
        except IndexError as ex:
            print("ERROR: %s in file %s doesn't contain 5 columns" % (row, i))
This will print the bad lines (as lists, because that is how csv.reader represents rows) so you can fix the CSV files.
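A variation on the same debugging idea, sketched under the assumption that you would rather collect the short rows than print them: the reader object's line_num attribute gives the number of source lines read so far, which makes the bad rows easier to locate in the file. The function name find_short_rows and the sample data are hypothetical.

```python
import csv
import io


def find_short_rows(csv_file, min_columns):
    """Return (line_number, row) pairs for rows with fewer than min_columns fields."""
    reader = csv.reader(csv_file)
    short_rows = []
    for row in reader:
        if len(row) < min_columns:
            # reader.line_num is the number of lines read from the source so far.
            short_rows.append((reader.line_num, row))
    return short_rows


# Hypothetical sample data: the second line has only three fields.
sample = io.StringIO("a,b,c,d,e\n1,2,3\n1,2,3,4,5\n")
problems = find_short_rows(sample, 5)
print(problems)
```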
Some notes:
- It is common to use snake_case in Python, not camelCase
- Name your variables appropriately (csv_filename instead of i, row instead of column, etc.)
- Use the with statement to handle files
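Putting the notes together, the question's loop might look roughly like the sketch below. It is only one way to apply them: the function name fifth_column_by_file is hypothetical, and skipping short rows (rather than reporting them) is an assumption made for brevity.

```python
import csv
import os


def fifth_column_by_file(directory):
    """Map each .csv filename in directory to the fifth-field values of its rows."""
    results = {}
    for csv_filename in sorted(os.listdir(directory)):
        if not csv_filename.endswith(".csv"):
            continue
        csv_path = os.path.join(directory, csv_filename)
        # The with statement closes the file even if an exception is raised.
        with open(csv_path, newline="") as csv_file:
            # Rows with fewer than five fields are skipped to avoid IndexError.
            results[csv_filename] = [
                row[4] for row in csv.reader(csv_file) if len(row) > 4
            ]
    return results
```

Calling fifth_column_by_file(os.getcwd()) would scan the current working directory, as the original code does.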
Enjoy!