Python: Looping Through CSV Files and Their Columns

Question

Asked by humbleCoder

I've seen this done in other questions asked here, but I'm still a little confused. I've been learning Python 3 for the last few days and figured I'd start working on a project to really get my hands dirty. I need to loop through a number of CSV files and make edits to those files. I'm having trouble with going to a specific column, and also with for loops in Python in general. I'm used to the convention (int i = 0; i < expression; i++), but in Python it's a little different. Here's my code so far, and I'll explain where my issue is.

import os
import csv

pathName = os.getcwd()

numFiles = []
fileNames = os.listdir(pathName)
for fileNames in fileNames:
    if fileNames.endswith(".csv"):
        numFiles.append(fileNames)

for i in numFiles:
    file = open(os.path.join(pathName, i), "rU")
    reader = csv.reader(file, delimiter=',')
    for column in reader:
        print(column[4])

My issue falls on this line:

for column in reader:
    print(column[4])

So in the Docs it says column is the variable and reader is what I'm looping through. But when I write 4 I get this error:

IndexError: list index out of range

What does this mean? If I write 0 instead of 4 it prints out all of the values in column 0 cell 0 of each CSV file. I basically need it to go through the first row of each CSV file and find a specific value and then go through that entire column. Thanks in advance!

Answer 1

Answered by Bigbob556677

It could be that some rows in your .csv file have fewer than 5 columns.

Python uses zero-based indexing, which means it starts counting at 0: the first column is column[0], the second is column[1], and column[4] is the fifth.
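As a small illustration of the indexing point above, here is a sketch that uses made-up in-memory data instead of a real file. The sample text and the helper name field_or_none are assumptions for illustration only; they are not part of the original code. It shows that a row with only three fields has no index 4, which is the situation that raises IndexError.

```python
import csv
import io


def field_or_none(row, index):
    """Return row[index], or None when the row is too short."""
    return row[index] if index < len(row) else None


# Hypothetical sample data: the second row has only three fields.
sample = io.StringIO("a,b,c,d,e\n1,2,3\n")
rows = list(csv.reader(sample))

for row in rows:
    # Each row is a list of strings; valid indexes run from 0 to len(row) - 1.
    print(len(row), field_or_none(row, 4))
```

Checking len(row) before indexing is one way to avoid the exception; catching IndexError is another.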

Also, you may want to change your

for column in reader:

to

for row in reader:

because, to my understanding, reader iterates through the rows, not the columns.

This code loops through each row and then each column in that row allowing you to view the contents of each cell.

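for i in numFiles:
    # The "rU" mode was removed in Python 3.11; the csv docs recommend newline=""
    file = open(os.path.join(pathName, i), newline="")
    reader = csv.reader(file, delimiter=',')
    for row in reader:
        for column in row:  # each item here is one cell of the row
            print(column)
            if column == "SPECIFIC VALUE":
                pass  # do stuff
    file.close()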
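The question asks for something slightly more specific than visiting every cell: find a value in the first row, then walk down that column. One possible sketch of that idea follows. The function name column_values, the header names, and the sample data are all hypothetical, and the choice to skip short rows (instead of raising) is an assumption about what the asker would want.

```python
import csv
import io


def column_values(csv_file, header_name):
    """Return the values listed under header_name, or None if it is absent."""
    reader = csv.reader(csv_file)
    header = next(reader, None)  # first row; None when the input is empty
    if header is None or header_name not in header:
        return None
    index = header.index(header_name)
    # Rows too short to contain this column are skipped rather than raising.
    return [row[index] for row in reader if len(row) > index]


# Hypothetical sample data standing in for one of the CSV files.
sample = io.StringIO("name,city,score\nAnn,Oslo,7\nBob\nCy,Rome,9\n")
scores = column_values(sample, "score")
print(scores)
```

For a real file, the same function could be given the object returned by open(path, newline="") inside the loop over the CSV filenames.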

Answer 2

Answered by Doron Cohen

Welcome to Python! I suggest you print some debugging messages.

You could add this to your printing loop:

for row in reader:
    try:
        print(row[4])
    except IndexError as ex:
        print("ERROR: %s in file %s doesn't contain 5 columns" % (row, i))

This will print the bad rows (as lists, because that is how csv.reader represents them) so you can fix the CSV files.
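If it helps to know where in the file a bad row sits, a variation on the same debugging idea is to report the reader's line_num attribute alongside the row. The sketch below is only one way to do it; the function name short_rows and the sample data are made up for illustration.

```python
import csv
import io


def short_rows(csv_file, min_fields):
    """Yield (line_number, row) for every row with fewer than min_fields fields."""
    reader = csv.reader(csv_file)
    for row in reader:
        if len(row) < min_fields:
            # line_num is the number of source lines the reader has consumed so far.
            yield reader.line_num, row


# Hypothetical sample data: only the second line is too short.
sample = io.StringIO("a,b,c,d,e\n1,2,3\nv,w,x,y,z\n")
bad = list(short_rows(sample, 5))
for line_number, row in bad:
    print(f"line {line_number}: only {len(row)} fields: {row}")
```

Note that line_num counts physical lines, so it can differ from the row count when a quoted field spans several lines.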

Some notes:

It is common to use snake_case in Python, not camelCase.
Name your variables appropriately (csv_filename instead of i, row instead of column, etc.).
Use the with statement to handle files.
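Taken together, the notes above might lead to something like the following sketch of the asker's loop. This is an illustrative rewrite under assumptions, not the answerer's own code: the function name read_column and its return shape (a dict keyed by filename) are hypothetical, and rows that are too short are silently skipped.

```python
import csv
import os


def read_column(directory, column_index):
    """Map each .csv filename in directory to the values found at column_index."""
    values_by_file = {}
    for csv_filename in sorted(os.listdir(directory)):
        if not csv_filename.endswith(".csv"):
            continue
        csv_path = os.path.join(directory, csv_filename)
        # The with statement closes the file even if reading fails.
        with open(csv_path, newline="") as csv_file:
            values_by_file[csv_filename] = [
                row[column_index]
                for row in csv.reader(csv_file)
                if len(row) > column_index
            ]
    return values_by_file
```

A call such as read_column(os.getcwd(), 4) would then mirror what the original script was trying to print.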

Enjoy!
