Read in all csv files from a directory using Python

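Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/33503993/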

Date: 2020-08-19 13:27:58  Source: igfitidea

Tags: python, csv, for-loop, numpy, genfromtxt

Question by FaCoffee

I hope this is not trivial, but I am wondering the following:

If I have a specific folder with n csv files, how could I iteratively read all of them, one at a time, and perform some calculations on their values?

For a single file, for example, I do something like this and perform some calculations on the x array:

import numpy

# Path to a single csv file (despite the variable name, this is a file, not a folder)
directoryPath = input('Path of the native csv file: ')
csvfile = numpy.genfromtxt(directoryPath, delimiter=",")
x = csvfile[:, 2]  # Creates the array that will undergo a set of calculations

I know that I can check how many csv files there are in a given folder (check here):

import glob

# Lists the csv files in the current working directory
for file_name in glob.glob("*.csv"):
    print(file_name)
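
The snippet above globs the current working directory. A minimal sketch for pointing glob at a folder of your choice and counting the matches might look like this; the helper name list_csv_files is hypothetical, and the results are sorted because glob.glob returns paths in arbitrary order:

```python
import glob
import os


def list_csv_files(directory):
    """Return the sorted paths of the .csv files directly inside `directory`."""
    # glob.glob returns matches in arbitrary order; sort them for repeatable loops
    return sorted(glob.glob(os.path.join(directory, "*.csv")))


# Hypothetical usage:
# csv_files = list_csv_files("/path/to/folder")
# print(len(csv_files), "csv files found")
```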

But I could not figure out how to nest the numpy.genfromtxt() function in a for loop so that it reads in all the csv files of a directory that I specify.

EDIT

The folder contains only jpg and csv files. The csv files are named eventX.csv, where X ranges from 1 to 50. The for loop I am referring to should therefore handle the file names as they are.
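
If the files should be processed in event order, note that a plain alphabetical sort puts event10.csv before event2.csv. One possible sketch, assuming every csv file of interest follows the eventX.csv pattern exactly, extracts X with a regular expression and sorts on it as an integer (event_csv_files is a hypothetical helper name):

```python
import os
import re


def event_csv_files(directory):
    """Return paths of files named eventX.csv in `directory`, ordered by the integer X.

    Assumes the naming scheme from the question; anything else (e.g. the .jpg
    images) is ignored.
    """
    pattern = re.compile(r"^event(\d+)\.csv$")
    matches = []
    for name in os.listdir(directory):
        m = pattern.match(name)
        if m:
            matches.append((int(m.group(1)), os.path.join(directory, name)))
    # Sort on the numeric part so event2.csv comes before event10.csv
    matches.sort()
    return [path for _, path in matches]
```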

Accepted answer by FaCoffee

That's how I'd do it:

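import os

directory = os.path.join("c:\\", "path")
for root, dirs, files in os.walk(directory):
    for file in files:
        if file.endswith(".csv"):
            # Join with root: os.walk yields bare file names, so open(file)
            # would look in the current working directory instead
            with open(os.path.join(root, file), 'r') as f:
                pass  # perform calculation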
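
Since the question already imports csv, here is a rough stdlib-only sketch of what the "perform calculation" step could look like with this os.walk approach. It assumes comma-separated numeric files without a header row and simply collects column index 2 (the same column as x in the question) from every file; third_column_by_file is a hypothetical name:

```python
import csv
import os


def third_column_by_file(directory):
    """Walk `directory` (subfolders included) and read column index 2 of every .csv file.

    Assumes comma-separated numeric files without a header row; returns a dict
    mapping each file's full path to a list of floats.
    """
    columns = {}
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith(".csv"):
                path = os.path.join(root, file)  # full path, not just the bare name
                with open(path, newline="") as f:
                    # Skip empty rows; convert the third field of each row to float
                    columns[path] = [float(row[2]) for row in csv.reader(f) if row]
    return columns
```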

Answer by plonser

I think you are looking for something like this:

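import glob
import os
import numpy as np

# os.path.join adds the separator that directoryPath+'*.csv' would need
for file_name in glob.glob(os.path.join(directoryPath, '*.csv')):
    x = np.genfromtxt(file_name, delimiter=',')[:, 2]
    # do your calculations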
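
A self-contained sketch of this pattern with a placeholder calculation, assuming numeric files without a header row and at least three columns; column_means and the choice of the mean are illustrative only. np.atleast_2d guards against single-row files, for which genfromtxt returns a 1-D array that [:, 2] cannot index:

```python
import glob
import os

import numpy as np


def column_means(directory, column=2):
    """Map each csv file in `directory` to the mean of one of its columns.

    Assumes purely numeric, comma-separated files without a header row.
    """
    results = {}
    for file_name in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        # atleast_2d keeps single-row files indexable as [:, column]
        data = np.atleast_2d(np.genfromtxt(file_name, delimiter=","))
        x = data[:, column]
        results[os.path.basename(file_name)] = float(x.mean())  # stand-in calculation
    return results
```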

Edit

If you want to get all csv files from a folder (including subfolders), you could use subprocess instead of glob (note that this code relies on the Unix find command, so it only works on Linux and other Unix-like systems):

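import subprocess
import numpy as np

# text=True (Python 3.7+) returns str instead of bytes, so the output can be split into lines
file_list = subprocess.check_output(
    ['find', directoryPath, '-name', '*.csv'], text=True).splitlines()

for i, file_name in enumerate(file_list):
    x = np.genfromtxt(file_name, delimiter=',')[:, 2]
    # do your calculations
    # now you can use i as an index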

It first searches the folder and its subfolders for all file names using the find command-line utility, and then applies your calculations to each file.
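
If portability matters, Python 3.5+ can do the recursive search without calling find, by using the ** wildcard with recursive=True. This is a swapped-in technique, not the answer's, and one difference is worth knowing: glob skips hidden files (names starting with a dot) by default, whereas find does not. find_csv_recursive is a hypothetical name, and glob.escape keeps wildcard characters in the directory name from being interpreted:

```python
import glob
import os


def find_csv_recursive(directory):
    """Portable stand-in for `find directory -name '*.csv'` (works on any OS)."""
    # '**' with recursive=True matches the directory itself and every subfolder
    pattern = os.path.join(glob.escape(directory), "**", "*.csv")
    return sorted(glob.glob(pattern, recursive=True))
```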

Answer by Ward

According to the documentation of numpy.genfromtxt(), the first argument can be a

File, filename, or generator to read.

That would mean that you could write a generator that yields the lines of all the files like this:

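import glob
import numpy

def csv_merge_generator(pattern):
    for file_name in glob.glob(pattern):
        # Open each file and yield its lines; iterating over the name string
        # itself would only yield its characters
        with open(file_name) as f:
            for line in f:
                yield line

# then using it like this
numpy.genfromtxt(csv_merge_generator('*.csv'), delimiter=',')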

should work. (I do not have numpy installed, so cannot test easily)
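
One caveat with merging lines this way: if each file starts with a header row, every header ends up in the middle of the merged data. A sketch that drops the first header_lines lines of each file before handing the rest to genfromtxt, assuming one header row per file and the same number of columns in every file (the helper names and the header_lines parameter are hypothetical):

```python
import glob
import itertools

import numpy as np


def csv_rows_skip_header(pattern, header_lines=1):
    """Yield the data lines of every file matching `pattern`, dropping each file's header.

    `header_lines` is an assumed parameter: set it to the number of header rows
    your files actually have (0 if none).
    """
    for file_name in sorted(glob.glob(pattern)):
        with open(file_name) as f:
            # islice skips the first `header_lines` lines of this file only
            for line in itertools.islice(f, header_lines, None):
                yield line


def load_all(pattern):
    # One 2-D array holding the rows of all matching files, one after another
    return np.atleast_2d(np.genfromtxt(csv_rows_skip_header(pattern), delimiter=","))
```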

Answer by Shahidur

Using pandas and glob as the base packages:

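import glob
import os
import pandas as pd

# Read each csv file into its own DataFrame, then concatenate once at the end
frames = [pd.read_csv(file_name, low_memory=False)
          for file_name in glob.glob(os.path.join(directoryPath, '*.csv'))]
glued_data = pd.concat(frames, axis=0) if frames else pd.DataFrame()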
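
Calling pd.concat inside a loop copies the accumulated frame on every iteration. A sketch that collects the frames in a list, concatenates once, and also records which file each row came from; the source_file column and the glue_csv_files helper are illustrative names, not pandas conventions:

```python
import glob
import os

import pandas as pd


def glue_csv_files(directory):
    """Concatenate all csv files in `directory`, tagging each row with its source file."""
    frames = []
    for file_name in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        df = pd.read_csv(file_name, low_memory=False)
        df["source_file"] = os.path.basename(file_name)  # remember where the rows came from
        frames.append(df)
    # A single concat at the end; ignore_index renumbers rows 0..n-1
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```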