Reading all csv files from a directory using Python

Question

Asked by FaCoffee

I hope this is not trivial, but I am wondering about the following:

If I have a specific folder with n csv files, how could I iteratively read all of them, one at a time, and perform some calculations on their values?

For a single file, for example, I do something like this and perform some calculations on the x array:

import numpy

directoryPath = input('Directory path for native csv file: ')
csvfile = numpy.genfromtxt(directoryPath, delimiter=",")
x = csvfile[:, 2]  # Creates the array that will undergo a set of calculations

I know that I can check which csv files there are in a given folder:

import glob
for files in glob.glob("*.csv"):
    print(files)

But I could not figure out how to nest the numpy.genfromtxt() function in a for loop, so that I read in all the csv files of a directory that I specify.

EDIT

The folder I have only has jpg and csv files. The latter are named eventX.csv, where X ranges from 1 to 50. The for loop I am referring to should therefore consider the file names the way they are.

Answer 1

Accepted answer by FaCoffee

That's how I'd do it:

import os

directory = os.path.join("c:\\", "path")
for root, dirs, files in os.walk(directory):
    for file in files:
        if file.endswith(".csv"):
            # join with root: file is only the bare name, not the full path
            f = open(os.path.join(root, file), 'r')
            #  perform calculation
            f.close()
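
As a minimal sketch of what the "perform calculation" step could look like with only the standard library: the hypothetical helper third_column_values below walks a directory, parses each csv file with the csv module, and collects the third column (index 2, as in the question) as floats. It assumes the files have no header row and that the column is numeric; both are assumptions, not something the question states.

```python
import csv
import os


def third_column_values(directory):
    """Map each csv file path under `directory` to its third column as floats.

    Hypothetical helper: assumes no header row and a numeric third column.
    """
    results = {}
    for root, dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(".csv"):
                path = os.path.join(root, name)
                with open(path, newline="") as f:
                    # csv.reader yields blank lines as empty lists; skip them
                    results[path] = [float(row[2]) for row in csv.reader(f) if row]
    return results
```

Calling third_column_values(directory) returns a dictionary keyed by file path, so the values of each file stay separate for later calculations.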

Answer 2

Answered by plonser

I think you are looking for something like this:

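import glob
import os
import numpy as np

# os.path.join adds the path separator that plain string concatenation would miss
for file_name in glob.glob(os.path.join(directoryPath, '*.csv')):
    x = np.genfromtxt(file_name, delimiter=',')[:, 2]
    # do your calculations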
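
One caveat with this loop: glob.glob() returns names in arbitrary order, and plain alphabetical sorting puts event10.csv before event2.csv. If the calculations should run in event order, a sketch like the following could sort the names numerically. The helpers event_number and sorted_event_files are hypothetical and assume the eventX.csv naming from the question.

```python
import glob
import os
import re


def event_number(path):
    """Return X from a path ending in eventX.csv (hypothetical helper)."""
    match = re.search(r"event(\d+)\.csv$", os.path.basename(path))
    if match is None:
        raise ValueError("not an eventX.csv file: " + path)
    return int(match.group(1))


def sorted_event_files(directory):
    """List directory/event*.csv ordered by event number, not alphabetically."""
    pattern = os.path.join(directory, "event*.csv")
    return sorted(glob.glob(pattern), key=event_number)
```

The loop can then iterate over sorted_event_files(directoryPath) instead of the raw glob result.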

Edit

If you want to get all csv files from a folder (including subfolders), you could use subprocess instead of glob (note that this code only works on Linux systems):

import subprocess
import numpy as np

# check_output returns bytes in Python 3, so decode before splitting into lines
file_list = subprocess.check_output(['find', directoryPath, '-name', '*.csv']).decode().splitlines()

for i, file_name in enumerate(file_list):
    x = np.genfromtxt(file_name, delimiter=',')[:, 2]
    # do your calculations
    # now you can use i as an index

It first searches the folder and its subfolders for all file names using the find command from the shell, and applies your calculations afterwards.
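
For a variant that does not depend on the find command, Python 3.5 and later can search subfolders from glob itself, using recursive=True and the ** pattern. This is an alternative to the subprocess approach, not part of the original answer; the function name find_csv_files is made up for illustration.

```python
import glob
import os


def find_csv_files(directory):
    """Return all .csv paths under `directory`, including subfolders."""
    # With recursive=True, '**' matches zero or more nested directories
    pattern = os.path.join(directory, "**", "*.csv")
    return sorted(glob.glob(pattern, recursive=True))
```

The returned list can be used in place of file_list in the enumerate loop, and it should behave the same on Windows and Linux.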

Answer 3

Answered by Ward

According to the documentation of numpy.genfromtxt(), the first argument can be a

File, filename, or generator to read.

That would mean that you could write a generator that yields the lines of all the files like this:

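import glob
import numpy

def csv_merge_generator(pattern):
    for file in glob.glob(pattern):
        # glob returns file names, so each file must be opened before iterating
        with open(file) as f:
            for line in f:
                yield line

# then using it like this

numpy.genfromtxt(csv_merge_generator('*.csv'), delimiter=',')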

This should work. (I do not have numpy installed, so I cannot test it easily.)
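
If each file starts with a header row, merging all lines this way would place header text in the middle of the data. A possible sketch that skips the leading lines of every file is shown below; the helper name csv_lines_without_headers and the header_rows parameter are hypothetical, and the one-header-row default is an assumption about the files.

```python
import glob


def csv_lines_without_headers(pattern, header_rows=1):
    """Yield the data lines of every file matching `pattern`.

    Hypothetical helper: skips `header_rows` leading lines in each file.
    """
    # sorted() gives a stable, alphabetical file order
    for file_name in sorted(glob.glob(pattern)):
        with open(file_name) as f:
            for line_number, line in enumerate(f):
                if line_number >= header_rows:
                    yield line
```

Since genfromtxt is documented to accept a generator of strings, the result could presumably be passed to it in the same way, together with delimiter=','.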

Answer 4

Answered by Shahidur

Using pandas and glob as the base packages

import glob
import os
import pandas as pd

# read every file first, then concatenate once instead of growing a DataFrame
frames = []
for file_name in glob.glob(os.path.join(directoryPath, '*.csv')):
    frames.append(pd.read_csv(file_name, low_memory=False))
glued_data = pd.concat(frames, axis=0)
