Reading multiple .csv files from different directories into a pandas DataFrame

Notice: This page is a translation of a popular StackOverflow question and is licensed under CC BY-SA 4.0. You are free to use and share it, but you must also comply with the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original URL: http://stackoverflow.com/questions/39838332/

Date: 2020-09-14 02:07:56  Source: igfitidea


Tags: python, csv, pandas, dataframe, operating-system

Question by MScar

My DataFrame has an index, SubjectID, and each subject ID has its own directory. Each subject directory contains a .csv file with information that I want to put into my DataFrame. Using my SubjectID index, I want to read the header of the .csv file for every subject and put it into a new column in my DataFrame.


Each subject directory has the same path except for the individual subject number.
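Because the paths differ only in the subject number, each subject's directory can be built straight from the index, with no directory search at all. The sketch below assumes the layout base_dir/<SubjectID>/, takes "header" to mean the column-name row of the first .csv found in each folder, and stores it as a comma-separated string. The helper name read_subject_headers and the new column name header are hypothetical.

```python
import glob
import os

import pandas as pd


def read_subject_headers(df, base_dir):
    """Return a copy of df with a 'header' column built from each subject's CSV.

    Hypothetical helper: assumes files live in base_dir/<SubjectID>/ and
    that the first .csv found there (alphabetically) is the one wanted.
    """
    headers = {}
    for subject_id in df.index:
        subject_dir = os.path.join(base_dir, str(subject_id))
        csv_files = sorted(glob.glob(os.path.join(subject_dir, "*.csv")))
        if not csv_files:
            headers[subject_id] = None  # no CSV for this subject
            continue
        # nrows=0 parses only the header row and returns an empty frame
        cols = pd.read_csv(csv_files[0], nrows=0).columns
        headers[subject_id] = ", ".join(str(c) for c in cols)
    out = df.copy()
    # The Series is aligned on the SubjectID index values
    out["header"] = pd.Series(headers)
    return out
```

For example, read_subject_headers(df, '/home/mydirectory') would return a copy of df with the extra column; subjects whose folder holds no .csv get a missing value.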


I have found ways to read multiple .csv files from a single target directory into a pandas DataFrame, but not from multiple directories. Here is some code I have for importing multiple .csv files from a target directory:


import glob
import os
import pandas as pd

subject_path = '/home/mydirectory/SubjectID/'
filelist = []
os.chdir(subject_path)  # pass the variable, not the string 'subject_path'
for files in glob.glob("*.csv"):
    filelist.append(files)

# read each csv file into a single dataframe and add a filename reference column
columns = range(1, 100)
frames = []
for c, f in enumerate(filelist):
    key = "file%i" % c
    frame = pd.read_csv(subject_path + f, skiprows=1, index_col=0, names=columns)
    frame['key'] = key
    frames.append(frame)
# DataFrame.append was removed in pandas 2.0; concatenate once at the end instead
df = pd.concat(frames, ignore_index=True)

I want to do something similar but iteratively go into the different Subject directories instead of having a single target directory.


Edit: I think I want to do this using os, not pandas. Is there a way to use a loop to search through multiple directories using os?


Answer by Parfait

Consider the recursive os.walk() function, which visits every directory and file below a starting point, either top-down (the default, topdown=True) or bottom-up. Additionally, you can use a regex to check file names so that only .csv files are read.


The code below imports ALL .csv files in any child or grandchild folder of the target root /home/mydirectory, so check whether any non-subject .csv files exist there; if they do, adjust the re.match() pattern accordingly:


import os
import re
import pandas as pd

# CURRENT DIRECTORY (PLACE SCRIPT IN /home/mydirectory)
cd = os.path.dirname(os.path.abspath(__file__))

i = 0
columns = range(1, 100)
dfList = []

# os.walk visits cd and every subdirectory below it (top-down by default)
for root, dirs, files in os.walk(cd):
    for fname in files:
        # escape the dot so only names ending in ".csv" match
        if re.match(r"^.*\.csv$", fname):
            frame = pd.read_csv(os.path.join(root, fname), skiprows=1,
                                index_col=0, names=columns)
            frame['key'] = "file{}".format(i)
            dfList.append(frame)
            i += 1

df = pd.concat(dfList)
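A shorter variant swaps os.walk and the regex for glob's recursive ** pattern (Python 3.5+) and labels each frame with its parent folder name through the keys argument of pd.concat, which ties every row back to a SubjectID. This is a sketch, assuming each CSV has an ordinary header row and that the folder directly containing a file is named after the subject; load_all_subject_csvs is a hypothetical name.

```python
import glob
import os

import pandas as pd


def load_all_subject_csvs(root_dir):
    """Read every .csv under root_dir (any depth) into one DataFrame.

    Sketch: the name of the folder holding each file is assumed to be its
    SubjectID. pd.concat raises ValueError if no files are found.
    """
    # '**' with recursive=True matches root_dir itself and all subfolders
    pattern = os.path.join(root_dir, "**", "*.csv")
    paths = sorted(glob.glob(pattern, recursive=True))
    frames = [pd.read_csv(p) for p in paths]
    subject_ids = [os.path.basename(os.path.dirname(p)) for p in paths]
    # keys= adds an outer index level recording which subject each row came from
    return pd.concat(frames, keys=subject_ids, names=["SubjectID", "row"])
```

Rows for one subject can then be selected with, for example, combined.loc["S01"], where combined is the returned frame.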

Answer by Scratch'N'Purr

Assuming your subject folders are all in mydirectory, you can first build a list of every folder in that directory and then add each folder's .csv files to your filelist.


import os

parent_dir = '/home/mydirectory'
# collect every subdirectory of parent_dir (d avoids shadowing the built-in dir)
subject_dirs = [os.path.join(parent_dir, d) for d in os.listdir(parent_dir)
                if os.path.isdir(os.path.join(parent_dir, d))]

filelist = []
for subject_dir in subject_dirs:
    csv_files = [os.path.join(subject_dir, f) for f in os.listdir(subject_dir)
                 if os.path.isfile(os.path.join(subject_dir, f)) and f.endswith('.csv')]
    filelist.extend(csv_files)

# Do what you did with the dataframe from here
...
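The same one-level-deep listing can also be written with pathlib, where the pattern '*/*.csv' matches parent_dir/<subject>/<file>.csv only, skipping both files directly in parent_dir and files in deeper subfolders. This is a sketch; list_subject_csvs is a hypothetical helper that returns the paths as strings so they can be passed straight to pd.read_csv.

```python
from pathlib import Path


def list_subject_csvs(parent_dir):
    """Return sorted paths of .csv files exactly one folder below parent_dir.

    Hypothetical helper using pathlib instead of os.listdir.
    """
    # '*/*.csv' requires exactly one directory level between parent_dir and the file;
    # is_file() drops any directory that happens to be named like '*.csv'
    return sorted(str(p) for p in Path(parent_dir).glob("*/*.csv") if p.is_file())
```

Using a single glob pattern keeps the depth rule in one place, whereas the os.listdir version needs two nested listings plus the isdir and isfile checks.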