pandas: read multiple CSV files and concatenate a list of file names into a single DataFrame
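Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/35973782/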
Reading multiple CSV files from a list of file names and concatenating them into a single DataFrame
Question by mtkilic
I have multiple CSV files in a directory. I would like to loop through them, find a list of file names, read each file in, and concatenate them into a single DataFrame. If there is only a single file, I just want to read that dataset in.
Here is an example of the CSV files in my directory:
- 2013_nba.csv
- 2014_nba.csv
- 2015_nba.csv
- 2013_basketball.csv
- 2014_basketball.csv
- 2015_soccer.csv
This is what I have so far, but it reads all CSV files and concatenates them into a single DataFrame. I need help with how to loop through the files and find the list of file names that contain a given string (for example "nba").
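import glob
import os

import pandas as pd

path = r'C:\Users\csvfiles'  # raw string, so the backslashes are not treated as escapes
csvFiles = glob.glob(os.path.join(path, "*.csv"))

list_ = []
for files in csvFiles:
    df = pd.read_csv(files, index_col=None, header=0)
    list_.append(df)
frame = pd.concat(list_, ignore_index=True)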
I am new to Python. I tried something like for "nba" in files to pick out all CSV files whose names contain "nba" and then build one DataFrame from them, but it wasn't successful.
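The attempted for "nba" in files cannot work, because a for loop needs a variable name as its target, not a string literal. The substring test "nba" in name belongs inside an if filter over the file names instead. The sketch below shows one way to do that, assuming the files sit directly in one folder; files_with_keyword and read_and_concat are hypothetical helper names, not part of the original question or answer.

```python
import glob
import os

import pandas as pd


def files_with_keyword(folder, keyword):
    """Return the sorted CSV paths in `folder` whose file name contains `keyword`."""
    all_csv = glob.glob(os.path.join(folder, "*.csv"))
    # Test the keyword against the base name only, so directory names cannot match.
    return sorted(f for f in all_csv if keyword in os.path.basename(f))


def read_and_concat(paths):
    """Read each CSV and stack the rows into one DataFrame with a fresh index."""
    frames = [pd.read_csv(p, index_col=None, header=0) for p in paths]
    # A one-element list simply yields that file's rows; an empty list raises ValueError.
    return pd.concat(frames, ignore_index=True)
```

Usage would look like nba_df = read_and_concat(files_with_keyword(r'C:\Users\csvfiles', 'nba')). Because pd.concat() accepts a single-element list, the "only one file" case needs no special handling.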
Answer by MaxU
UPDATE:
A slightly improved version of the get_merged_csv() function, which can pass extra parameters through to pd.read_csv():
import os
import glob
import pandas as pd

def get_merged_csv(flist, **kwargs):
    # Read every file in flist (forwarding kwargs to pd.read_csv) and stack the results.
    return pd.concat([pd.read_csv(f, **kwargs) for f in flist], ignore_index=True)

path = 'C:/Users/csvfiles'
fmask = os.path.join(path, '*nba*.csv')
df = get_merged_csv(glob.glob(fmask), index_col=None, usecols=['rank', 'name'])
print(df.head())
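To try the updated get_merged_csv() without the asker's directory, the sketch below writes a few throwaway CSV files into a temporary folder and checks that usecols is forwarded to every pd.read_csv() call. The sample data and the extra "wins" column are made up for illustration. It also sorts the glob.glob() result, since glob returns paths in an arbitrary, OS-dependent order and the row order of the merged frame would otherwise not be reproducible.

```python
import glob
import os
import tempfile

import pandas as pd


def get_merged_csv(flist, **kwargs):
    # Every extra keyword argument is forwarded unchanged to pd.read_csv().
    return pd.concat([pd.read_csv(f, **kwargs) for f in flist], ignore_index=True)


def demo():
    with tempfile.TemporaryDirectory() as folder:
        # Hypothetical sample data: two NBA files and one soccer file.
        for year in (2013, 2014):
            pd.DataFrame(
                {"rank": [1, 2], "name": [f"team{year}a", f"team{year}b"], "wins": [60, 50]}
            ).to_csv(os.path.join(folder, f"{year}_nba.csv"), index=False)
        pd.DataFrame({"rank": [1], "name": ["club"], "wins": [30]}).to_csv(
            os.path.join(folder, "2015_soccer.csv"), index=False
        )

        # glob.glob() order is arbitrary; sort for a reproducible row order.
        files = sorted(glob.glob(os.path.join(folder, "*nba*.csv")))
        # Files are read here, before the temporary folder is deleted.
        return get_merged_csv(files, index_col=None, usecols=["rank", "name"])


if __name__ == "__main__":
    print(demo())
```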
OLD version:
import os
import glob
import pandas as pd

path = 'C:/Users/csvfiles'
#fmask = '*.csv'

def get_merged_csv(path, fmask):
    # Read all files in `path` that match the glob pattern `fmask` and concatenate them.
    return pd.concat([pd.read_csv(f, index_col=None, header=0)
                      for f in glob.glob(os.path.join(path, fmask))])

df_list = [get_merged_csv(path, fmask)
           for fmask in ['*nba.csv', '*basketball.csv', '*soccer.csv']]
df_list will have three DFs:
- df_list[0]: NBA
- df_list[1]: basketball
- df_list[2]: soccer
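Unlike the UPDATE version, the OLD version calls pd.concat() without ignore_index=True. The rows are the same either way; only the index differs. Each file keeps its own 0-based index, so labels repeat across files and .loc lookups can return several rows. A minimal illustration with two small in-memory frames standing in for two CSV files:

```python
import pandas as pd

# Two small frames standing in for two CSV files read with pd.read_csv().
a = pd.DataFrame({"rank": [1, 2]})
b = pd.DataFrame({"rank": [1, 2]})

# Without ignore_index each frame keeps its own 0-based index.
kept = pd.concat([a, b])
# With ignore_index=True the result gets a fresh 0..n-1 index.
renumbered = pd.concat([a, b], ignore_index=True)

print(kept.index.tolist())        # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```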
Alternatively, you can put them into a dictionary:
df_dict = {}
df_dict['nba'] = get_merged_csv(path, '*nba.csv')
df_dict['basketball'] = get_merged_csv(path, '*basketball.csv')
df_dict['soccer'] = get_merged_csv(path, '*soccer.csv')
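The dictionary above can also be built with a dict comprehension over the keywords. One caveat is that pd.concat() raises ValueError ("No objects to concatenate") when a mask matches no file. The sketch below therefore skips keywords without matches. merge_matching and merge_by_keyword are hypothetical helper names, and the sorting and ignore_index=True are choices made here, not part of the answer's code.

```python
import glob
import os

import pandas as pd


def merge_matching(path, fmask):
    """Concatenate all CSVs in `path` matching `fmask`, or return None if none match."""
    files = sorted(glob.glob(os.path.join(path, fmask)))
    if not files:
        # pd.concat([]) would raise ValueError("No objects to concatenate").
        return None
    return pd.concat([pd.read_csv(f) for f in files], ignore_index=True)


def merge_by_keyword(path, keywords):
    """Build {keyword: merged DataFrame}, leaving out keywords with no matching file."""
    merged = {kw: merge_matching(path, f"*{kw}.csv") for kw in keywords}
    return {kw: df for kw, df in merged.items() if df is not None}
```

With the asker's layout this would be called as merge_by_keyword(path, ['nba', 'basketball', 'soccer']).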
Some explanations:
The get_merged_csv(path, fmask) function reads the matching CSV files in a list comprehension. The resulting list of DFs is passed to pd.concat(), which returns a single concatenated DF.
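For readers new to list comprehensions, the sketch below builds the same list of DataFrames twice: once with a comprehension, as get_merged_csv() does, and once with the explicit loop from the question. It then concatenates both. The CSV contents are hypothetical and are read from in-memory strings via io.StringIO, so the example runs without any files on disk.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV contents standing in for files on disk.
csv_texts = ["rank,name\n1,a\n2,b\n", "rank,name\n1,c\n"]

# List-comprehension form, as in get_merged_csv():
frames = [pd.read_csv(StringIO(t)) for t in csv_texts]

# The same list built with an explicit loop, as in the question:
frames_loop = []
for t in csv_texts:
    frames_loop.append(pd.read_csv(StringIO(t)))

# pd.concat() stacks each list of DataFrames into a single DataFrame.
merged = pd.concat(frames, ignore_index=True)
merged_loop = pd.concat(frames_loop, ignore_index=True)
```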