Import multiple excel files into python pandas and concatenate them into one dataframe
Original source: http://stackoverflow.com/questions/20908018/
Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors on StackOverflow (not to the translator).
Asked by jonas
I would like to read several Excel files from a directory into pandas and concatenate them into one big dataframe, but I have not been able to figure it out. I need some help with the for loop and with building the concatenated dataframe. Here is what I have so far:
import sys
import csv
import glob
import pandas as pd
# get data file names
path =r'C:\DRO\DCL_rawdata_files\excelfiles'
filenames = glob.glob(path + "/*.xlsx")
dfs = []
for df in dfs:
    xl_file = pd.ExcelFile(filenames)
    df=xl_file.parse('Sheet1')
    dfs.concat(df, ignore_index=True)
Accepted answer by ericmjl
As mentioned in the comments, one error you are making is that you are looping over an empty list (dfs), so the body of the loop never runs.
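To make the empty-list point concrete, here is a minimal sketch. The helper name list_excel_files is hypothetical and introduced only for illustration; the idea is that the loop should iterate over the file names returned by glob, not over the (still empty) list of results.

```python
import glob
import os


def list_excel_files(folder, pattern="*.xlsx"):
    """Return the sorted paths in `folder` that match `pattern` (hypothetical helper)."""
    return sorted(glob.glob(os.path.join(folder, pattern)))


dfs = []        # empty list, as in the question
visited = 0
for df in dfs:  # nothing to iterate over, so the body never executes
    visited += 1
print(visited)  # prints 0

# The loop needs to walk over the file names instead, for example:
# for path in list_excel_files(r'C:\DRO\DCL_rawdata_files\excelfiles'):
#     ...read each file here...
```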
Here is how I would do it, using an example with 5 identical Excel files that are appended one after another.
(1) Imports:
import os
import pandas as pd
(2) List files:
path = os.getcwd()
files = os.listdir(path)
files
Output:
['.DS_Store',
'.ipynb_checkpoints',
'.localized',
'Screen Shot 2013-12-28 at 7.15.45 PM.png',
'test1 2.xls',
'test1 3.xls',
'test1 4.xls',
'test1 5.xls',
'test1.xls',
'Untitled0.ipynb',
'Werewolf Modelling',
'~$Random Numbers.xlsx']
(3) Pick out 'xls' files:
files_xls = [f for f in files if f[-3:] == 'xls']
files_xls
Output:
['test1 2.xls', 'test1 3.xls', 'test1 4.xls', 'test1 5.xls', 'test1.xls']
(4) Initialize empty dataframe:
df = pd.DataFrame()
(5) Loop over list of files to append to empty dataframe:
for f in files_xls:
    data = pd.read_excel(f, 'Sheet1')
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    df = pd.concat([df, data])
(6) Enjoy your new dataframe. :-)
df
Output:
Result Sample
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5
5 f 6
6 g 7
7 h 8
8 i 9
9 j 10
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5
5 f 6
6 g 7
7 h 8
8 i 9
9 j 10
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5
5 f 6
6 g 7
7 h 8
8 i 9
9 j 10
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5
5 f 6
6 g 7
7 h 8
8 i 9
9 j 10
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5
5 f 6
6 g 7
7 h 8
8 i 9
9 j 10
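The steps above grow the dataframe one file at a time. A common alternative is to read every file into a list and call pd.concat once at the end. The following is only a sketch under stated assumptions: the function name concat_excel_files, the pattern default and the reader parameter are hypothetical and not part of the original answer. It also skips Excel lock files such as '~$Random Numbers.xlsx' from the directory listing above, and it renumbers the index instead of repeating 0 to 9 for each file.

```python
import glob
import os

import pandas as pd


def concat_excel_files(folder, pattern="*.xls*", reader=pd.read_excel, **reader_kwargs):
    """Read every matching file in `folder` and stack the rows (hypothetical helper).

    `reader` is an assumption made for illustration: any function that takes a
    path and returns a DataFrame, pd.read_excel by default.
    """
    paths = sorted(glob.glob(os.path.join(folder, pattern)))
    # Skip the temporary lock files Excel creates while a workbook is open.
    paths = [p for p in paths if not os.path.basename(p).startswith("~$")]
    frames = [reader(p, **reader_kwargs) for p in paths]
    if not frames:
        # pd.concat raises ValueError on an empty list, so return early.
        return pd.DataFrame()
    # ignore_index=True renumbers the rows 0..n-1 across all files.
    return pd.concat(frames, ignore_index=True)
```

Called as concat_excel_files(os.getcwd(), sheet_name='Sheet1'), this should give the same rows as the loop above, with a continuous index.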
Answer by john blue
This works with Python 2.x.
Run it from the directory where the Excel files are.
See http://pbpython.com/excel-file-combine.html
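import numpy as np
import pandas as pd
import glob
all_data = pd.DataFrame()
for f in glob.glob("*.xlsx"):
    df = pd.read_excel(f)
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    all_data = pd.concat([all_data, df], ignore_index=True)
# now save the data frame
writer = pd.ExcelWriter('output.xlsx')
all_data.to_excel(writer, sheet_name='sheet1')
# ExcelWriter.save() was removed in pandas 2.0; close() writes the file
writer.close()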
Answer by Tarun Bhavnani
import pandas as pd
import os
os.chdir('...')
#read first file for column names
fdf= pd.read_excel("first_file.xlsx", sheet_name="sheet_name")
#create counter to segregate the different file's data
fdf["counter"]=1
nm= list(fdf)
c=2
#read first 1000 files
for i in os.listdir():
    print(c)
    if c<1001:
        if "xlsx" in i:
            df= pd.read_excel(i, sheet_name="sheet_name")
            df["counter"]=c
            if list(df)==nm:
                #DataFrame.append was removed in pandas 2.0, so use pd.concat
                fdf=pd.concat([fdf, df])
                c+=1
            else:
                print("headers name not match")
        else:
            print("not xlsx")
fdf=fdf.reset_index(drop=True)
#relax
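The counter column above records which file each row came from, and the header comparison rejects files whose columns differ. A minimal sketch of the same idea on dataframes that are already loaded follows; the function name concat_with_source and the column name source are hypothetical, chosen only for illustration, and the file name is stored instead of a numeric counter.

```python
import pandas as pd


def concat_with_source(frames_by_name):
    """Stack dataframes whose headers match the first one (hypothetical helper).

    `frames_by_name` maps a label, such as a file name, to a DataFrame.
    Each kept row is tagged with its label in a new 'source' column.
    """
    expected = None
    tagged = []
    for name, frame in frames_by_name.items():
        columns = list(frame.columns)
        if expected is None:
            expected = columns  # the first frame defines the expected headers
        if columns != expected:
            print(f"skipping {name}: header names do not match")
            continue
        # assign() returns a copy, so the caller's frame is left unchanged
        tagged.append(frame.assign(source=name))
    if not tagged:
        return pd.DataFrame()
    return pd.concat(tagged, ignore_index=True)
```

The dictionary could be built with something like {f: pd.read_excel(f) for f in files}, where files is whatever list of paths you have collected.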
Answer by aruna kumar
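import pandas as pd
import os
# list every file in the folder
files = [file for file in os.listdir('./Salesfolder')]
# start from an empty data frame and add each file's rows to it
all_months_data = pd.DataFrame()
for file in files:
    df = pd.read_csv("./Salesfolder/" + file)
    all_months_data = pd.concat([all_months_data, df])
# export the combined data to a single csv file
all_months_data.to_csv("all_data.csv", index=False)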
You can read all the files from a folder (Salesfolder in my case; use your own local path instead). Iterate over the files, read each one into a data frame, and concatenate it onto a data frame that starts out empty. I have also exported the data for all months into a single csv file. Note that the code above reads csv files with pd.read_csv; for .xls files, use pd.read_excel instead.
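One thing to watch for with this approach is that os.listdir returns every entry in the folder, so a stray non-csv file would be passed to pd.read_csv. The sketch below filters by extension using pathlib. The function name combine_csv_folder and its parameters are hypothetical, introduced only for illustration.

```python
from pathlib import Path

import pandas as pd


def combine_csv_folder(folder, output_path):
    """Concatenate every .csv file in `folder` and write the result (hypothetical helper)."""
    # Only files ending in .csv are read; anything else in the folder is ignored.
    csv_paths = sorted(Path(folder).glob("*.csv"))
    frames = [pd.read_csv(path) for path in csv_paths]
    combined = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    combined.to_csv(output_path, index=False)
    return combined
```

Writing the output outside the input folder, for example combine_csv_folder('./Salesfolder', 'all_data.csv'), keeps the combined file from being read back in on a later run.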

