Original URL: http://stackoverflow.com/questions/37397037/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
Looping through multiple excel files in python using pandas
Question by brandog
I know this type of question is asked all the time. But I am having trouble figuring out the best way to do this.
I wrote a script that reformats a single Excel file using pandas. It works great.
Now I want to loop through multiple Excel files, perform the same reformatting, and place the newly reformatted data from each Excel sheet at the bottom, one after another.
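Placing each reformatted result "at the bottom, one after another" corresponds to row-wise concatenation with pandas.concat. A minimal sketch, assuming the reformatted sheets are already DataFrames; the two small frames and their column names are invented stand-ins, not data from the question:

```python
import pandas as pd

# Invented stand-ins for two reformatted sheets from two different workbooks.
sheet_a = pd.DataFrame({"item": ["a1", "a2"], "qty": [1, 2]})
sheet_b = pd.DataFrame({"item": ["b1"], "qty": [3]})

# pd.concat stacks along the row axis by default (axis=0), so sheet_b's rows
# end up underneath sheet_a's. ignore_index=True renumbers the rows 0..n-1
# instead of keeping each frame's original index labels.
combined = pd.concat([sheet_a, sheet_b], ignore_index=True)
print(combined)
```

Without ignore_index=True the result would keep the index labels 0, 1, 0, which is harmless but can be surprising later.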
I believe the first step is to make a list of all Excel files in the directory. There are so many different ways to do this, so I am having trouble finding the best way.
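For reference, a minimal sketch of that first step which builds the list of full paths explicitly, so no os.chdir is needed. The helper name list_excel_files and the folder path in the usage line are hypothetical; glob.glob itself always returns a list.

```python
import glob
import os


def list_excel_files(folder):
    """Hypothetical helper: return a sorted list of full paths to the
    .xlsx files in folder.

    Joining the folder into the pattern means the returned paths work
    regardless of the current working directory.
    """
    pattern = os.path.join(folder, "*.xlsx")
    # glob.glob already returns a list; sorting makes the order predictable.
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Hypothetical folder, used only for illustration.
    print(list_excel_files(r"C:\ExcelWorkbooksFolder"))
```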
Below is the code I am currently using to import multiple .xlsx files and create a list.
import os
import glob
os.chdir('C:\ExcelWorkbooksFolder')
for FileList in glob.glob('*.xlsx'):
    print(FileList)
I am not sure if the previous glob code actually created the list that I need.
Then I have trouble understanding where to go from there. The code below fails at pd.ExcelFile(File). I believe I am missing something...
import pandas as pd

# create for loop
for File in FileList:
    for x in File:
        # Import the excel file and call it xlsx_file
        xlsx_file = pd.ExcelFile(File)
        xlsx_file
        # View the excel file's sheet names
        xlsx_file.sheet_names
        # Load the xlsx file's Data sheet as a dataframe
        df = xlsx_file.parse('Data', header=None)
        # select important rows
        df_NoHeader = df[4:]
        # then it does some more reformatting
Any help is greatly appreciated.
Answer by brandog
I solved my problem. Instead of using the glob function, I used os.listdir to read all my Excel files, loop through each one, reformat it, and then append the final data to the end of the table.
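The approach described above (list the folder, reformat each workbook, collect the results, concatenate once at the end) can be sketched as a small function. This is an illustrative sketch, not the author's exact code: the function name combine_workbooks, the reformat callback, the .xlsx extension filter and the injectable reader argument are all assumptions added for the example. It uses pandas.read_excel, which reads one sheet in a single call, in place of pd.ExcelFile(...).parse(...).

```python
import os

import pandas as pd


def combine_workbooks(folder, reformat, sheet_name="sheet1", reader=pd.read_excel):
    """Hypothetical helper: read one sheet from every .xlsx file in folder,
    reformat it, and stack the results vertically.

    `reformat` stands in for your own reformatting code: any function that
    takes a DataFrame and returns a DataFrame. `reader` defaults to
    pandas.read_excel and is a parameter only so the loop can be exercised
    without real workbooks.
    """
    appended_data = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)  # os.listdir returns bare names
        if not (os.path.isfile(path) and name.lower().endswith(".xlsx")):
            continue  # skip sub-folders and non-Excel files
        df = reader(path, sheet_name=sheet_name, header=None)
        appended_data.append(reformat(df))
    if not appended_data:
        return pd.DataFrame()  # pd.concat raises ValueError on an empty list
    # One concat at the end instead of growing a DataFrame inside the loop.
    return pd.concat(appended_data, ignore_index=True)
```

With real files this would be called as combine_workbooks(r'C:\ExcelFiles', my_reformat), where my_reformat is your own function; reading .xlsx files through pandas also needs an Excel engine such as openpyxl installed.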
import os
import pandas as pd

folder = r'C:\ExcelFiles'
# first create an empty list to collect each reformatted sheet
appended_data = []
for WorkingFile in os.listdir(folder):
    # os.listdir returns bare file names, so join them onto the folder path
    path = os.path.join(folder, WorkingFile)
    if os.path.isfile(path):
        # Import the excel file and call it xlsx_file
        xlsx_file = pd.ExcelFile(path)
        # View the excel file's sheet names
        xlsx_file.sheet_names
        # Load the xlsx file's sheet1 sheet as a dataframe
        df = xlsx_file.parse('sheet1', header=None)
        # .... do some reformatting, call the finished sheet reformatedDataSheet
        reformatedDataSheet = df  # placeholder for the reformatting steps
        appended_data.append(reformatedDataSheet)
# stack all reformatted sheets, one after another
appended_data = pd.concat(appended_data)
And that's it; it does everything I wanted.
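A detail that matters in any os.listdir loop like this one: os.listdir returns bare file names rather than full paths, so a check such as os.path.isfile(name) is resolved against the current working directory. Joining each name onto the folder with os.path.join removes the dependence on os.chdir. A small demonstration using a throwaway temporary folder (the file name is invented):

```python
import os
import tempfile

# Throwaway folder containing one empty file (the name is invented).
with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, "report.xlsx"), "w").close()

    names = os.listdir(folder)
    print(names)  # ['report.xlsx']: bare names, not paths

    # A bare name is looked up relative to the current working directory,
    # which is not `folder` here, so this check normally comes back False...
    bare_ok = os.path.isfile(names[0])
    # ...whereas the joined path points at the real file.
    joined_ok = os.path.isfile(os.path.join(folder, names[0]))
    print(bare_ok, joined_ok)
```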
Answer by Corley Brigman
You need to change
os.chdir('C:\ExcelWorkbooksFolder')
for FileList in glob.glob('*.xlsx'):
    print(FileList)
to just
os.chdir('C:\ExcelWorkbooksFolder')
FileList = glob.glob('*.xlsx')
print(FileList)
Why does this fix it? glob returns a single list. Since you wrote for FileList in glob.glob(...), you walk that list one item at a time and put each result into FileList. At the end of your loop, FileList is a single filename: a single string.
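The rebinding described above can be seen without any files. In this minimal sketch a hard-coded list of invented names stands in for the result of glob.glob('*.xlsx'):

```python
# Invented file names standing in for what glob.glob('*.xlsx') might return.
matches = ["jan.xlsx", "feb.xlsx", "mar.xlsx"]

# The question's pattern: the loop rebinds FileList on every pass,
# so after the loop only the last name is left, as a plain str.
for FileList in matches:
    pass
after_loop = FileList
print(repr(after_loop))  # 'mar.xlsx'

# The fix: bind the whole list to FileList once.
FileList = matches
print(FileList)  # ['jan.xlsx', 'feb.xlsx', 'mar.xlsx']
```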
When you run this code:
for File in FileList:
    for x in File:
The first line will assign File to the first character of the last filename (as a string). The second line will assign x to the first (and only) character of File. A single character is not likely to be a valid filename, so pd.ExcelFile(File) throws an error.
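The same effect can be reproduced with a plain string. This sketch uses an invented file name and records what File and x hold instead of calling pd.ExcelFile, which on the first pass would be handed the one-character string 'm':

```python
# After the buggy glob loop, FileList is one file name (invented here).
FileList = "mar.xlsx"

pairs = []
for File in FileList:  # iterating a str yields one character at a time
    for x in File:  # a one-character str yields only itself
        pairs.append((File, x))

print(pairs[0])  # ('m', 'm')
print(len(pairs))  # 8, one pass per character of 'mar.xlsx'
```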