pandas: Looping through multiple Excel files in Python using pandas

Question

Asked by brandog

I know this type of question is asked all the time. But I am having trouble figuring out the best way to do this.

I wrote a script that reformats a single excel file using pandas. It works great.

Now I want to loop through multiple Excel files, perform the same reformat on each, and place the newly reformatted data from each Excel sheet at the bottom, one after another.

I believe the first step is to make a list of all Excel files in the directory. There are so many different ways to do this that I am having trouble finding the best one.

Below is the code I am currently using to find multiple .xlsx files and create a list.

import os
import glob

os.chdir(r'C:\ExcelWorkbooksFolder')
for FileList in glob.glob('*.xlsx'):
    print(FileList)

I am not sure if the previous glob code actually created the list that I need.

Then I have trouble understanding where to go from there. The code below fails at pd.ExcelFile(File). I believe I am missing something.

# create for loop
for File in FileList:
    for x in File:
        # Import the excel file and call it xlsx_file
        xlsx_file = pd.ExcelFile(File)
        xlsx_file
        # View the excel file's sheet names
        xlsx_file.sheet_names
        # Load the xlsx file's Data sheet as a dataframe
        df = xlsx_file.parse('Data', header=None)
        # select important rows
        df_NoHeader = df[4:]
        # then it does some more reformatting.

Any help is greatly appreciated.

Answer 1

Answered by brandog

I solved my problem. Instead of using glob, I used os.listdir to list all my Excel files, loop through each one, reformat it, and then append the final data to the end of the table.

import os
import pandas as pd

folder = r'C:\ExcelFiles'

# First create an empty list to collect the reformatted DataFrames.
appended_data = []

for WorkingFile in os.listdir(folder):
    # os.listdir returns bare file names, so build the full path first.
    path = os.path.join(folder, WorkingFile)
    if os.path.isfile(path):
        # Import the Excel file and call it xlsx_file
        xlsx_file = pd.ExcelFile(path)
        # Load the workbook's 'sheet1' sheet as a DataFrame
        df = xlsx_file.parse('sheet1', header=None)

        # ... do some reformatting; call the finished sheet reformatedDataSheet
        reformatedDataSheet = df  # placeholder for the reformatting step
        appended_data.append(reformatedDataSheet)

# Stack all the reformatted sheets one below another.
appended_data = pd.concat(appended_data)

And that's it. It does everything I wanted.
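For anyone who would rather stay with glob, here is a minimal sketch of the same loop. The names combine_workbooks and drop_top_rows are hypothetical and not from the original post, and drop_top_rows is only a stand-in for the real reformatting (it keeps the rows from index 4 onward, like df[4:] in the question). It also assumes an Excel engine such as openpyxl is installed so that pandas can read .xlsx files.

```python
import glob
import os

import pandas as pd


def combine_workbooks(folder, reformat, sheet_name=0):
    """Read one sheet from every .xlsx file in folder, reformat it, and stack the results."""
    frames = []
    # sorted() makes the stacking order predictable; glob itself returns arbitrary order.
    for path in sorted(glob.glob(os.path.join(folder, "*.xlsx"))):
        df = pd.read_excel(path, sheet_name=sheet_name, header=None)
        frames.append(reformat(df))
    if not frames:
        # pd.concat raises ValueError on an empty list, so return an empty frame instead.
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


def drop_top_rows(df):
    # Stand-in for the real reformatting: keep everything from row index 4 onward.
    return df[4:]
```

It could be called as combine_workbooks(r'C:\ExcelFiles', drop_top_rows, sheet_name='sheet1'). Passing the reformatting step in as a function keeps the loop itself unchanged when the reformatting changes.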

Answer 2

Answered by Corley Brigman

You need to change

os.chdir(r'C:\ExcelWorkbooksFolder')
for FileList in glob.glob('*.xlsx'):
    print(FileList)

to just

os.chdir(r'C:\ExcelWorkbooksFolder')
FileList = glob.glob('*.xlsx')
print(FileList)

Why does this fix it? glob.glob returns a single list. Since you wrote for FileList in glob.glob(...), you walk that list one item at a time and put each item into FileList. At the end of your loop, FileList is a single filename: a single string.
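A minimal sketch of that difference, assuming a throwaway temporary folder with two empty placeholder files (the names a.xlsx and b.xlsx and the variable leftover are made up for illustration):

```python
import glob
import os
import tempfile

# Build a throwaway folder with two empty placeholder files (hypothetical names).
folder = tempfile.mkdtemp()
for name in ("a.xlsx", "b.xlsx"):
    open(os.path.join(folder, name), "w").close()

pattern = os.path.join(folder, "*.xlsx")

# Using FileList as the loop variable rebinds it to one name at a time.
for FileList in glob.glob(pattern):
    pass
leftover = FileList  # a single str: whichever name the loop visited last

# Assigning the result keeps the whole list.
FileList = sorted(glob.glob(pattern))

print(type(leftover).__name__)  # str
print(type(FileList).__name__)  # list
print(len(FileList))            # 2
```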

When you then run this code:

for File in FileList:
    for x in File:

the first line assigns File to each character of the last filename in turn, starting with the first character (as a one-character string). The second line assigns x to the first (and only) character of File. A single character is not likely to be a valid filename, so pd.ExcelFile(File) throws an error.
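To see this in isolation, here is a small sketch; the filenames are hypothetical and no files are actually opened:

```python
last_name = "report.xlsx"  # hypothetical filename left in FileList after the loop

# Iterating over a string yields its characters, not file names.
chars = [File for File in last_name]
print(chars[:3])  # ['r', 'e', 'p']

# Iterating over a list of names yields whole file names.
FileList = ["january.xlsx", "report.xlsx"]
names = [File for File in FileList]
print(names)  # ['january.xlsx', 'report.xlsx']
```

With FileList holding a real list, a single for File in FileList: loop is enough; the inner for x in File: loop is not needed.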
