Append existing Excel sheet with new dataframe using Python pandas

Question

Asked by brandog

I currently have this code. It works perfectly.

It loops through the Excel files in a folder, removes the first 2 rows of each, saves each one as an individual Excel file, and also saves all of them combined into a single appended file.

Currently the appended file overwrites the existing file each time I run the code.

I need to append the new data to the bottom of the already existing Excel sheet ('master_data.xlsx').

import os

import pandas as pd

dfList = []
# raw strings so backslashes in Windows paths are never read as escapes
path = r'C:\Test\TestRawFile'
newpath = r'C:\Path\To\New\Folder'

for fn in os.listdir(path):
    # Absolute file path
    file = os.path.join(path, fn)
    if os.path.isfile(file):
        # Import the excel file and call it xlsx_file
        xlsx_file = pd.ExcelFile(file)
        # View the excel file's sheet names
        print(xlsx_file.sheet_names)
        # Load the xlsx file's data sheet as a dataframe
        df = xlsx_file.parse('Sheet1', header=None)
        # Drop the first 2 rows
        df_NoHeader = df[2:]
        data = df_NoHeader
        # Save individual dataframe
        data.to_excel(os.path.join(newpath, fn))

        dfList.append(data)

appended_data = pd.concat(dfList)
appended_data.to_excel(os.path.join(newpath, 'master_data.xlsx'))

I thought this would be a simple task, but I guess not. I think I need to bring in the master_data.xlsx file as a dataframe, then match the index up with the new appended data, and save it back out. Or maybe there is an easier way. Any help is appreciated.
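
A minimal sketch of the approach described above (read the master file back in, concatenate the new frames below it, save it again). It assumes master_data.xlsx holds data rows only, written with index=False and header=False; the question's code writes the default index column and header row, which would otherwise be read back as data. append_to_master is a hypothetical name.

```python
import os

import pandas as pd


def append_to_master(master_path, new_frames):
    """Stack new_frames below the rows already stored in master_path."""
    frames = []
    if os.path.isfile(master_path):
        # header=None: the master file is assumed to hold data rows only
        frames.append(pd.read_excel(master_path, header=None))
    frames.extend(new_frames)
    # ignore_index=True renumbers the rows 0..n-1 instead of keeping
    # the per-file indexes (which start at 2 after df[2:])
    combined = pd.concat(frames, ignore_index=True)
    # rewrite the whole file without index/header so the next read round-trips
    combined.to_excel(master_path, index=False, header=False)
    return combined
```

In the question's script, append_to_master(os.path.join(newpath, 'master_data.xlsx'), dfList) would replace the last two lines. Rewriting the whole file each run is simple, but it gets slower as the master file grows.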

Answer 1

Answered by MaxU

A helper function for appending a DataFrame to an existing Excel file:

import os

import pandas as pd


def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False,
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    Parameters:
      filename : File path
                 (Example: '/path/to/file.xlsx')
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: 'Sheet1')
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]

    Returns: None

    Requires pandas >= 1.4 (for if_sheet_exists='overlay') and openpyxl.
    """
    # ignore [engine] parameter if it was passed
    to_excel_kwargs.pop('engine', None)

    # file does not exist yet: create it with a plain to_excel() call
    if not os.path.isfile(filename):
        df.to_excel(filename, sheet_name=sheet_name,
                    startrow=startrow or 0, engine='openpyxl',
                    **to_excel_kwargs)
        return

    # 'replace' removes [sheet_name] and recreates it at the same index,
    # 'overlay' writes into the existing sheet
    if_sheet_exists = 'replace' if truncate_sheet else 'overlay'

    with pd.ExcelWriter(filename, engine='openpyxl', mode='a',
                        if_sheet_exists=if_sheet_exists) as writer:
        # get the last row in the existing Excel sheet
        # if it was not specified explicitly (a truncated sheet starts at 0)
        if startrow is None:
            if not truncate_sheet and sheet_name in writer.book.sheetnames:
                startrow = writer.book[sheet_name].max_row
            else:
                startrow = 0

        # write out the new sheet; the workbook is saved when the block exits
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    **to_excel_kwargs)

Usage examples...
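
As a compact, standalone illustration of the same append-below-the-last-row idea, here is a sketch assuming pandas >= 1.4 (for if_sheet_exists='overlay') and openpyxl; append_rows is a hypothetical name, not part of pandas.

```python
import pandas as pd
from openpyxl import load_workbook


def append_rows(filename, df, sheet_name='Sheet1'):
    """Write df's rows below the last used row of an existing .xlsx sheet."""
    # openpyxl's max_row is 1-based, so it equals the 0-based startrow
    # of the first empty row below the existing data
    wb = load_workbook(filename)
    startrow = wb[sheet_name].max_row if sheet_name in wb.sheetnames else 0
    wb.close()

    # mode='a' keeps the existing workbook; if_sheet_exists='overlay'
    # writes into the sheet instead of raising an error
    with pd.ExcelWriter(filename, engine='openpyxl', mode='a',
                        if_sheet_exists='overlay') as writer:
        # header=False/index=False so only the data rows are appended
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    header=False, index=False)
```

For example, append_rows('master_data.xlsx', data) adds the rows of data under the existing ones and leaves any other sheets untouched.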

Old answer: it allows you to write several DataFrames to a new Excel file.

You can use the openpyxl engine in conjunction with the startrow parameter:

In [48]: writer = pd.ExcelWriter('c:/temp/test.xlsx', engine='openpyxl')

In [49]: df.to_excel(writer, index=False)

In [50]: df.to_excel(writer, startrow=len(df)+2, index=False)

In [51]: writer.close()

[Screenshot: resulting contents of c:/temp/test.xlsx]

PS: you may also want to specify header=False if you don't want to duplicate the column names.
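
A runnable sketch combining the old answer with the PS above, assuming openpyxl is installed: the frames are stacked on one sheet with no blank gap, and only the first one writes its column names. write_stacked is a hypothetical helper name.

```python
import pandas as pd


def write_stacked(filename, frames):
    """Write several DataFrames one below another on a single sheet."""
    startrow = 0
    with pd.ExcelWriter(filename, engine='openpyxl') as writer:
        for i, frame in enumerate(frames):
            # only the first frame writes the column names
            frame.to_excel(writer, startrow=startrow, index=False,
                           header=(i == 0))
            # next frame starts right below this one (+1 for the header row)
            startrow += len(frame) + (1 if i == 0 else 0)
```

Reading the file back with pd.read_excel then yields one table containing the rows of all frames in order.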

UPDATE: you may also want to check this solution

Answer 2

Answered by David

If you don't strictly need an Excel file, write the output to a CSV file in append mode, and just copy the CSV into a new Excel file when you need one:

df.to_csv('filepath', mode='a', index=False, header=False)

mode='a' means append: each run adds its rows to the end of the file instead of overwriting it.

This is a roundabout way, but it works neatly!
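
A sketch of this CSV route, with one change from the snippet above (an assumption, not part of the answer): the header row is written only when the CSV does not exist yet, so the file keeps a single header across runs. append_to_csv and csv_to_excel are hypothetical names; the Excel step needs openpyxl.

```python
import os

import pandas as pd


def append_to_csv(df, csv_path):
    """Append df to csv_path, writing the header row only when the file is new."""
    write_header = not os.path.isfile(csv_path)
    # mode='a' appends to the end of the file instead of overwriting it
    df.to_csv(csv_path, mode='a', index=False, header=write_header)


def csv_to_excel(csv_path, xlsx_path):
    """Copy the accumulated CSV into a fresh Excel file."""
    pd.read_csv(csv_path).to_excel(xlsx_path, index=False)
```

Calling append_to_csv once per run accumulates the data; csv_to_excel is only needed when an .xlsx copy is actually required.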

Answer 3

Answered by brandog

This question has been out here a while. The answer above is OK, but I believe this will solve most people's question.

Simply use glob to access the files in a specific directory, loop through them, create a DataFrame from each file, combine them, then export the result to a folder. I also included commented-out code to do the same with CSVs.

import glob

import pandas as pd

# put in path to folder with files you want to append
# *.xlsx or *.csv will get all files of that type
path = "C:/Users/Name/Folder/*.xlsx"
#path = "C:/Users/Name/Folder/*.csv"

# collect one DataFrame per file
frames = []

# loop through each file in the path
for file in glob.glob(path):
    print(file)

    # create a df of that file path
    df = pd.read_excel(file, sheet_name=0)
    #df = pd.read_csv(file, sep=',')

    # append it to the list (DataFrame.append was removed in pandas 2.0)
    frames.append(df)

# concatenate all files at once; ignore_index gives a clean 0..n-1 index
appended_data = pd.concat(frames, ignore_index=True)

# export the appended data to a folder of your choice
exportPath = 'C:/My/EXPORT/PATH/appended_dataExport.csv'
appended_data.to_csv(exportPath, index=False)
