Python: How to save a new sheet in an existing Excel file using Pandas?

Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/42370977/

Time: 2020-08-19 21:37:26  Source: igfitidea

How to save a new sheet in an existing excel file, using Pandas?

Tags: python, pandas, openpyxl, xlsxwriter

Asked by Stefano Fedele

I want to use Excel files to store data processed with Python. My problem is that I can't add sheets to an existing Excel file. Here is some sample code that reproduces the issue:

import pandas as pd
import numpy as np

path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"

x1 = np.random.randn(100, 2)
df1 = pd.DataFrame(x1)

x2 = np.random.randn(100, 2)
df2 = pd.DataFrame(x2)

writer = pd.ExcelWriter(path, engine = 'xlsxwriter')
df1.to_excel(writer, sheet_name = 'x1')
df2.to_excel(writer, sheet_name = 'x2')
writer.close()  # close() saves the file; writer.save() was removed in pandas 2.0

This code saves two DataFrames to two sheets, named "x1" and "x2" respectively. If I create two new DataFrames and try to use the same code to add two new sheets, 'x3' and 'x4', the original data is lost.

import pandas as pd
import numpy as np

path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"

x3 = np.random.randn(100, 2)
df3 = pd.DataFrame(x3)

x4 = np.random.randn(100, 2)
df4 = pd.DataFrame(x4)

writer = pd.ExcelWriter(path, engine = 'xlsxwriter')
df3.to_excel(writer, sheet_name = 'x3')
df4.to_excel(writer, sheet_name = 'x4')
writer.close()  # close() saves the file; writer.save() was removed in pandas 2.0

I want an Excel file with four sheets: 'x1', 'x2', 'x3', 'x4'. I know that 'xlsxwriter' is not the only "engine"; there is also 'openpyxl'. I have also seen that other people have already written about this issue, but I still can't understand how to do it.

Here is some code taken from this link:

import pandas
from openpyxl import load_workbook

book = load_workbook('Masterfile.xlsx')
writer = pandas.ExcelWriter('Masterfile.xlsx', engine='openpyxl') 
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

data_filtered.to_excel(writer, "Main", cols=['Diff1', 'Diff2'])

writer.save()

They say that it works, but it is hard to figure out how. I don't understand what "ws.title", "ws", and "dict" are in this context.

What is the best way to save "x1" and "x2", then close the file, open it again and add "x3" and "x4"?

Answered by Stefano Fedele

Thank you. I believe that a complete example could be useful for anyone else who has the same issue:

import pandas as pd
import numpy as np

path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"

x1 = np.random.randn(100, 2)
df1 = pd.DataFrame(x1)

x2 = np.random.randn(100, 2)
df2 = pd.DataFrame(x2)

writer = pd.ExcelWriter(path, engine = 'xlsxwriter')
df1.to_excel(writer, sheet_name = 'x1')
df2.to_excel(writer, sheet_name = 'x2')
writer.close()  # close() saves the file; writer.save() was removed in pandas 2.0

Here I generate an Excel file. From my understanding, it does not really matter whether it is generated via the "xlsxwriter" or the "openpyxl" engine.

When I want to write without losing the original data, I use:

import pandas as pd
import numpy as np
from openpyxl import load_workbook

path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"

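book = load_workbook(path)  # load the existing workbook with sheets x1 and x2
writer = pd.ExcelWriter(path, engine = 'openpyxl')
writer.book = book  # reuse the loaded workbook (assignable only in pandas < 2.0)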

x3 = np.random.randn(100, 2)
df3 = pd.DataFrame(x3)

x4 = np.random.randn(100, 2)
df4 = pd.DataFrame(x4)

df3.to_excel(writer, sheet_name = 'x3')
df4.to_excel(writer, sheet_name = 'x4')
writer.close()  # close() saves the file; writer.save() was removed in pandas 2.0

This code does the job!
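On recent pandas versions the `writer.book = book` assignment is no longer allowed, so the same result is usually reached with the writer's append mode. The following is only a sketch under some assumptions: it needs a pandas version that supports `mode="a"` with the openpyxl engine, and it writes to a throwaway temporary path rather than the path used in this answer.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Throwaway path for the demo (an assumption, not the path from the answer).
path = os.path.join(tempfile.mkdtemp(), "PhD_data.xlsx")

# First run: create the workbook with sheets x1 and x2.
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    pd.DataFrame(np.random.randn(100, 2)).to_excel(writer, sheet_name="x1")
    pd.DataFrame(np.random.randn(100, 2)).to_excel(writer, sheet_name="x2")

# Later run: mode="a" opens the existing workbook instead of overwriting it.
with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
    pd.DataFrame(np.random.randn(100, 2)).to_excel(writer, sheet_name="x3")
    pd.DataFrame(np.random.randn(100, 2)).to_excel(writer, sheet_name="x4")

sheet_names = pd.ExcelFile(path).sheet_names
print(sheet_names)  # ['x1', 'x2', 'x3', 'x4']
```

Append mode is only available with the openpyxl engine; xlsxwriter can only create new files.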

Answered by Grr

In the example you shared, you are loading the existing file into book and setting the writer.book value to be book. In the line writer.sheets = dict((ws.title, ws) for ws in book.worksheets) you are accessing each sheet in the workbook as ws. The sheet title is then ws.title, so you are creating a dictionary of {sheet_title: sheet} key-value pairs. This dictionary is then assigned to writer.sheets. Essentially, these steps just load the existing data from 'Masterfile.xlsx' and populate your writer with it.
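To make the roles of ws, ws.title and dict concrete, here is a small sketch that uses openpyxl on its own. The in-memory workbook and the sheet names are assumptions chosen for illustration; no file is read or written.

```python
from openpyxl import Workbook

# Build a small in-memory workbook with two sheets (names chosen for the demo).
book = Workbook()
book.active.title = "x1"
book.create_sheet("x2")

# Each ws is a worksheet object and ws.title is its name (a string).
sheets = dict((ws.title, ws) for ws in book.worksheets)

# The same mapping written as a dict comprehension.
sheets_alt = {ws.title: ws for ws in book.worksheets}

print(list(sheets))  # ['x1', 'x2']
```

The dictionary simply lets the writer look up a worksheet object by its name.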

Now let's say you already have a file with x1 and x2 as sheets. You can use the example code to load the file and then do something like this to add x3 and x4.

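import pandas as pd
from openpyxl import load_workbook
path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"
book = load_workbook(path)  # load the existing sheets x1 and x2
writer = pd.ExcelWriter(path, engine='openpyxl')
writer.book = book  # as in the example code; assignable only in pandas < 2.0
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df3.to_excel(writer, sheet_name='x3', index=False)
df4.to_excel(writer, sheet_name='x4', index=False)
writer.close()  # saves the file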

That should do what you are looking for.

Answered by Wong Tat Yau

A simple example of writing several DataFrames to an Excel file at once, and of adding data as a new sheet to an Excel file that has already been written (and closed).

The first time you write to the Excel file (writing "df1" and "df2" to "1st_sheet" and "2nd_sheet"):

import pandas as pd 
from openpyxl import load_workbook

df1 = pd.DataFrame([[1],[1]], columns=['a'])
df2 = pd.DataFrame([[2],[2]], columns=['b'])
df3 = pd.DataFrame([[3],[3]], columns=['c'])

excel_dir = "my/excel/dir"

with pd.ExcelWriter(excel_dir, engine='xlsxwriter') as writer:
    df1.to_excel(writer, sheet_name='1st_sheet')
    df2.to_excel(writer, sheet_name='2nd_sheet')
    # No writer.save() needed: leaving the with block saves the file.

After the Excel file has been closed, suppose you want to "append" data to the same file but on another sheet, say "df3" to a sheet named "3rd_sheet":

book = load_workbook(excel_dir)
with pd.ExcelWriter(excel_dir, engine='openpyxl') as writer:
    writer.book = book  # reuse the loaded workbook (assignable only in pandas < 2.0)
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)

    # Your DataFrame to append.
    df3.to_excel(writer, sheet_name='3rd_sheet')
    # No writer.save() needed: leaving the with block saves the file.

Note that the Excel format must not be xls; use xlsx instead.

Answered by Charlie Clark

I would strongly recommend you work directly with openpyxl since it now supports Pandas DataFrames.

This allows you to concentrate on the relevant Excel and Pandas code.

Answered by Jonathan L

You can read the existing sheets you are interested in, for example 'x1' and 'x2', into memory and write them back before adding new sheets (keep in mind that sheets in a file and sheets in memory are two different things; if you don't read them, they will be lost). This approach uses only 'xlsxwriter'; openpyxl is not involved.

import pandas as pd
import numpy as np

path = r"C:\Users\fedel\Desktop\excelData\PhD_data.xlsx"

# begin <== read selected sheets and write them back
df1 = pd.read_excel(path, sheet_name='x1', index_col=0) # or sheet_name=0
df2 = pd.read_excel(path, sheet_name='x2', index_col=0) # or sheet_name=1
writer = pd.ExcelWriter(path, engine='xlsxwriter')
df1.to_excel(writer, sheet_name='x1')
df2.to_excel(writer, sheet_name='x2')
# end ==>

# now create more new sheets
x3 = np.random.randn(100, 2)
df3 = pd.DataFrame(x3)

x4 = np.random.randn(100, 2)
df4 = pd.DataFrame(x4)

df3.to_excel(writer, sheet_name='x3')
df4.to_excel(writer, sheet_name='x4')
writer.close()  # close() saves the file; writer.save() was removed in pandas 2.0

If you want to preserve all existing sheets, you can replace the code between begin and end above with:

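# read all existing sheets into memory first, because creating the writer may truncate the file
with pd.ExcelFile(path) as xlsx:
    existing = {sheet: xlsx.parse(sheet_name=sheet, index_col=0) for sheet in xlsx.sheet_names}
# then write them back
writer = pd.ExcelWriter(path, engine='xlsxwriter')
for sheet, df in existing.items():
    df.to_excel(writer, sheet_name=sheet)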
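As a compact variant of the same read-then-rewrite idea, `pd.read_excel` can return every sheet at once when `sheet_name=None` is passed. The sketch below is self-contained and makes a few assumptions: it uses a temporary demo file, tiny made-up DataFrames, and whatever default writer engine is installed.

```python
import os
import tempfile

import pandas as pd

# Throwaway demo file (an assumption, not the path from the answer).
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")

with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="x1")
    pd.DataFrame({"b": [3, 4]}).to_excel(writer, sheet_name="x2")

# sheet_name=None returns {sheet name: DataFrame} for every sheet in the file.
existing = pd.read_excel(path, sheet_name=None, index_col=0)

# Everything is in memory now, so the file can be rewritten from scratch.
with pd.ExcelWriter(path) as writer:
    for name, frame in existing.items():
        frame.to_excel(writer, sheet_name=name)
    pd.DataFrame({"c": [5, 6]}).to_excel(writer, sheet_name="x3")

result = pd.read_excel(path, sheet_name=None, index_col=0)
print(list(result))  # ['x1', 'x2', 'x3']
```

Keep in mind that a round trip through DataFrames keeps only the cell values; formatting, formulas and charts in the original sheets are not preserved.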

Answered by nileshk611

# This program reads an Excel workbook, extracts only the URL domain names, and writes them to a different sheet of the same workbook.
# Developer - Nilesh K
import pandas as pd
from openpyxl import load_workbook  # for writing to the existing workbook

df = pd.read_excel("urlsearch_test.xlsx")

# You can prefix the file name with an absolute path like the one below.
# r"C:\Users\xyz\Desktop\Python\

domains = []  # collects the extracted domain names

# begin
# Loop over every row and extract the URL prefix from the 'TEXT' column.
# You can use your own logic here.
for index, row in df.iterrows():
    try:
        text = row['TEXT']  # string to search (named text to avoid shadowing the built-in str)
        start = text.index('http')  # position where the URL starts
        # position of the next '/' after the '//', assuming the first '/' in the text belongs to 'http://'
        end = text.index('/', text.index('/') + 2)
        domains.append(text[start:end])  # substring such as 'http://example.com'

    # Skip rows where 'http' or the '/' is not found and continue.
    except ValueError:
        print('Error!')
print(domains)
domains = list(dict.fromkeys(domains))  # keep distinct values; comment this line out to keep all values
df1 = pd.DataFrame(domains, columns=['URL'])  # create a DataFrame from the list
# end

# Write using openpyxl so the data can be added to the same workbook
book = load_workbook('urlsearch_test.xlsx')
writer = pd.ExcelWriter('urlsearch_test.xlsx', engine='openpyxl')
writer.book = book  # reuse the loaded workbook (assignable only in pandas < 2.0)
df1.to_excel(writer, sheet_name='Sheet3')
writer.close()  # close() saves the file; writer.save() was removed in pandas 2.0

# The line below can be used to write to a different workbook without using openpyxl directly
# df1.to_excel(r"C:\Users\xyz\Desktop\Python\urlsearch1_test.xlsx", index=False, sheet_name='sheet1')

Answered by Jis Mathew

This can be done without ExcelWriter, using the tools in openpyxl. It also makes it much easier to add fonts to the new sheet using openpyxl.styles.

import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows

#Location of original excel sheet
fileLocation =r'C:\workspace\data.xlsx'

#Location of new file which can be the same as original file
writeLocation=r'C:\workspace\dataNew.xlsx'

data = {'Name':['Tom','Paul','Jeremy'],'Age':[32,43,34],'Salary':[20000,34000,32000]}

#The dataframe you want to add
df = pd.DataFrame(data)

#Load existing workbook as it is
book = load_workbook(fileLocation)
#create a new sheet
sheet = book.create_sheet("Sheet Name")

#Load dataframe into new sheet
for row in dataframe_to_rows(df, index=False, header=True):
    sheet.append(row)

#Save the modified excel at desired location    
book.save(writeLocation)
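The remark about fonts can be illustrated with a short sketch. It is only an example under assumptions: the workbook is created in memory with `Workbook()` as a stand-in for `load_workbook(fileLocation)`, the sheet name "Styled" is made up, and bold headers are just one possible style.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import Font
from openpyxl.utils.dataframe import dataframe_to_rows

df = pd.DataFrame({"Name": ["Tom", "Paul"], "Age": [32, 43]})

book = Workbook()  # stands in for load_workbook(fileLocation)
sheet = book.create_sheet("Styled")  # hypothetical sheet name

# Copy the DataFrame into the sheet; the header becomes row 1.
for row in dataframe_to_rows(df, index=False, header=True):
    sheet.append(row)

# Row 1 holds the column headers; make them bold.
for cell in sheet[1]:
    cell.font = Font(bold=True)

header_values = [cell.value for cell in sheet[1]]
print(header_values)  # ['Name', 'Age']
```

Calling `book.save(...)` afterwards would write the styled sheet to disk, exactly as in the code above.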

Answered by MrMajestyk

Another fairly simple way to go about this is to write a function like this:

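import logging

import pandas as pd
from openpyxl import load_workbook


def _write_frame_to_new_sheet(path_to_file=None, sheet_name='sheet', data_frame=None):
    book = None
    try:
        book = load_workbook(path_to_file)  # succeeds only if the workbook already exists
    except Exception:
        logging.debug('Creating new workbook at %s', path_to_file)
    with pd.ExcelWriter(path_to_file, engine='openpyxl') as writer:
        if book is not None:
            writer.book = book  # reuse the loaded workbook (assignable only in pandas < 2.0)
        data_frame.to_excel(writer, sheet_name=sheet_name, index=False)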

The idea here is to load the workbook at path_to_file if it exists and then append data_frame as a new sheet named sheet_name. If the workbook does not exist, it is created. It seems that neither openpyxl nor xlsxwriter appends, so, as in the example by @Stefano above, you really have to load the workbook and then rewrite it to append.
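For newer pandas versions, a helper with the same intent can be sketched using the writer's append mode. This is an illustration under assumptions rather than the author's method: the name `write_frame_to_sheet` is hypothetical, the `if_sheet_exists` option requires pandas 1.3 or later, and the demo writes to a temporary file.

```python
import os
import tempfile

import pandas as pd


def write_frame_to_sheet(path_to_file, sheet_name, data_frame):
    """Hypothetical helper: add (or replace) one sheet, creating the file if needed."""
    if os.path.exists(path_to_file):
        # Append mode keeps the other sheets; "replace" overwrites a same-named sheet.
        options = {"mode": "a", "if_sheet_exists": "replace"}
    else:
        options = {"mode": "w"}
    with pd.ExcelWriter(path_to_file, engine="openpyxl", **options) as writer:
        data_frame.to_excel(writer, sheet_name=sheet_name, index=False)


path = os.path.join(tempfile.mkdtemp(), "report.xlsx")
write_frame_to_sheet(path, "first", pd.DataFrame({"a": [1]}))
write_frame_to_sheet(path, "second", pd.DataFrame({"b": [2]}))
write_frame_to_sheet(path, "second", pd.DataFrame({"b": [3]}))  # replaces the sheet

sheet_names = pd.ExcelFile(path).sheet_names
print(sheet_names)  # ['first', 'second']
```

`if_sheet_exists` is passed only in append mode, because pandas rejects it when the file is opened for writing from scratch.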