Can Pandas read and modify a single Excel file worksheet (tab) without modifying the rest of the file?
Original question: http://stackoverflow.com/questions/28142420/
Note: this content is from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by HaPsantran
Many spreadsheets have formulas and formatting that Python tools for reading and writing Excel files cannot faithfully reproduce. That means that any file I want to create programmatically must be something I basically create from scratch, and then other Excel files (with the aforementioned sophistication) have to refer to that file (which creates a variety of other dependency issues).
My understanding of Excel file 'tabs' is that they're actually just a collection of XML files. So, is it possible to use pandas (or one of the underlying read/write engines such as xlsxwriter or openpyxl) to modify just one of the tabs, leaving the other tabs (with more wicked stuff in there) intact?
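To illustrate the "collection of XML files" point: an .xlsx file is a ZIP archive, and each worksheet is stored as its own XML part. The sketch below uses only the standard library and builds a tiny, heavily simplified stand-in archive in memory; the part contents are placeholders, and a real workbook contains more parts (styles, relationships, shared strings, and so on).

```python
import io
import zipfile

# Build a tiny stand-in for an .xlsx archive in memory. The part names follow
# the usual layout, but the contents are placeholders, not valid worksheets.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("xl/workbook.xml", "<workbook/>")
    archive.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")
    archive.writestr("xl/worksheets/sheet2.xml", "<worksheet/>")

# Listing the archive shows one XML part per worksheet.
with zipfile.ZipFile(buffer) as archive:
    sheet_parts = [name for name in archive.namelist()
                   if name.startswith("xl/worksheets/")]

print(sheet_parts)
```

Running the same listing against a real workbook (by passing its path to `zipfile.ZipFile`) shows the actual parts that a library would have to preserve.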
EDIT: I'll try to further articulate the problem with an example.
- Excel Sheet test.xlsx has four tabs (aka worksheets): Sheet1, Sheet2, Sheet3, Sheet4
- I read Sheet3 into a DataFrame (let's call it df) using pandas.read_excel()
- Sheet1 and Sheet2 contain formulas, graphs, and various formatting that neither openpyxl nor xlrd can successfully parse, and Sheet4 contains other data. I don't want to touch those tabs at all.
- Sheet2 actually has some references to cells on Sheet3
- I make some edits to df and now want to write it back to sheet3, leaving the other sheets untouched (and the references to it from other worksheets in the workbook intact)
Can I do that and, if so, how?
Answered by PaulDong
I had a similar question regarding the interaction between excel and python (in particular, pandas), and I was referred to this question.
Thanks to some pointers from the Stack Overflow community, I found a package called xlwings that seems to cover a lot of the functionality HaPsantran asked for.
To use the OP's example:
Working with an existing Excel file, you can drop an anchor in the data block (Sheet3) you want to import into pandas by naming a cell in Excel, and then do:
# xlwings < 0.9 API; Existing_file and Anchor are placeholders for your file path and named cell
import pandas as pd
from xlwings import Workbook, Range

# open an existing Excel file
wb = Workbook(Existing_file)
# starting from the named cell, expand to the edge of the cell block (bounded by an empty column / row) and read it
df = Range(Anchor).table.value
# load the data block into a pandas DataFrame and manipulate it
df = pd.DataFrame(df)
df['sum'] = df.sum(axis=1)
# write back to Sheet3
Range(Anchor).value = df.values
I tested that this implementation doesn't tamper with existing formulas in the Excel file.
Let me know if this solves your problem and if there's anything I can help with.
Big kudos to the developer of xlwings, they made this possible.
Below is an update to my earlier answer after further question from @jamzsabb, and to reflect a changed API after xlwings updated to >= 0.9.0.
import xlwings as xw
import pandas as pd

# only do this if the 'A7' cell (a cell within the area of interest) is in the active worksheet
target_df = xw.Range('A7').options(pd.DataFrame, expand='table').value
# otherwise do:
# sht = xw.Book(r'path to your xlsx file\name_of_file.xlsx').sheets['name of sheet']
# target_df = sht.range('A7').options(pd.DataFrame, expand='table').value
# you can also change 'A7' to any name that you've given to a cell, like 'interest_table'
Answered by DSM
I'm 90% confident the answer to "can pandas do this" is no. Posting a negative is tough, because there always might be something clever that I've missed, but here's the case:
Possible interface engines are xlrd/xlwt/xlutils, openpyxl, and xlsxwriter. None will work for your purposes, as xlrd/xlwt don't support all formulae, xlsxwriter can't modify existing xlsx files, and openpyxl loses images and charts.
Since I often need to do this, I've taken to only writing simple output to a separate file and then calling the win32api directly to copy the data between the workbooks while preserving all of my colleague's shiny figures. It's annoying, because it means I have to do it under Windows instead of *nix, but it works.
If you're working under Windows, you could do something similar. (I wonder if it makes sense to add a native insert option using this approach to help people in this situation, or if we should simply post a recipe.)
P.S.: This very problem has annoyed me enough from time to time that I've thought of learning enough of the modern Excel format to add support for this to one of the libraries.
P.P.S.: But since ignoring things you're not handling and returning them unmodified seems easy enough, the fact that no one seems to support it makes me think there are some headaches, and where Redmond's involved I'm willing to believe it. @john-machin would know the details, if he's about..
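The "ignore what you're not handling and return it unmodified" idea can be sketched at the archive level with the standard library alone: copy every part of the ZIP byte for byte and swap in new content for a single part. This is only an illustration under strong assumptions. The helper name `replace_part` and the placeholder XML are hypothetical, and a real workbook would also need its shared strings, cached formula values, and calculation chain kept consistent, which is likely where the headaches mentioned above come from.

```python
import io
import zipfile


def replace_part(src_bytes, part_name, new_content):
    """Return a new archive with one member replaced and all others copied as-is."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            # Only the targeted part gets new content; everything else is untouched.
            data = new_content if item.filename == part_name else src.read(item)
            dst.writestr(item, data)
    return out.getvalue()


# Stand-in archive with placeholder contents (not valid worksheet XML).
original = io.BytesIO()
with zipfile.ZipFile(original, "w") as z:
    z.writestr("xl/worksheets/sheet2.xml", "<worksheet>chart and formulas</worksheet>")
    z.writestr("xl/worksheets/sheet3.xml", "<worksheet>old data</worksheet>")

updated = replace_part(original.getvalue(),
                       "xl/worksheets/sheet3.xml",
                       "<worksheet>new data</worksheet>")

with zipfile.ZipFile(io.BytesIO(updated)) as z:
    sheet2 = z.read("xl/worksheets/sheet2.xml")
    sheet3 = z.read("xl/worksheets/sheet3.xml")
```

Here `sheet2` comes back exactly as it went in, while `sheet3` carries the new content.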
Answered by patrickjlong1
I'm adding an answer that uses openpyxl. As of version 2.5, you can preserve charts in existing files (further details on the issue are available here).
For demonstration purposes, I create an xlsx file using pandas following the OP's guidelines. The tab named 'Sheet2' has formulas that reference 'Sheet3' and contains a chart.
import pandas as pd

df = pd.DataFrame({'col_a': [1, 2, 3],
                   'col_b': [4, 5, 6]})

writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
workbook = writer.book

# Sheet2: header row only, then formulas that reference Sheet3
df.head(0).to_excel(writer, sheet_name='Sheet2', index=False)
worksheet = writer.sheets['Sheet2']
for i in range(2, len(df) + 2):
    worksheet.write_formula('A%d' % i, '=Sheet3!A%d' % i)
    worksheet.write_formula('B%d' % i, '=Sheet3!B%d' % i)

# column chart on Sheet2 driven by the formula cells
chart = workbook.add_chart({'type': 'column'})
chart.add_series({'values': '=Sheet2!$A:$A'})
chart.add_series({'values': '=Sheet2!$B:$B'})
worksheet.insert_chart('A7', chart)

df.to_excel(writer, sheet_name='Sheet3', index=False)
df.to_excel(writer, sheet_name='Sheet4', index=False)
writer.close()  # writer.save() in older pandas; save() was removed in pandas 2.0
[Screenshot: expected test.xlsx after running the code above]
Then if we run the code below, using openpyxl, we can modify the data in 'Sheet3' while preserving formulas and chart in 'Sheet2' and the updated data is now in this file.
from openpyxl import load_workbook
wb = load_workbook('test.xlsx')
ws = wb['Sheet3']
ws['B2'] = 7
ws['B3'] = 8
ws['B4'] = 9
wb.save('test.xlsx')
[Screenshot: expected test.xlsx after running the second block of code]
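The same openpyxl approach extends from single cells to a whole DataFrame. The following is a minimal sketch, assuming pandas and openpyxl are installed; the temporary file, sheet contents, and column names are stand-ins chosen for illustration. It writes an edited DataFrame into 'Sheet3' cell by cell and then checks that a formula on 'Sheet2' survives the round trip.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")

# Build a small stand-in workbook: Sheet2 holds a formula pointing at Sheet3.
wb = Workbook()
wb.active.title = "Sheet2"
wb["Sheet2"]["A1"] = "=Sheet3!B2"
ws3 = wb.create_sheet("Sheet3")
ws3.append(["col_a", "col_b"])
ws3.append([1, 4])
ws3.append([2, 5])
wb.save(path)

# Edited data that should overwrite the contents of Sheet3.
df = pd.DataFrame({"col_a": [1, 2], "col_b": [7, 8]})

wb = load_workbook(path)
ws = wb["Sheet3"]
# Header goes in row 1, then one worksheet row per DataFrame row.
for col_idx, name in enumerate(df.columns, start=1):
    ws.cell(row=1, column=col_idx, value=name)
for row_idx, row in enumerate(df.itertuples(index=False), start=2):
    for col_idx, value in enumerate(row, start=1):
        ws.cell(row=row_idx, column=col_idx, value=value)
wb.save(path)

# Reload to confirm the formula text is intact and the data changed.
check = load_workbook(path)
formula_in_sheet2 = check["Sheet2"]["A1"].value
new_b2 = check["Sheet3"]["B2"].value
```

Note that this overwrites cells in place and does not clear leftover rows if the new DataFrame is shorter than the old data. openpyxl also stores only the formula text, so cached results are recalculated when the file is next opened in Excel.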
Answered by Julien Kervizic
As far as I know, pandas does not do that by itself.
I wrote a small utility library, pandasxltable (based on openpyxl), to facilitate the interaction between an Excel template and pandas DataFrames. The library allows you to fetch Excel Data Tables (not really a tab, but part of one) as a DataFrame and to update them from a DataFrame.
Answered by adrianX
If you're talking about 'sheets' as 'tabs', then it is possible to modify just one of the tabs by accessing that particular one with the parse(sheet_name) function.
An example is here: Reading an Excel file in python using pandas
To write back to Excel (while controlling the sheets), use the to_excel function, documented here:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html
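One way to combine parse and to_excel is pandas' append mode, sketched below. It assumes pandas 1.3 or later (for the `if_sheet_exists` argument) with openpyxl installed; the file and column names are placeholders. Because the workbook is still loaded and re-saved through openpyxl, content that openpyxl cannot represent in the other sheets may not survive, so test this on a copy first.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")

# Create a stand-in workbook with two data sheets.
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    pd.DataFrame({"x": [1, 2]}).to_excel(writer, sheet_name="Sheet3", index=False)
    pd.DataFrame({"y": [10, 20]}).to_excel(writer, sheet_name="Sheet4", index=False)

# Read only Sheet3 and edit it.
with pd.ExcelFile(path) as xls:
    df = xls.parse("Sheet3")
df["x"] = df["x"] * 100

# Append mode opens the existing file; "replace" swaps out only the named sheet.
with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                    if_sheet_exists="replace") as writer:
    df.to_excel(writer, sheet_name="Sheet3", index=False)

# Read every sheet back to confirm Sheet4 is still present.
result = pd.read_excel(path, sheet_name=None)
```

Without `mode="a"`, writing to the same path creates a new file and discards the other sheets.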


