Get formula from Excel cell with python xlrd
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/4690423/
Asked by alexroat
I have to port an algorithm from an Excel sheet to Python code, but first I have to reverse engineer the algorithm from the Excel file.
The Excel sheet is quite complicated: it contains many cells with formulas that refer to other cells, which in turn can contain a formula or a constant.
My idea is to analyze the sheet with a Python script, building a sort of table of dependencies between cells, that is:
A1 depends on B4,C5,E7 formula: "=sqrt(B4)+C5*E7"
A2 depends on B5,C6 formula: "=sin(B5)*C6"
...
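One possible in-memory shape for such a table, as a minimal sketch: a dict mapping each cell to its formula text and the set of cells it depends on. The cell names and formulas are the ones from the example; the `table` layout and the use of `graphlib` (Python 3.9+) are assumptions for illustration, not something xlrd provides.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency table: cell -> (formula text, cells it depends on).
table = {
    "A1": ("=sqrt(B4)+C5*E7", {"B4", "C5", "E7"}),
    "A2": ("=sin(B5)*C6", {"B5", "C6"}),
}

# TopologicalSorter expects node -> predecessors, i.e. cell -> its inputs.
graph = {cell: deps for cell, (_formula, deps) in table.items()}

# static_order() yields every cell after all the cells it depends on,
# which is the order in which a Python port would have to compute them.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

Cells that only appear as inputs (B4, C5, ...) show up in the order too; they would be the constants of the ported algorithm.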
The xlrd Python module can read an XLS workbook, but at the moment I can only access the value of a cell, not its formula.
For example, with the following code I can simply get the value of a cell:
import xlrd

# open the .xls file
xlsname = "test.xls"
book = xlrd.open_workbook(xlsname)
# build a dictionary of the names->sheets of the book
sd = {}
for s in book.sheets():
    sd[s.name] = s
# obtain sheet "Foglio 1" from the sheet names dictionary
sheet = sd["Foglio 1"]
# print the value of cell J141 (xlrd indices are 0-based: row 140, column 9)
print(sheet.cell(140, 9))
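Because xlrd addresses cells by 0-based row and column indices, translating a reference such as J141 by hand is easy to get wrong. A small helper can do the conversion; this is only a sketch, and `a1_to_rowcol` is a hypothetical name, not an xlrd function.

```python
import re

def a1_to_rowcol(ref):
    """Convert an A1-style reference such as 'J141' to 0-based (row, col)."""
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", ref)
    if match is None:
        raise ValueError("not a simple A1 reference: %r" % ref)
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        # Column letters are a base-26 number with digits A..Z = 1..26.
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1

print(a1_to_rowcol("J141"))  # (140, 9)
```

The result can be unpacked straight into a call like `sheet.cell(*a1_to_rowcol("J141"))`.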
However, there seems to be no way to get the formula from the Cell object returned by the .cell(...) method. The documentation says that it is possible to get a string version of the formula (in English, because no information about function name translation is stored in the Excel file). It talks about formulas (expressions) in the Name and Operand classes, but I cannot work out how to get instances of these classes from the Cell instance that must contain them.
Could you suggest a code snippet that gets the formula text from a cell?
Accepted answer by John Machin
[Dis]claimer: I'm the author/maintainer of xlrd.
The documentation's references to formula text are about "name" formulas; read the section "Named references, constants, formulas, and macros" near the start of the docs. These formulas are associated sheet-wide or book-wide with a name; they are not associated with individual cells. Examples: PI maps to =22/7, SALES maps to =Mktng!$A$2:$Z$99. The name-formula decompiler was written to support inspection of the simpler and/or commonly found usages of defined names.
Formulas in general are of several kinds: cell, shared, and array (all associated with a cell, directly or indirectly), name, data validation, and conditional formatting.
Decompiling general formulas from bytecode to text is a slow "work-in-progress". Note that even if it were available, you would then need to parse the text formula to extract the cell references. Parsing Excel formulas correctly is not an easy job; as with HTML, using regexes looks easy but doesn't work. It would be better to extract the references directly from the formula bytecode.
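To make the regex caveat concrete, here is a small illustration. The pattern `CELL_REF` and both formulas are made up for the example; the point is only that a plausible-looking pattern works on a simple formula and silently misfires on a slightly richer one.

```python
import re

# Naive pattern for A1-style references (an assumption for illustration only).
CELL_REF = re.compile(r"\$?[A-Z]{1,3}\$?[0-9]+")

simple = "=sqrt(B4)+C5*E7"
tricky = '=IF(A1>0,"see B2",LOG10(C3))'

print(CELL_REF.findall(simple))  # ['B4', 'C5', 'E7']
print(CELL_REF.findall(tricky))  # ['A1', 'B2', 'LOG10', 'C3']
```

In the second formula, B2 is only text inside a string literal and LOG10 is a function name, yet both are reported as cell references. Sheet-qualified references, ranges, and defined names add further cases that a single pattern does not handle.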
Also note that cell-based formulas can refer to names, and name formulas can refer both to cells and to other names. So it would be necessary to extract both cell and name references from both cell-based and name formulas. It may be useful to you to have info on shared formulas available; otherwise having parsed the following:
B2 =A2
B3 =A3+B2
B4 =A4+B3
B5 =A5+B4
...
B60 =A60+B59
you would need to deduce the similarity between the B3:B60 formulas yourself.
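One way to deduce that similarity, as a rough sketch, is to rewrite every reference as an offset from the cell that holds the formula (in the spirit of Excel's R1C1 notation) and group cells whose rewritten formulas are identical. The helper names are hypothetical, and the reference matching is deliberately naive: it assumes formulas that contain nothing but plain relative A1 references and operators, so it is subject to the regex caveat above.

```python
import re
from collections import defaultdict

REF = re.compile(r"([A-Z]+)([0-9]+)")

def col_to_index(letters):
    """Column letters to a 1-based column number (A=1, Z=26, AA=27)."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def relative_form(cell, formula):
    """Rewrite each reference as an offset from the cell holding the formula."""
    host = REF.fullmatch(cell)
    host_col, host_row = col_to_index(host.group(1)), int(host.group(2))

    def repl(m):
        d_row = int(m.group(2)) - host_row
        d_col = col_to_index(m.group(1)) - host_col
        return "R[%d]C[%d]" % (d_row, d_col)

    return REF.sub(repl, formula)

formulas = {"B2": "=A2", "B3": "=A3+B2", "B4": "=A4+B3", "B5": "=A5+B4"}

# Cells that share the same relative formula end up in the same group.
groups = defaultdict(list)
for cell, formula in formulas.items():
    groups[relative_form(cell, formula)].append(cell)

for pattern, cells in groups.items():
    print(pattern, cells)
```

Here B2 forms a group of its own (`=R[0]C[-1]`), while B3, B4 and B5 all collapse to `=R[0]C[-1]+R[-1]C[0]`.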
In any case, none of the above is likely to be available any time soon -- xlrd priorities lie elsewhere.
Answer by Steve
It seems that it is currently impossible to do what you want with xlrd. You can have a look at this post for a detailed description of why it is so difficult to implement the functionality you need.
Note that the development team does a great job with support on the python-excel Google group.
Answer by dgorissen
Update: I have gone and implemented a little library to do exactly what you describe: extracting the cells & dependencies from an Excel spreadsheet and converting them to python code. Code is on github, patches welcome :)
Just to add that you can always interact with Excel using win32com (not very fast, but it works). This does allow you to get the formula. A tutorial can be found here [cached copy] and details can be found in this chapter [cached copy].
Essentially you just do:
app.ActiveWorkbook.ActiveSheet.Cells(r,c).Formula
As for building a table of cell dependencies, the tricky part is parsing the Excel expressions. If I remember correctly, the Trace code you mentioned does not always do this correctly. The best I have seen is the algorithm by E. W. Bachtal, for which a Python implementation is available that works well.
Answer by Erik
I know this post is a little late but there's one suggestion that hasn't been covered here. Cut all the entries from the worksheet and paste using paste special (OpenOffice). This will convert the formulas to numbers so there's no need for additional programming and this is a reasonable solution for small workbooks.
Answer by mkultra
So I know this is a very old post, but I found a decent way of getting the formulas from all the sheets in a workbook as well as having the newly created workbook retain all the formatting.
The first step is to save a copy of your .xlsx file as .xls. Use the .xls file as the filename in the code below; the code also unzips the original .xlsx, which it expects under the same name with an extra "x" appended.
Using Python 2.7
from lxml import etree
from StringIO import StringIO
import xlsxwriter
import subprocess
from xlrd import open_workbook
from xlutils.copy import copy
from xlsxwriter.utility import xl_cell_to_rowcol
import os

file_name = '<YOUR-FILE-HERE>'
dir_path = os.path.dirname(os.path.realpath(file_name))

# unpack the .xlsx (file_name + "x") into ./file_xml
subprocess.call(["unzip", str(file_name + "x"), "-d", "file_xml"])

xml_sheet_names = dict()

# read the .xls copy to get the sheet names, in workbook order
with open_workbook(file_name, formatting_info=True) as rb:
    wb = copy(rb)
    workbook_names_list = rb.sheet_names()
    for i, name in enumerate(workbook_names_list):
        xml_sheet_names[name] = "sheet" + str(i + 1)

sheet_formulas = dict()
for i, k in enumerate(workbook_names_list):
    xmlFile = os.path.join(dir_path, "file_xml/xl/worksheets/{}.xml".format(xml_sheet_names[k]))
    with open(xmlFile) as f:
        xml = f.read()

    tree = etree.parse(StringIO(xml))
    context = etree.iterparse(StringIO(xml))

    sheet_formulas[k] = dict()
    for _, elem in context:
        # <f> elements hold the formula; the parent <c> carries the cell reference
        if elem.tag.split("}")[1] == 'f':
            cell_key = elem.getparent().get(key="r")
            cell_formula = elem.text
            sheet_formulas[k][cell_key] = str("=" + cell_formula)

sheet_formulas
Structure of Dictionary 'sheet_formulas'
{'Worksheet_Name': {'A1_cell_reference':'cell_formula'}}
Example results:
{u'CY16': {'A1': '=Data!B5',
'B1': '=Data!B1',
'B10': '=IFERROR(Data!B12,"")',
'B11': '=IFERROR(SUM(B9:B10),"")',
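For Python 3, the same idea can be sketched with only the standard library, reading the worksheet XML directly from the .xlsx archive instead of calling `unzip`. This is a minimal sketch, not a drop-in replacement: `read_formulas` is a hypothetical name, the result is keyed by the worksheet part name inside the archive rather than by the sheet name, and cells whose `<f>` element has no text (as dependent cells of a shared formula do) are skipped. The tiny in-memory archive at the end exists only to make the example self-contained.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of SpreadsheetML worksheet parts.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

def read_formulas(xlsx_file):
    """Return {worksheet part name: {cell reference: formula text}}."""
    result = {}
    with zipfile.ZipFile(xlsx_file) as archive:
        for part in archive.namelist():
            if not (part.startswith("xl/worksheets/") and part.endswith(".xml")):
                continue
            root = ET.fromstring(archive.read(part))
            formulas = {}
            for cell in root.iter(NS + "c"):
                f = cell.find(NS + "f")
                # Skip cells without a formula or with an empty <f> element.
                if f is not None and f.text:
                    formulas[cell.get("r")] = "=" + f.text
            result[part] = formulas
    return result

# Build a minimal stand-in for an .xlsx file so the sketch runs on its own.
sheet_xml = (
    '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    '<sheetData><row r="1">'
    '<c r="A1"><f>Data!B5</f><v>1</v></c>'
    '<c r="B1"><v>2</v></c>'
    '</row></sheetData></worksheet>'
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("xl/worksheets/sheet1.xml", sheet_xml)
buffer.seek(0)

print(read_formulas(buffer))  # {'xl/worksheets/sheet1.xml': {'A1': '=Data!B5'}}
```

With a real workbook, pass the path of the .xlsx file to `read_formulas` instead of the in-memory buffer.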
Answer by Kairat Koibagarov
Yes! With win32com it works for me.
# requires pywin32: python -m pip install pywin32
import win32com.client

Excel = win32com.client.Dispatch("Excel.Application")

file = r'path Excel file'  # placeholder: full path to the workbook
wb = Excel.Workbooks.Open(file)
sheet = wb.ActiveSheet

# Get value (Cells takes 1-based row, column)
val = sheet.Cells(1, 1).Value
# Get formula
formula = sheet.Cells(6, 2).Formula

