pandas 大熊猫按格式读取excel
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/38038428/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
pandas read excel as formatted
提问by carlo
how do i get the values of a spreadsheet as they are formatted? im working on spreadsheets with a currency format
我如何在格式化后获取电子表格的值?我正在处理具有货币格式的电子表格
this for example:
这例如:
ITEM NAME UNIT PRICE
item1 USD 99
item2 SGD 45
but the terms 'USD' and 'SGD' were added using the formatting capabilities of excel, and is not seen by the read_excel function of pandas. i would get the values, but not the currency name. i could only work on the spreadsheets as it is, and given that i have various spreadsheets with about 6-7 sheets each, i was hoping to have a pandas (or python)-level solution rather than an excel-level solution.
但是术语“USD”和“SGD”是使用excel的格式化功能添加的,pandas的read_excel函数看不到这些术语。我会得到值,但不会得到货币名称。我只能按原样处理电子表格,并且考虑到我有各种电子表格,每个电子表格大约有 6-7 张,我希望有一个 Pandas(或 Python)级别的解决方案,而不是一个 excel 级别的解决方案。
thanks guys.
谢谢你们。
to Daniel, this is how i implemented the 'xlrd' engine, which didn't seem to do anything.
对丹尼尔来说,这就是我实现“xlrd”引擎的方式,它似乎没有做任何事情。
excel = pd.ExcelFile('itemlist.xlsx', sheetname=None)
master = pd.DataFrame(None)
for sheet in excel.sheet_names:
df = pd.read_excel(excel,sheet,header=2, engine='xlrd')
master=master.append(df)
回答by chrisb
There's not any great way to do this. pandas
has no knowledge of the number formats, and xlrd
doesn't seem to be able to read formats from a .xlsx file - see here
没有什么好方法可以做到这一点。 pandas
不了解数字格式,并且xlrd
似乎无法从 .xlsx 文件中读取格式 - 请参见此处
You could use openpyxl
to accomplish this, it at least has access to the number formats, but it looks like you'd have to basically implement all the parsing logic yourself.
您可以使用它openpyxl
来完成此操作,它至少可以访问数字格式,但看起来您基本上必须自己实现所有解析逻辑。
In [26]: from openpyxl import load_workbook
In [27]: wb = load_workbook('temp.xlsx')
In [28]: ws = wb.worksheets[0]
In [29]: ws.cell("B2") # numeric value = 4, formatted as "USD 4"
Out[29]: <Cell Sheet1.B2>
In [30]: ws.cell("B2").value
Out[30]: 4
In [31]: ws.cell("B2").number_format
Out[31]: '"USD "#'