pandas 大熊猫按格式读取excel

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/38038428/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-14 01:27:50  来源:igfitidea点击:

pandas read excel as formatted

pythonexcelpandas

提问by carlo

how do i get the values of a spreadsheet as they are formatted? im working on spreadsheets with a currency format

我如何在格式化后获取电子表格的值?我正在处理具有货币格式的电子表格

this for example:

这例如:

ITEM NAME UNIT PRICE
item1     USD 99
item2     SGD 45

but the terms 'USD' and 'SGD' were added using the formatting capabilities of excel, and is not seen by the read_excel function of pandas. i would get the values, but not the currency name. i could only work on the spreadsheets as it is, and given that i have various spreadsheets with about 6-7 sheets each, i was hoping to have a pandas (or python)-level solution rather than an excel-level solution.

但是术语“USD”和“SGD”是使用excel的格式化功能添加的,pandas的read_excel函数看不到这些术语。我会得到值,但不会得到货币名称。我只能按原样处理电子表格,并且考虑到我有各种电子表格,每个电子表格大约有 6-7 张,我希望有一个 Pandas(或 Python)级别的解决方案,而不是一个 excel 级别的解决方案。

thanks guys.

谢谢你们。

to Daniel, this is how i implemented the 'xlrd' engine, which didn't seem to do anything.

对丹尼尔来说,这就是我实现“xlrd”引擎的方式,它似乎没有做任何事情。

excel = pd.ExcelFile('itemlist.xlsx', sheetname=None)
master = pd.DataFrame(None)

for sheet in excel.sheet_names:
    df = pd.read_excel(excel,sheet,header=2, engine='xlrd')
    master=master.append(df)

回答by chrisb

There's not any great way to do this. pandashas no knowledge of the number formats, and xlrddoesn't seem to be able to read formats from a .xlsx file - see here

没有什么好方法可以做到这一点。 pandas不了解数字格式,并且xlrd似乎无法从 .xlsx 文件中读取格式 - 请参见此处

You could use openpyxlto accomplish this, it at least has access to the number formats, but it looks like you'd have to basically implement all the parsing logic yourself.

您可以使用它openpyxl来完成此操作,它至少可以访问数字格式,但看起来您基本上必须自己实现所有解析逻辑。

In [26]: from openpyxl import load_workbook

In [27]: wb = load_workbook('temp.xlsx')

In [28]: ws = wb.worksheets[0]

In [29]: ws.cell("B2")  # numeric value = 4, formatted as "USD 4"
Out[29]: <Cell Sheet1.B2>

In [30]: ws.cell("B2").value
Out[30]: 4

In [31]: ws.cell("B2").number_format
Out[31]: '"USD "#'

回答by carlo

to anyone having the same problem, i did it by separating each sheet into CSVs, using this answer

对于任何有同样问题的人,我通过将每个工作表分成 CSV 来做到这一点,使用这个答案

It preserves the currency details as displayed in the spreadsheet, and while a little cumbersome, it did the job anyway.

它保留了电子表格中显示的货币详细信息,虽然有点麻烦,但它还是完成了这项工作。