如何使用 python 将 XLSB 文件转换为 csv?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/22366200/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How can I convert a XLSB file to csv using python?
提问by GiannisIordanou
I have been provided with a xlsb file full of data. I want to process the data using python. I can convert it to csv using excel or open office, but I would like the whole process to be more automated. Any ideas?
我收到了一个充满数据的 xlsb 文件。我想使用python处理数据。我可以使用 excel 或 open office 将其转换为 csv,但我希望整个过程更加自动化。有任何想法吗?
Update:I took a look at this questionand used the first answer:
更新:我看了一下这个问题并使用了第一个答案:
import subprocess
subprocess.call("cscript XlsToCsv.vbs data.xlsb data.csv", shell=False)
The issue is the file contains greek letters so the encoding is not preserved. Opening the csv with Notepad++ it looks as it should, but when I try to insert into a database comes like this ???. Opening the file as csv, just to read text is displayed like this: \xc2\xc5\xcb instead of ΒΕΛ.
问题是该文件包含希腊字母,因此不保留编码。用 Notepad++ 打开 csv 看起来应该是这样,但是当我尝试插入数据库时,它是这样的???。以 csv 格式打开文件,只是为了读取文本显示如下:\xc2\xc5\xcb 而不是 ΒΕΛ。
I realize it's an issue in encoding, but it's possible to retain the original encoding converting the xlsb file to csv ?
我意识到这是编码问题,但是可以保留将 xlsb 文件转换为 csv 的原始编码吗?
采纳答案by luk32
Most popular Excel python packages openpyxland xlrdhave no support for xlsbformat (bug tracker entries: openpyxl, xlrd).
大多数流行的 Excel python 包openpyxl和xlrd不支持xlsb格式(错误跟踪器条目:openpyxl,xlrd)。
So I'm afraid there is no native python way =/. However, since you are using windows, it should be easy to script the task with external tools.
所以恐怕没有原生的python方式=/。但是,由于您使用的是 Windows,因此使用外部工具编写任务脚本应该很容易。
I would suggest taking look at Convert XLS to XLSB Programatically?. You mention python in title but the matter of the question does not imply you are strongly coupled to it, so you could go pure c# way.
我建议以编程方式查看将 XLS 转换为 XLSB?. 您在标题中提到了 python,但问题并不意味着您与它强耦合,因此您可以采用纯 c# 方式。
If you feel really comfortable only with python one of the answers there suggests a command line tool under a fancy name of Convert-XLSB. You could script it as an external tool from python with subprocess.
如果您只对 python 感到很舒服,那么其中一个答案建议使用一个名为Convert-XLSB的命令行工具。您可以使用subprocess.
I know this is not a good answer, but I don't think there is better/easier way as of now.
我知道这不是一个好的答案,但我认为目前没有更好/更简单的方法。
回答by WeaselFox
I think you can do this using pyuno. This blog entryshows how to convert xls files to csv, and as open office supports xlsb files since version 3.2, this code might just work for you. You will have to go through hassle of setting up the pyuno environment though..
我认为你可以使用pyuno来做到这一点。此博客条目显示了如何将 xls 文件转换为 csv,并且由于 open office 自版本3.2起支持 xlsb 文件,因此此代码可能适合您。不过,您将不得不经历设置 pyuno 环境的麻烦..
回答by joseldn
XLSB is a binary format and I don't think you'll be able to parse it with current python tools and packages. If you still want to somehow automate the process with python you can do what the others have told you and script that windows CLI tool. Calling the .exe from the command line with subprocess, and passing an array of the files you want to convert.
XLSB 是一种二进制格式,我认为您无法使用当前的 python 工具和包来解析它。如果您仍然想以某种方式使用 python 自动化该过程,您可以执行其他人告诉您的操作并编写该 windows CLI 工具的脚本。使用子进程从命令行调用 .exe,并传递要转换的文件数组。
I.e: with a script similar to this one you could convert all the .xlsb files that you place in the "xlsb" folder to .csv format...
即:使用类似于此脚本的脚本,您可以将放置在“xlsb”文件夹中的所有 .xlsb 文件转换为 .csv 格式...
├── xlsb
│?? ├── file1.xlsb
│?? ├── file2.xlsb
│?? └── file3.xlsb
└── xlsb_to_csv.py
xlsb_to_csv.py
xlsb_to_csv.py
#!/usr/bin/env python
import os
files = [f for f in os.listdir('./xlsb')]
for f in files:
subprocess.call("ConvertXLS.EXE " + str(f) + " --arguments", shell=True)
Note: the Windows command is pseudocode... I use a similar approach to batch-convert stuff in headless windows servers for testing purpouses. You just have to figure out the exe location and the windows command...
注意:Windows 命令是伪代码......我使用类似的方法在无头 Windows 服务器中批量转换内容以测试目的。你只需要弄清楚exe位置和windows命令......
Hope it helps... good luck!
希望它有帮助...祝你好运!
回答by Mixopteryx
The script you reference seem to use the ActiveX interface to Excel, and save via its Workbook.SaveAsmethod.
According to the MSDN documentationthis method have a TextCodepageargument which may be helpful.
您引用的脚本似乎使用了 Excel 的 ActiveX 接口,并通过其Workbook.SaveAs方法保存。根据MSDN 文档,此方法有一个TextCodepage可能有用的论据。
Sidenote: You can rewrite the VB script in python, see this question.
旁注:您可以在 python 中重写 VB 脚本,请参阅此问题。
回答by Agung Prasetyo
In my previous experience, i was handling converting xlsb using libreofficecommand line utility,
在我以前的经验中,我正在使用libreoffice命令行实用程序处理转换 xlsb ,
In ruby i just execute system command to call libreoffice for converting xlsb format to csv:
在 ruby 中,我只是执行系统命令来调用 libreoffice 将 xlsb 格式转换为 csv:
`libreoffice --headless --convert-to csv your_csv_file.xlsb --outdir /path/csv`
and to change the encoding i use command line to using iconv, using ruby :
并更改编码我使用命令行使用iconv,使用 ruby :
`iconv -f ISO-8859-1 -t UTF-8 your_csv_file.csv > new_file_csv.csv`
回答by XsjakieX
I also looked at the problem and the following worked for me. First opening the file in excel via python and than saving it to different file. Bit of a workaround but I like it more than other solutions. In example I use file format 6 which is CSV but you can also use other ones.
我还查看了问题,以下内容对我有用。首先通过python在excel中打开文件,然后将其保存到不同的文件。有点解决方法,但我比其他解决方案更喜欢它。在示例中,我使用 CSV 文件格式 6,但您也可以使用其他格式。
import win32com.client
excel = win32com.client.Dispatch("Excel.Application")
excel.DisplayAlerts = False
excel.Visible=False
doc = excel.Workbooks.Open("C:/users/A295998/Python/@TA1PROG3.xlsb")
doc.SaveAs(Filename="C:\users\A295998\Python\test5.csv",FileFormat=6)
doc.Close()
excel.Quit()
回答by Sergio Lucero
I've encountered this same problem and using pyxlsbdoes it for me:
我遇到了同样的问题,使用pyxlsb为我做了:
from pyxlsb import open_workbook
with open_workbook('HugeDataFile.xlsb') as wb:
for sheetname in wb.sheets:
with wb.get_sheet(sheetname) as sheet:
for row in sheet.rows():
values = [r.v for r in row] # retrieving content
csv_line = ','.join(values) # or do your thing

