Original question: http://stackoverflow.com/questions/35744613/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Read in .xlsx with csv module in python
Question by pHorseSpec
I'm trying to read an Excel file in .xlsx format with the csv module, but I'm not having any luck, even with my dialect and encoding specified. Below I show my different attempts and the errors I got with each encoding I tried. If anyone could point me to the correct code, syntax, or module I could use to read a .xlsx file in Python, I'd appreciate it.
With the code below, I get the following error: _csv.Error: line contains NULL byte
```python
#!/usr/bin/python
import sys, csv

with open('filelocation.xlsx', "r+", encoding="Latin1") as inputFile:
    csvReader = csv.reader(inputFile, dialect='excel')
    for row in csvReader:
        print(row)
```
With the code below, I get the following error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcc in position 16: invalid continuation byte
```python
#!/usr/bin/python
import sys, csv

# UTF-8 attempt: produces the 'utf-8' codec error quoted above.
with open('filelocation.xlsx', "r+", encoding="utf-8") as inputFile:
    csvReader = csv.reader(inputFile, dialect='excel')
    for row in csvReader:
        print(row)
```
When I use utf-16 as the encoding, I get the following error: UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 570-571: illegal UTF-16 surrogate
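All three errors have the same root cause: an .xlsx file is not text at all but a ZIP archive of XML parts. Latin-1 can decode any byte, so the csv reader receives raw binary that includes NUL bytes from the ZIP headers, while UTF-8 and UTF-16 fail on byte sequences that are invalid in those encodings. A quick way to confirm this is to look for the ZIP signature. The sketch below is illustrative only: the helper name `looks_like_xlsx` is hypothetical, and checking for an `xl/workbook.xml` part (where Excel normally stores the workbook) is an assumption, not a full format check.

```python
import io
import zipfile


def looks_like_xlsx(data: bytes) -> bool:
    """Heuristic: is `data` a ZIP archive containing an Excel workbook part?"""
    buf = io.BytesIO(data)
    if not zipfile.is_zipfile(buf):
        return False  # plain text such as CSV is not a ZIP archive
    with zipfile.ZipFile(buf) as z:
        # Assumption: Excel-written workbooks keep their main part here.
        return 'xl/workbook.xml' in z.namelist()


# Build a tiny stand-in "workbook" in memory (a real .xlsx has more parts).
fake = io.BytesIO()
with zipfile.ZipFile(fake, 'w') as z:
    z.writestr('xl/workbook.xml', '<workbook/>')
xlsx_bytes = fake.getvalue()

print(xlsx_bytes[:4])                    # b'PK\x03\x04', the ZIP local file header signature
print(b'\x00' in xlsx_bytes[:30])        # True: ZIP headers contain NUL bytes
print(looks_like_xlsx(xlsx_bytes))       # True
print(looks_like_xlsx(b'name,qty\napple,3\npear,5\n'))  # False
```

Reading the first few bytes of a real file with open(path, 'rb') and passing them through the same check shows whether the csv module was ever the right tool for it.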
Answer by Martin Evans
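An .xlsx file is a ZIP archive of XML documents, not delimited text, so you cannot use Python's csv library to read it. You need to install and use a different library. For example, you could use xlrd (a release older than 2.0; xlrd 2.0 and later only read legacy .xls files) as follows: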
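```python
import xlrd

# Open the workbook and select the first worksheet.
workbook = xlrd.open_workbook("filelocation.xlsx")
sheet = workbook.sheet_by_index(0)

# row_values(rowx) returns the cell values of one row as a list.
for rowx in range(sheet.nrows):
    values = sheet.row_values(rowx)
    print(values)
```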
This displays every row of the first sheet as a list of cell values. The Python Excel website gives other possible examples.
Alternatively you could create a list of rows:
```python
import xlrd

workbook = xlrd.open_workbook("filelocation.xlsx")
sheet = workbook.sheet_by_index(0)
data = [sheet.row_values(rowx) for rowx in range(sheet.nrows)]
print(data)
```
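If what you ultimately want is CSV, for example to keep an existing csv-based workflow, the csv module can still write the rows once they are in a plain list like data; it just cannot read the .xlsx container itself. A minimal sketch, assuming rows shaped like the output of row_values(); the helper name `rows_to_csv` and the sample values are made up for illustration.

```python
import csv
import io


def rows_to_csv(rows) -> str:
    """Serialize a list of row lists to CSV text with the 'excel' dialect."""
    out = io.StringIO()
    writer = csv.writer(out, dialect='excel')  # the excel dialect ends lines with '\r\n'
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical rows shaped like sheet.row_values() output; xlrd returns
# numeric cells as floats, hence 3.0 rather than 3.
data = [['name', 'qty'], ['apple', 3.0], ['pear, green', 5.0]]
text = rows_to_csv(data)
print(repr(text))  # 'name,qty\r\napple,3.0\r\n"pear, green",5.0\r\n'
```

Writing to a real file works the same way with open('out.csv', 'w', newline='') in place of io.StringIO; the csv documentation recommends newline='' so the writer controls the line endings.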
Answer by Collin Anderson
Here's a very very rough implementation using just the standard library.
```python
def xlsx(fname, sheet=1):
    import zipfile
    from xml.etree.ElementTree import iterparse

    z = zipfile.ZipFile(fname)
    # Text cells (t="s") store an index into this shared-strings table.
    strings = [el.text for e, el in iterparse(z.open('xl/sharedStrings.xml')) if el.tag.endswith('}t')]
    rows = []
    row = {}
    value = ''
    # Each row becomes a dict keyed by column letters, e.g. {'A': ..., 'B': ...}.
    for e, el in iterparse(z.open('xl/worksheets/sheet%s.xml' % sheet)):
        if el.tag.endswith('}v'):  # <v>84</v>
            value = el.text
        if el.tag.endswith('}c'):  # <c r="A3" t="s"><v>84</v></c>
            if el.attrib.get('t') == 's':
                value = strings[int(value)]
            column_name = ''.join(x for x in el.attrib['r'] if not x.isdigit())  # AZ22
            row[column_name] = value
            value = ''
        if el.tag.endswith('}row'):
            rows.append(row)
            row = {}
    return rows
```
(This is copied from a deleted question: https://stackoverflow.com/questions/4371163/reading-xlsx-files-using-python)
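One caveat with this approach: the one-liner that builds strings collects every `<t>` element, but a rich-text cell (for example, one with a single bold word) is stored as one `<si>` entry split into several `<r><t>...</t></r>` runs, so it takes up several list slots and shifts every later index. The sketch below is an assumed variant, not the answer's code: it builds one entry per `<si>` and runs it on a tiny in-memory sharedStrings part. The helper name `shared_strings` and the sample XML are illustrative; phonetic runs (`<rPh>`) are skipped because only direct `<t>` and `<r>` children are read.

```python
import io
import zipfile
from xml.etree.ElementTree import iterparse

# SpreadsheetML main namespace used by sharedStrings.xml.
NS = '{http://schemas.openxmlformats.org/spreadsheetml/2006/main}'


def shared_strings(z: zipfile.ZipFile) -> list:
    """Return one string per <si> entry, concatenating rich-text <r> runs."""
    if 'xl/sharedStrings.xml' not in z.namelist():
        return []  # assumption: a missing part means "no shared strings"
    strings = []
    for _, el in iterparse(z.open('xl/sharedStrings.xml')):
        if el.tag != NS + 'si':
            continue  # <si> is complete at its 'end' event, children included
        parts = []
        for child in el:
            if child.tag == NS + 't':      # plain string: <si><t>..</t></si>
                parts.append(child.text or '')
            elif child.tag == NS + 'r':    # rich text: <si><r><t>..</t></r>...</si>
                t = child.find(NS + 't')
                if t is not None:
                    parts.append(t.text or '')
        strings.append(''.join(parts))
    return strings


# Tiny in-memory package containing only this part (a real .xlsx has more).
SST = ('<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
       '<si><t>plain</t></si>'
       '<si><r><t>bold </t></r><r><t>tail</t></r></si>'
       '<si><t>last</t></si></sst>')
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as z:
    z.writestr('xl/sharedStrings.xml', SST)

with zipfile.ZipFile(buf) as z:
    strings = shared_strings(z)
print(strings)  # ['plain', 'bold tail', 'last']
```

With the one-liner above, the same part yields four entries, so index 2 would resolve to 'tail' instead of 'last'.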
Answer by Collin Anderson
Here's a very very rough implementation using just the standard library.
```python
def xlsx(fname):
    import zipfile
    from xml.etree.ElementTree import iterparse
    z = zipfile.ZipFile(fname)
    strings = [el.text for e, el in iterparse(z.open('xl/sharedStrings.xml')) if el.tag.endswith('}t')]
    rows = []
    row = {}
    value = ''
    for e, el in iterparse(z.open('xl/worksheets/sheet1.xml')):
        if el.tag.endswith('}v'):  # <v>84</v>
            value = el.text
        if el.tag.endswith('}c'):  # <c r="A3" t="s"><v>84</v></c>
            if el.attrib.get('t') == 's':
                value = strings[int(value)]
            letter = el.attrib['r']  # AZ22
            while letter[-1].isdigit():
                letter = letter[:-1]
            row[letter] = value
            value = ''
        if el.tag.endswith('}row'):
            rows.append(row)
            row = {}
    return rows
```
This answer is copied from a deleted question: https://stackoverflow.com/a/22067980/131881
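Both standard-library answers key each row dict by column letters, stripping the row digits from references like AZ22. If you need positional lists like the ones the csv module produces, the letters can be converted to numbers; Excel generally leaves empty cells out of the sheet XML, so a row dict can have gaps that need filling. A sketch under those assumptions; the helper names `split_cell_ref`, `column_number` and `row_dict_to_list` are hypothetical.

```python
import re


def split_cell_ref(ref: str):
    """Split an A1-style reference such as 'AZ22' into ('AZ', 22)."""
    m = re.fullmatch(r'([A-Z]+)(\d+)', ref)
    if m is None:
        raise ValueError(f'not an A1-style cell reference: {ref!r}')
    return m.group(1), int(m.group(2))


def column_number(letters: str) -> int:
    """Convert column letters to a 1-based number: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord('A') + 1)  # base-26 digits with no zero
    return n


def row_dict_to_list(row: dict) -> list:
    """Turn {'A': .., 'C': ..} into a positional list, filling skipped columns with ''."""
    if not row:
        return []
    width = max(column_number(col) for col in row)
    values = [''] * width
    for col, value in row.items():
        values[column_number(col) - 1] = value
    return values


print(split_cell_ref('AZ22'))                  # ('AZ', 22)
print(column_number('AZ'))                     # 52
print(row_dict_to_list({'A': 'x', 'C': 'z'}))  # ['x', '', 'z']
```

Applied to the output of xlsx(), [row_dict_to_list(r) for r in rows] gives lists that csv.writer can write directly.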