Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/35744613/

Read in .xlsx with csv module in python

python excel encoding utf-8

Asked by pHorseSpec

I'm trying to read an Excel file in .xlsx format with the csv module, but I'm not having any luck, even with my dialect and encoding specified. Below are my attempts and the errors I got with each encoding I tried. If anyone could point me to the correct encoding, syntax, or module for reading a .xlsx file in Python, I'd appreciate it.

With the below code, I get the following error: _csv.Error: line contains NULL byte

#!/usr/bin/python

import sys, csv

with open('filelocation.xlsx', "r+", encoding="Latin1")  as inputFile:
    csvReader = csv.reader(inputFile, dialect='excel')
    for row in csvReader:
        print(row)

With the below code, I get the following error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcc in position 16: invalid continuation byte

#!/usr/bin/python

import sys, csv

with open('filelocation.xlsx', "r+", encoding="utf-8") as inputFile:
    csvReader = csv.reader(inputFile, dialect='excel')
    for row in csvReader:
        print(row)

When I use utf-16 as the encoding, I get the following error: UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 570-571: illegal UTF-16 surrogate

Answered by Martin Evans

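You cannot use Python's csv library to read xlsx-formatted files. You need to install and use a different library. For example, you could use xlrd (note that xlrd 2.0 and later read only the older .xls format, so this needs an xlrd release below 2.0; openpyxl is a commonly used alternative for .xlsx) as follows: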

import xlrd

workbook = xlrd.open_workbook("filelocation.xlsx")
sheet = workbook.sheet_by_index(0)

for rowx in range(sheet.nrows):
    values = sheet.row_values(rowx)
    print(values)

This would display all of the rows in the file as lists of row values. The Python Excel website gives other possible examples.

Alternatively you could create a list of rows:

import xlrd

workbook = xlrd.open_workbook("filelocation.xlsx")
sheet = workbook.sheet_by_index(0)
data = [sheet.row_values(rowx) for rowx in range(sheet.nrows)]

print(data)
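
The csv errors in the question follow from the file format: an .xlsx file is a ZIP archive of XML parts, not delimited text, so no choice of encoding makes csv.reader work on it. As a minimal sketch using only the standard library, one could check this before choosing a reader. `looks_like_xlsx` is a hypothetical helper name introduced here, and the check assumes the conventional layout in which the workbook part is stored as `xl/workbook.xml`:

```python
import zipfile


def looks_like_xlsx(path):
    """Return True if path is a ZIP archive that contains a workbook part.

    Hypothetical helper for illustration. It only inspects the archive
    listing and assumes the conventional 'xl/workbook.xml' part name;
    it does not validate the spreadsheet contents.
    """
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        return "xl/workbook.xml" in archive.namelist()
```

A plain CSV file fails the `zipfile.is_zipfile` test and can go to the csv module, while a file that passes needs a spreadsheet reader instead.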

Answered by Collin Anderson

Here's a very very rough implementation using just the standard library.

def xlsx(fname, sheet=1):
    import zipfile
    from xml.etree.ElementTree import iterparse
    z = zipfile.ZipFile(fname)
    strings = [el.text for e, el in iterparse(z.open('xl/sharedStrings.xml')) if el.tag.endswith('}t')]
    rows = []
    row = {}
    value = ''
    for e, el in iterparse(z.open('xl/worksheets/sheet%s.xml' % sheet)):
        if el.tag.endswith('}v'):  # <v>84</v>
            value = el.text
        if el.tag.endswith('}c'):  # <c r="A3" t="s"><v>84</v></c>
            if el.attrib.get('t') == 's':
                value = strings[int(value)]
            column_name = ''.join(x for x in el.attrib['r'] if not x.isdigit())  # AZ22
            row[column_name] = value
            value = ''
        if el.tag.endswith('}row'):
            rows.append(row)
            row = {}
    return rows

(This is copied from a deleted question: https://stackoverflow.com/questions/4371163/reading-xlsx-files-using-python)
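
Since the question started from the csv module, one way to connect the two is to write the parsed rows back out as CSV text. The sketch below assumes rows shaped like the return value of the function above: a list of dicts keyed by column letter. `column_index` and `rows_to_csv` are hypothetical names introduced here, not part of the answer or of any library:

```python
import csv
import io


def column_index(letters):
    """Convert a column label such as 'A' or 'AZ' to a 1-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def rows_to_csv(rows, out):
    """Write dict rows keyed by column letter to out as CSV text.

    Columns are ordered as in a spreadsheet (A, B, ..., Z, AA, ...);
    cells missing from a row are written as empty strings.
    """
    columns = sorted({name for row in rows for name in row}, key=column_index)
    writer = csv.DictWriter(out, fieldnames=columns, restval="", lineterminator="\n")
    for row in rows:
        writer.writerow(row)


# Sample rows in the assumed shape; the second row has no value in column B.
buffer = io.StringIO()
rows_to_csv([{"A": "name", "B": "qty"}, {"A": "apple", "AA": "7"}], buffer)
print(buffer.getvalue())
```

Sorting by `column_index` rather than alphabetically keeps column AA after column B, which a plain string sort would not.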

Answered by Collin Anderson

Here's a very very rough implementation using just the standard library.

def xlsx(fname):
    import zipfile
    from xml.etree.ElementTree import iterparse
    z = zipfile.ZipFile(fname)
    strings = [el.text for e, el in iterparse(z.open('xl/sharedStrings.xml')) if el.tag.endswith('}t')]
    rows = []
    row = {}
    value = ''
    for e, el in iterparse(z.open('xl/worksheets/sheet1.xml')):
        if el.tag.endswith('}v'):  # <v>84</v>
            value = el.text
        if el.tag.endswith('}c'):  # <c r="A3" t="s"><v>84</v></c>
            if el.attrib.get('t') == 's':
                value = strings[int(value)]
            letter = el.attrib['r'] # AZ22
            while letter[-1].isdigit():
                letter = letter[:-1]
            row[letter] = value
            value = ''
        if el.tag.endswith('}row'):
            rows.append(row)
            row = {}
    return rows

This answer is copied from a deleted question: https://stackoverflow.com/a/22067980/131881