Reading .xlsx with the csv module in Python

Question

Asked by pHorseSpec

I'm trying to read an Excel file in .xlsx format with the csv module, but I'm not having any luck, even with the dialect and encoding specified. Below are my attempts and the errors I get with each encoding I tried. If anyone could point me to the correct encoding, syntax, or module for reading a .xlsx file in Python, I'd appreciate it.

With the below code, I get the following error: _csv.Error: line contains NULL byte

#!/usr/bin/python

import sys, csv

with open('filelocation.xlsx', "r+", encoding="Latin1")  as inputFile:
    csvReader = csv.reader(inputFile, dialect='excel')
    for row in csvReader:
        print(row)

With the below code, I get the following error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcc in position 16: invalid continuation byte

#!/usr/bin/python

import sys, csv

with open('filelocation.xlsx', "r+", encoding="utf-8")  as inputFile:
    csvReader = csv.reader(inputFile, dialect='excel')
    for row in csvReader:
        print(row)

When I use utf-16 as the encoding, I get the following error: UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 570-571: illegal UTF-16 surrogate

Answer 1

Answered by Martin Evans

You cannot use Python's csv library to read xlsx-formatted files. You need to install and use a different library. For example, you could use xlrd as follows:

import xlrd

workbook = xlrd.open_workbook("filelocation.xlsx")
sheet = workbook.sheet_by_index(0)

for rowx in range(sheet.nrows):
    values = sheet.row_values(rowx)
    print(values)

This would display all of the rows in the file as lists of row values. The Python Excel website gives other possible examples.

Alternatively you could create a list of rows:

import xlrd

workbook = xlrd.open_workbook("filelocation.xlsx")
sheet = workbook.sheet_by_index(0)
data = [sheet.row_values(rowx) for rowx in range(sheet.nrows)]

print(data)
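
One caveat on the xlrd code above: xlrd 2.0 and later read only the older .xls format, so it needs an earlier xlrd release to open a .xlsx file. A commonly used alternative is openpyxl. The following is a minimal sketch, not part of the original answer, assuming openpyxl is installed (pip install openpyxl); the helper names sheet_to_rows, read_xlsx, and write_csv are made up for illustration.

```python
import csv
import sys


def sheet_to_rows(sheet):
    """Return every row of an openpyxl worksheet as a list of cell values."""
    # iter_rows(values_only=True) yields one tuple of plain values per row.
    return [list(row) for row in sheet.iter_rows(values_only=True)]


def read_xlsx(path):
    """Load the first worksheet of an .xlsx file as a list of rows."""
    # Third-party dependency, imported here so the other helpers work without it.
    from openpyxl import load_workbook

    workbook = load_workbook(path, read_only=True, data_only=True)
    try:
        return sheet_to_rows(workbook.worksheets[0])
    finally:
        workbook.close()


def write_csv(rows, out=sys.stdout):
    """Write the rows as CSV; empty cells (None) become empty fields."""
    csv.writer(out).writerows(rows)
```

With these helpers, write_csv(read_xlsx("filelocation.xlsx")) would print the first sheet as CSV, which is the one place the csv module does fit into this task.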

Answer 2

Answered by Collin Anderson

Here's a very very rough implementation using just the standard library.

def xlsx(fname, sheet=1):
    import zipfile
    from xml.etree.ElementTree import iterparse
    z = zipfile.ZipFile(fname)
    strings = [el.text for e, el in iterparse(z.open('xl/sharedStrings.xml')) if el.tag.endswith('}t')]
    rows = []
    row = {}
    value = ''
    for e, el in iterparse(z.open('xl/worksheets/sheet%s.xml' % sheet)):
        if el.tag.endswith('}v'):  # <v>84</v>
            value = el.text
        if el.tag.endswith('}c'):  # <c r="A3" t="s"><v>84</v></c>
            if el.attrib.get('t') == 's':
                value = strings[int(value)]
            column_name = ''.join(x for x in el.attrib['r'] if not x.isdigit())  # AZ22
            row[column_name] = value
            value = ''
        if el.tag.endswith('}row'):
            rows.append(row)
            row = {}
    return rows

(This is copied from a deleted question: https://stackoverflow.com/questions/4371163/reading-xlsx-files-using-python)
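
The function above works because a .xlsx file is a ZIP archive of XML parts, not delimited text. That is also the likely reason for the errors in the question: the compressed bytes contain NUL bytes and sequences that are invalid in UTF-8 or UTF-16, whatever encoding is passed to open(). A small sketch to inspect a file this way, using only the standard library (describe_xlsx is a hypothetical helper name):

```python
import zipfile


def describe_xlsx(path):
    """Return (is_zip, member_names) for the file at path."""
    # A plain CSV or text file fails this check; a real .xlsx passes it.
    if not zipfile.is_zipfile(path):
        return False, []
    with zipfile.ZipFile(path) as archive:
        # Typical members include xl/workbook.xml and xl/worksheets/sheet1.xml.
        return True, archive.namelist()
```

Calling describe_xlsx("filelocation.xlsx") on a genuine workbook should list parts such as xl/sharedStrings.xml and xl/worksheets/sheet1.xml, the two files the implementation above parses.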

Answer 3

Answered by Collin Anderson

Here's a very very rough implementation using just the standard library.

def xlsx(fname):
    import zipfile
    from xml.etree.ElementTree import iterparse
    z = zipfile.ZipFile(fname)
    strings = [el.text for e, el in iterparse(z.open('xl/sharedStrings.xml')) if el.tag.endswith('}t')]
    rows = []
    row = {}
    value = ''
    for e, el in iterparse(z.open('xl/worksheets/sheet1.xml')):
        if el.tag.endswith('}v'):  # <v>84</v>
            value = el.text
        if el.tag.endswith('}c'):  # <c r="A3" t="s"><v>84</v></c>
            if el.attrib.get('t') == 's':
                value = strings[int(value)]
            letter = el.attrib['r'] # AZ22
            while letter[-1].isdigit():
                letter = letter[:-1]
            row[letter] = value
            value = ''
        if el.tag.endswith('}row'):
            rows.append(row)
            row = {}
    return rows

This answer is copied from a deleted question: https://stackoverflow.com/a/22067980/131881
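
The xlsx() function above returns each row as a dict keyed by column letter, with empty cells simply missing. If the goal is CSV output, those dicts can be passed to csv.DictWriter. The following is a rough sketch under that assumption; rows_to_csv and column_order are illustrative names, not part of the original answer.

```python
import csv
import io


def column_order(name):
    # Spreadsheet order: shorter names first ("Z" before "AA"), then alphabetical.
    return (len(name), name)


def rows_to_csv(rows, out):
    """Write dicts keyed by column letter to out as CSV with a header row."""
    # Collect every column letter that appears in any row.
    columns = sorted({name for row in rows for name in row}, key=column_order)
    # restval fills in cells that a row does not have.
    writer = csv.DictWriter(out, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return columns


# Example with hand-written rows shaped like the output of xlsx():
buffer = io.StringIO()
rows_to_csv([{"A": "name", "B": "qty"}, {"A": "apple", "C": "3"}], buffer)
print(buffer.getvalue())
```

Sorting by length before name keeps columns in sheet order once a workbook goes past column Z.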
