Read Excel File in Python

Note: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/22169325/

Date: 2020-08-19 00:24:53  Source: igfitidea

Tags: python, excel, xlrd

Asked by PythonEnthusiast

I have an Excel file:

Arm_id      DSPName        DSPCode          HubCode          PinCode    PPTL
1            JaVAS            01              AGR             282001    1,2
2            JaVAS            01              AGR             282002    3,4
3            JaVAS            01              AGR             282003    5,6

I want to save a string in the form Arm_id,DSPCode,Pincode. This format is configurable, i.e. it might change to DSPCode,Arm_id,Pincode. I save it in a list like:

FORMAT = ['Arm_id', 'DSPName', 'Pincode']

How do I read the content of a specific column by its name, given that FORMAT is configurable?

This is what I tried. Currently I'm able to read all the content in the file:

from xlrd import open_workbook
wb = open_workbook('sample.xls')
for s in wb.sheets():
    #print 'Sheet:',s.name
    values = []
    for row in range(s.nrows):
        col_value = []
        for col in range(s.ncols):
            value  = (s.cell(row,col).value)
            try : value = str(int(value))
            except : pass
            col_value.append(value)
        values.append(col_value)
print values

My output is:

[[u'Arm_id', u'DSPName', u'DSPCode', u'HubCode', u'PinCode', u'PPTL'], ['1', u'JaVAS', '1', u'AGR', '282001', u'1,2'], ['2', u'JaVAS', '1', u'AGR', '282002', u'3,4'], ['3', u'JaVAS', '1', u'AGR', '282003', u'5,6']]

Then I loop over values[0] to find the FORMAT entries in it and get the indexes of Arm_id, DSPName and Pincode in values[0]. From the next loop on, I know the index of every FORMAT field, and therefore which values I need to get.

But this is such a poor solution.

How do I get the values of a specific column by name in an Excel file?

Accepted answer by tamasgal

This is one approach:

from xlrd import open_workbook

class Arm(object):
    def __init__(self, id, dsp_name, dsp_code, hub_code, pin_code, pptl):
        self.id = id
        self.dsp_name = dsp_name
        self.dsp_code = dsp_code
        self.hub_code = hub_code
        self.pin_code = pin_code
        self.pptl = pptl

    def __str__(self):
        return("Arm object:\n"
               "  Arm_id = {0}\n"
               "  DSPName = {1}\n"
               "  DSPCode = {2}\n"
               "  HubCode = {3}\n"
               "  PinCode = {4} \n"
               "  PPTL = {5}"
               .format(self.id, self.dsp_name, self.dsp_code,
                       self.hub_code, self.pin_code, self.pptl))

wb = open_workbook('sample.xls')
for sheet in wb.sheets():
    number_of_rows = sheet.nrows
    number_of_columns = sheet.ncols

    items = []

    rows = []
    for row in range(1, number_of_rows):
        values = []
        for col in range(number_of_columns):
            value  = (sheet.cell(row,col).value)
            try:
                value = str(int(value))
            except ValueError:
                pass
            finally:
                values.append(value)
        item = Arm(*values)
        items.append(item)

for item in items:
    print(item)
    print("Accessing one single value (eg. DSPName): {0}".format(item.dsp_name))
    print()

You don't have to use a custom class; you can simply use a dict(). If you use a class, however, you can access all values via dot notation, as you see above.

Here is the output of the script above:

Arm object:
  Arm_id = 1
  DSPName = JaVAS
  DSPCode = 1
  HubCode = AGR
  PinCode = 282001 
  PPTL = 1
Accessing one single value (eg. DSPName): JaVAS

Arm object:
  Arm_id = 2
  DSPName = JaVAS
  DSPCode = 1
  HubCode = AGR
  PinCode = 282002 
  PPTL = 3
Accessing one single value (eg. DSPName): JaVAS

Arm object:
  Arm_id = 3
  DSPName = JaVAS
  DSPCode = 1
  HubCode = AGR
  PinCode = 282003 
  PPTL = 5
Accessing one single value (eg. DSPName): JaVAS
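
The dict() alternative mentioned in this answer could look roughly like the sketch below. It is only an illustration under stated assumptions: SAMPLE_ROWS is a hypothetical in-memory copy of the sheet (in practice each value would come from sheet.cell(row, col).value), and normalize and rows_to_dicts are made-up helper names, not part of xlrd. The conversion is also slightly stricter than the answer's str(int(value)), since it only touches whole-number floats.

```python
# Hypothetical in-memory copy of the sample sheet (assumption: numeric cells
# arrive as floats and text cells as strings).
SAMPLE_ROWS = [
    ['Arm_id', 'DSPName', 'DSPCode', 'HubCode', 'PinCode', 'PPTL'],
    [1.0, 'JaVAS', 1.0, 'AGR', 282001.0, '1,2'],
    [2.0, 'JaVAS', 1.0, 'AGR', 282002.0, '3,4'],
    [3.0, 'JaVAS', 1.0, 'AGR', 282003.0, '5,6'],
]


def normalize(value):
    """Turn whole-number floats such as 282001.0 into '282001'; keep other values."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return value


def rows_to_dicts(rows):
    """Use the first row as header and return one dict per remaining row."""
    header = rows[0]
    return [dict(zip(header, (normalize(v) for v in row))) for row in rows[1:]]


records = rows_to_dicts(SAMPLE_ROWS)

# The configurable column list from the question, spelled as in the header row.
FORMAT = ['Arm_id', 'DSPCode', 'PinCode']
for record in records:
    print(','.join(str(record[name]) for name in FORMAT))
```

With dicts, changing the order of FORMAT changes the output order without touching the reading code, at the cost of losing the dot notation the class provides.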

Answer by Noel Evans

So the key parts are to grab the header (col_names = s.row(0)) and, when iterating through the rows, to skip the first row, which isn't needed: for row in range(1, s.nrows) does this by using a range from 1 onwards (not the implicit 0). You then use zip to step through each row, pairing every value with the name in its column header.

from xlrd import open_workbook

wb = open_workbook('Book2.xls')
values = []
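for s in wb.sheets():
    # print('Sheet:', s.name)
    col_names = s.row(0)  # header cells, read once per sheet
    for row in range(1, s.nrows):  # start at 1 to skip the header row
        col_value = []
        for name, col in zip(col_names, range(s.ncols)):
            value = s.cell(row, col).value
            try:
                value = str(int(value))  # e.g. 282001.0 -> '282001'
            except (ValueError, TypeError):
                pass  # not a number: keep the original value
            col_value.append((name.value, value))
        values.append(col_value)
print(values)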

Answer by poida

The approach I took reads the header information from the first row to determine the indexes of the columns of interest.

You mentioned in the question that you also want the values output to a string. I dynamically build a format string for the output from the FORMAT column list. Rows are appended to the values string, separated by a newline character.

The output column order is determined by the order of the column names in the FORMAT list.

In my code below, the case of the column names in the FORMAT list matters. In the question above you've got 'Pincode' in your FORMAT list, but 'PinCode' in your Excel file. That wouldn't work below; it would need to be 'PinCode'.

from xlrd import open_workbook
wb = open_workbook('sample.xls')

FORMAT = ['Arm_id', 'DSPName', 'PinCode']
values = ""

for s in wb.sheets():
    headerRow = s.row(0)
    # indexes of the FORMAT columns, in FORMAT order (names are case-sensitive)
    columnIndex = [x for y in FORMAT for x in range(len(headerRow)) if y == headerRow[x].value]
    formatString = ("%s,"*len(columnIndex))[0:-1] + "\n"

    for row in range(1,s.nrows):
        currentRow = s.row(row)
        currentRowValues = [currentRow[x].value for x in columnIndex]
        values += formatString % tuple(currentRowValues)

print(values)

For the sample input you gave above, this code outputs:

>>> 1.0,JaVAS,282001.0
2.0,JaVAS,282002.0
3.0,JaVAS,282003.0
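
If the case sensitivity noted in this answer is a problem, one possible workaround is to compare lower-cased names when looking up the column indexes. The sketch below is not part of the original answer: column_indexes is a hypothetical helper, header and rows are in-memory stand-ins for the cell values xlrd would return, and it assumes no two header names differ only by case.

```python
def column_indexes(header, wanted):
    """Return the index of each wanted column, matching names case-insensitively.

    Raises KeyError if a wanted name is not present in the header.
    """
    positions = {name.lower(): index for index, name in enumerate(header)}
    return [positions[name.lower()] for name in wanted]


# Hypothetical stand-ins for the header row and data rows of the sample sheet.
header = ['Arm_id', 'DSPName', 'DSPCode', 'HubCode', 'PinCode', 'PPTL']
rows = [
    [1.0, 'JaVAS', 1.0, 'AGR', 282001.0, '1,2'],
    [2.0, 'JaVAS', 1.0, 'AGR', 282002.0, '3,4'],
    [3.0, 'JaVAS', 1.0, 'AGR', 282003.0, '5,6'],
]

# 'Pincode' is spelled as in the question, not as in the sheet ('PinCode').
FORMAT = ['Arm_id', 'DSPName', 'Pincode']

indexes = column_indexes(header, FORMAT)
lines = [','.join(str(row[i]) for i in indexes) for row in rows]
print('\n'.join(lines))
```

The lookup table is built once per header, so each name in FORMAT costs a single dictionary access instead of a scan over all columns.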

And because I'm a Python noob, props go to: this answer, this answer, this question, this question and this answer.

Answer by sheinis

A somewhat late answer, but with pandas it is possible to get a column of an Excel file directly:

import pandas
import xlrd  # not used directly; pandas needs it installed to read .xls files
df = pandas.read_excel('sample.xls')
# print the column names
print(df.columns)
# get the values for a given column
values = df['Arm_id'].values
# get a data frame with selected columns (names must match the sheet exactly)
FORMAT = ['Arm_id', 'DSPName', 'PinCode']
df_selected = df[FORMAT]

Answer by Mahabubuzzaman

By using pandas we can read Excel files easily.

import pandas as pd 
import xlrd as xl 
from pandas import ExcelWriter
from pandas import ExcelFile 

DataF=pd.read_excel("Test.xlsx",sheet_name='Sheet1')

print("Column headings:")
print(DataF.columns)

Test at: https://repl.it
Reference: https://pythonspot.com/read-excel-with-pandas/

Answer by TSeymour

Although I almost always just use pandas for this, my current little tool is being packaged into an executable, and including pandas is overkill. So I created a version of poida's solution that produces a list of named tuples. His code with this change would look like this:

from xlrd import open_workbook
from collections import namedtuple
from pprint import pprint

wb = open_workbook('sample.xls')

FORMAT = ['Arm_id', 'DSPName', 'PinCode']
OneRow = namedtuple('OneRow', ' '.join(FORMAT))
all_rows = []

for s in wb.sheets():
    headerRow = s.row(0)
    columnIndex = [x for y in FORMAT for x in range(len(headerRow)) if y == headerRow[x].value]

    for row in range(1,s.nrows):
        currentRow = s.row(row)
        currentRowValues = [currentRow[x].value for x in columnIndex]
        all_rows.append(OneRow(*currentRowValues))

pprint(all_rows)
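
Once all_rows has been built, the fields can be read by name. The sketch below shows what that might look like; the rows are hypothetical values standing in for what the loop above would append for the sample sheet. Note that namedtuple requires the column names to be valid Python identifiers; for headers containing spaces or similar, passing rename=True to namedtuple is one option.

```python
from collections import namedtuple

FORMAT = ['Arm_id', 'DSPName', 'PinCode']
OneRow = namedtuple('OneRow', FORMAT)  # a list of field names is accepted too

# Hypothetical rows, as the loop above would produce them for the sample sheet
# (assumption: numeric cells come back from xlrd as floats).
all_rows = [
    OneRow(1.0, 'JaVAS', 282001.0),
    OneRow(2.0, 'JaVAS', 282002.0),
    OneRow(3.0, 'JaVAS', 282003.0),
]

first = all_rows[0]
print(first.DSPName)     # attribute access by column name
print(first._asdict())   # the same row as a mapping

# Collect one column across all rows, converting the floats to ints.
pin_codes = [int(row.PinCode) for row in all_rows]
print(pin_codes)
```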

Answer by harsha vardhan

Here is code that reads an Excel file and prints all the cells in the first column (the header cell is printed first, separately from the loop):

import xlrd

file_location = r"C:\pythonprog\xxx.xlsv"  # raw string so the backslashes are not treated as escapes
workbook = xlrd.open_workbook(file_location)
sheet = workbook.sheet_by_index(0)
print(sheet.cell_value(0, 0))  # header cell of the first column

for row in range(1, sheet.nrows):
    print(sheet.cell_value(row, 0))