XLRD/Python: Reading Excel file into dict with for-loops

Question

Asked by kylerthecreator

I'm looking to read in an Excel workbook with 15 fields and about 2000 rows, and convert each row to a dictionary in Python. I then want to append each dictionary to a list. I'd like each field in the top row of the workbook to be a key within each dictionary, and have the corresponding cell value be the value within the dictionary. I've already looked at examples here and here, but I'd like to do something a bit different. The second example will work, but I feel it would be more efficient to loop over the top row to populate the dictionary keys and then iterate through each row to get the values. My Excel file contains data from discussion forums and looks something like this (obviously with more columns):

id    thread_id    forum_id    post_time    votes    post_text
4     100          3           1377000566   1        'here is some text'
5     100          4           1289003444   0        'even more text here'

So, I'd like the fields id, thread_id and so on to be the dictionary keys. I'd like my dictionaries to look like:

{id: 4, 
thread_id: 100,
forum_id: 3,
post_time: 1377000566,
votes: 1,
post_text: 'here is some text'}

Initially, I had some code like this iterating through the file, but my scope is wrong for some of the for-loops and I'm generating way too many dictionaries. Here's my initial code:

import xlrd
from xlrd import open_workbook, cellname

book = open('forum.xlsx', 'r')
sheet = book.sheet_by_index(3)

dict_list = []

for row_index in range(sheet.nrows):
    for col_index in range(sheet.ncols):
        d = {}

        # My intuition for the below for-loop is to take each cell in the top row of the 
        # Excel sheet and add it as a key to the dictionary, and then pass the value of 
        # current index in the above loops as the value to the dictionary. This isn't
        # working.

        for i in sheet.row(0):
           d[str(i)] = sheet.cell(row_index, col_index).value
           dlist.append(d)

Any help would be greatly appreciated. Thanks in advance for reading.

Answer 1

Accepted answer by alecxe

The idea is to first read the header into a list. Then iterate over the remaining sheet rows (starting with the one after the header), build a new dictionary from the header keys and the corresponding cell values, and append it to the list of dictionaries:

from xlrd import open_workbook

book = open_workbook('forum.xlsx')
sheet = book.sheet_by_index(3)

# read header values into the list    
keys = [sheet.cell(0, col_index).value for col_index in xrange(sheet.ncols)]

dict_list = []
for row_index in xrange(1, sheet.nrows):
    d = {keys[col_index]: sheet.cell(row_index, col_index).value 
         for col_index in xrange(sheet.ncols)}
    dict_list.append(d)

print dict_list

For a sheet containing:

A   B   C   D
1   2   3   4
5   6   7   8

it prints:

[{'A': 1.0, 'C': 3.0, 'B': 2.0, 'D': 4.0}, 
 {'A': 5.0, 'C': 7.0, 'B': 6.0, 'D': 8.0}]

Update (the dictionary comprehension expanded into a plain loop):

d = {}
for col_index in xrange(sheet.ncols):
    d[keys[col_index]] = sheet.cell(row_index, col_index).value
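
The printed output above shows every number as a float (1.0 rather than 1), because xlrd returns numeric cells as Python floats. If whole-number fields such as id or votes should come back as int, one option is to post-process each row dictionary. The following is a minimal sketch for Python 3 under that assumption; the helper name `normalize_numbers` and the sample rows are hypothetical and are not part of xlrd or of the answer's code.

```python
def normalize_numbers(row):
    """Return a copy of row with whole-number floats converted to int."""
    cleaned = {}
    for key, value in row.items():
        # float.is_integer() is True for values such as 4.0 but not for 4.5
        if isinstance(value, float) and value.is_integer():
            cleaned[key] = int(value)
        else:
            cleaned[key] = value
    return cleaned


# Sample rows shaped like the dictionaries built above (illustrative data only)
sample_rows = [
    {'id': 4.0, 'votes': 1.0, 'post_text': 'here is some text'},
    {'id': 5.0, 'votes': 0.0, 'post_text': 'even more text here'},
]
normalized = [normalize_numbers(row) for row in sample_rows]
```

Applied to the real data, this would be `[normalize_numbers(d) for d in dict_list]`. Values that are genuinely fractional, and all text values, pass through unchanged.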

Answer 2

Answered by user3203010

First set up your keys by parsing just the header row across all columns, then write a second function to parse the data rows, and call the two in order.

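all_fields_list = []
header_dict = {}
HEADER_ROW = 0  # xlrd rows are zero-based; 0 is the top row, as in the question

def parse_data_headers(sheet):
    # map each column index to its header text
    for c in range(sheet.ncols):
        header_dict[c] = sheet.cell(HEADER_ROW, c).value

def parse_data(sheet):
    # build one dict per data row, keyed by the header text
    for r in range(HEADER_ROW + 1, sheet.nrows):
        row_dict = {}
        for c in range(sheet.ncols):
            # sheet.cell() returns a Cell object; .value extracts its content
            row_dict[header_dict[c]] = sheet.cell(r, c).value
        all_fields_list.append(row_dict)

# call them in order: headers first, then data
# parse_data_headers(sheet)
# parse_data(sheet)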

Answer 3

Answered by user3203010

This answer helped me out a lot! I was fiddling with a way to do this for about two hours. Then I found this elegant and short answer. Thanks!

I needed some way to convert xls to json using keys.

So I adapted the script above with a json print statement like so:

from xlrd import open_workbook
import simplejson as json
#http://stackoverflow.com/questions/23568409/xlrd-python-reading-excel-file-into-dict-with-for-loops?lq=1

book = open_workbook('makelijk-bomen-herkennen-schors.xls')
sheet = book.sheet_by_index(0)

# read header values into the list
keys = [sheet.cell(0, col_index).value for col_index in xrange(sheet.ncols)]
print "keys are", keys

dict_list = []
for row_index in xrange(1, sheet.nrows):
    d = {keys[col_index]: sheet.cell(row_index, col_index).value
         for col_index in xrange(sheet.ncols)}
    dict_list.append(d)

#print dict_list
j = json.dumps(dict_list)

# Write to file
with open('data.json', 'w') as f:
    f.write(j)
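
The script above depends on the third-party simplejson package. The standard library's json module offers the same `dumps`/`dump` interface, so the export step can also be written without an extra dependency. The following is a rough sketch for Python 3, assuming the rows are already a list of dictionaries like `dict_list`; the function name `save_rows_as_json`, the sample rows, and the temporary output location are hypothetical choices made for illustration.

```python
import json
import os
import tempfile


def save_rows_as_json(rows, path):
    """Write a list of row dictionaries to path as a JSON array."""
    with open(path, 'w') as f:
        # indent=2 only makes the file easier to read; it is optional
        json.dump(rows, f, indent=2)


# Illustrative rows standing in for dict_list
sample_rows = [
    {'id': 4, 'thread_id': 100, 'post_text': 'here is some text'},
    {'id': 5, 'thread_id': 100, 'post_text': 'even more text here'},
]

# Write into a temporary directory so the sketch does not touch real files
out_path = os.path.join(tempfile.mkdtemp(), 'data.json')
save_rows_as_json(sample_rows, out_path)

# Read the file back to confirm the round trip
with open(out_path) as f:
    loaded = json.load(f)
```

`json.dump` writes straight to the file object, which avoids building the whole JSON string in memory first.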

Answer 4

Answered by yopiangi

Try this one. The function below returns a generator that yields one dict per row, keyed by the column headers.

from xlrd import open_workbook

def parse_xlsx():
    workbook = open_workbook('excelsheet.xlsx')
    sheets = workbook.sheet_names()
    active_sheet = workbook.sheet_by_name(sheets[0])
    num_rows = active_sheet.nrows
    num_cols = active_sheet.ncols
    # lower-cased header names from the first row become the dict keys
    header = [active_sheet.cell_value(0, cell).lower() for cell in range(num_cols)]
    for row_idx in xrange(1, num_rows):
        row_cell = [active_sheet.cell_value(row_idx, col_idx) for col_idx in range(num_cols)]
        yield dict(zip(header, row_cell))

# the function has to be defined before it is called
for row in parse_xlsx():
    print row # {id: 4, thread_id: 100, forum_id: 3, post_time: 1377000566, votes: 1, post_text: 'here is some text'}

Answer 5

Answered by khelili miliana

This script lets you transform Excel data into a list of dictionaries:

import xlrd

workbook = xlrd.open_workbook('forum.xls', on_demand=True)
worksheet = workbook.sheet_by_index(0)
first_row = []  # the row where we store the column names
for col in range(worksheet.ncols):
    first_row.append(worksheet.cell_value(0, col))
# transform the worksheet into a list of dictionaries
data =[]
for row in range(1, worksheet.nrows):
    elm = {}
    for col in range(worksheet.ncols):
        elm[first_row[col]]=worksheet.cell_value(row,col)
    data.append(elm)
print data

Answer 6

Answered by Kernel

from xlrd import open_workbook

dict_list = []
book = open_workbook('forum.xlsx')
sheet = book.sheet_by_index(3)

# read first row for keys  
keys = sheet.row_values(0)

# read the remaining rows for values
values = [sheet.row_values(i) for i in range(1, sheet.nrows)]

for value in values:
    dict_list.append(dict(zip(keys, value)))

print dict_list
