Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/15285068/

From password-protected Excel file to pandas DataFrame

Tags: python, excel, pandas

Asked by dmvianna

I can open a password-protected Excel file with this:

import sys
import win32com.client

xlApp = win32com.client.Dispatch("Excel.Application")
print("Excel library version:", xlApp.Version)
filename, password = sys.argv[1:3]
xlwb = xlApp.Workbooks.Open(filename, Password=password)
# xlwb = xlApp.Workbooks.Open(filename)
xlws = xlwb.Sheets(1)  # counts from 1, not from 0
print(xlws.Name)
print(xlws.Cells(1, 1).Value)  # that's A1

I'm not sure, though, how to transfer the information to a pandas DataFrame. Do I need to read the cells one by one, or is there a more convenient way to do this?

Accepted answer by ikeoddy

Assuming the starting cell is given as (StartRow, StartCol) and the ending cell is given as (EndRow, EndCol), I found the following worked for me:

import pandas

# Get the content of the rectangular selection region
# content is a tuple of tuples (one inner tuple per row)
content = xlws.Range(xlws.Cells(StartRow, StartCol), xlws.Cells(EndRow, EndCol)).Value

# Transfer content to a pandas DataFrame
dataframe = pandas.DataFrame(list(content))

Note: Excel cell B5 is addressed as row 5, column 2 in win32com. Also, list(...) converts the tuple of tuples into a list of tuples, because the pandas.DataFrame constructor did not accept a tuple of tuples at the time this answer was written.
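
As a minimal sketch of the conversion step, the helper below turns a tuple of row tuples, shaped like the value returned for a multi-cell range, into a DataFrame. The function name range_to_dataframe and its header flag are assumptions made for illustration, and the sample data stands in for a real worksheet so the snippet runs without Excel.

```python
import pandas as pd


def range_to_dataframe(content, header=False):
    """Convert a tuple of row tuples into a DataFrame.

    Hypothetical helper: `content` is expected to look like the value of a
    multi-cell range, i.e. one inner tuple per worksheet row.
    """
    rows = [list(row) for row in content]
    if header:
        # Treat the first row as column names and the rest as data
        return pd.DataFrame(rows[1:], columns=rows[0])
    return pd.DataFrame(rows)


# Stand-in for the tuple of tuples read from the worksheet
sample = (("name", "qty"), ("apple", 3.0), ("pear", 5.0))
df = range_to_dataframe(sample, header=True)
print(df)
```

With header=False the columns are simply numbered from 0, which matches the behaviour of the snippet in the answer.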

Answer by datalifenyc

Based on the suggestion provided by @ikeoddy, this should put the pieces together:

How to open a password protected excel file using python?

# Import modules
import pandas as pd
import win32com.client
import os
import getpass

# Name file variables
file_path = r'your_file_path'
file_name = r'your_file_name.extension'

full_name = os.path.join(file_path, file_name)
# print(full_name)

Getting command-line password input in Python

# You are prompted to provide the password to open the file
xl_app = win32com.client.Dispatch('Excel.Application')
pwd = getpass.getpass('Enter file password: ')

Workbooks.Open Method (Excel)

xl_wb = xl_app.Workbooks.Open(full_name, False, True, None, pwd)
xl_app.Visible = False
xl_sh = xl_wb.Worksheets('your_sheet_name')

# Get last_row
row_num = 0
cell_val = ''
while cell_val is not None:
    row_num += 1
    cell_val = xl_sh.Cells(row_num, 1).Value
    # print(row_num, '|', cell_val, type(cell_val))
last_row = row_num - 1
# print(last_row)

# Get last_column
col_num = 0
cell_val = ''
while cell_val is not None:
    col_num += 1
    cell_val = xl_sh.Cells(1, col_num).Value
    # print(col_num, '|', cell_val, type(cell_val))
last_col = col_num - 1
# print(last_col)
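
The two loops above follow the same pattern, so they could be folded into one helper. This is only a sketch: last_filled_index, its get_value callable, and the limit cap are names introduced here for illustration, and a plain dictionary stands in for the worksheet so the example runs without Excel.

```python
def last_filled_index(get_value, limit=1_048_576):
    """Return the 1-based index of the last cell in a contiguous filled run.

    Hypothetical helper: get_value(i) should return the cell value at
    position i, or None for an empty cell. `limit` is an assumed safety
    cap so the loop cannot run forever.
    """
    index = 0
    while index < limit and get_value(index + 1) is not None:
        index += 1
    return index


# Stand-in for column A. With win32com this might be called as
# last_filled_index(lambda r: xl_sh.Cells(r, 1).Value)
column_a = {1: "name", 2: "apple", 3: "pear"}
last_row = last_filled_index(column_a.get)
print(last_row)  # prints 3
```

Like the original loops, this stops at the first empty cell, so a blank cell in the middle of the first row or column cuts the range short.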

ikeoddy's answer:

content = xl_sh.Range(xl_sh.Cells(1, 1), xl_sh.Cells(last_row, last_col)).Value
# list(content)
df = pd.DataFrame(list(content[1:]), columns=content[0])
df.head()

python win32 COM closing excel workbook

xl_wb.Close(False)
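
If reading the range raises an exception, the Close call above is never reached and the workbook stays open. One possible way to guard against that is a small context manager; the name opened_workbook is an assumption for illustration, and the function expects an already-dispatched Excel application object like xl_app above.

```python
from contextlib import contextmanager


@contextmanager
def opened_workbook(xl_app, full_name, password):
    """Open a workbook read-only and always close it afterwards.

    Hypothetical helper: xl_app is expected to be an Excel.Application
    COM object obtained with win32com.client.Dispatch.
    """
    # Same positional arguments as the Workbooks.Open call above:
    # FileName, UpdateLinks, ReadOnly, Format, Password
    xl_wb = xl_app.Workbooks.Open(full_name, False, True, None, password)
    try:
        yield xl_wb
    finally:
        xl_wb.Close(False)  # False: do not save changes


# Intended usage (requires Windows with Excel installed):
# with opened_workbook(xl_app, full_name, pwd) as xl_wb:
#     xl_sh = xl_wb.Worksheets('your_sheet_name')
```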

Answer by Maurice

From David Hamann's site (all credit goes to him): https://davidhamann.de/2018/02/21/read-password-protected-excel-files-into-pandas-dataframe/

Use xlwings. Opening the file first launches the Excel application so that you can enter the password.

import pandas as pd
import xlwings as xw

PATH = '/Users/me/Desktop/xlwings_sample.xlsx'
wb = xw.Book(PATH)
sheet = wb.sheets['sample']

df = sheet['A1:C4'].options(pd.DataFrame, index=False, header=True).value
df

Answer by Phillip Cloud

Assuming that you can save the encrypted file back to disk using the win32com API (which I realize might defeat the purpose), you could then immediately call the top-level pandas function read_excel. You'll need to install some combination of xlrd (for Excel 2003), xlwt (also for 2003), and openpyxl (for Excel 2007) first, though. See the pandas documentation on reading Excel files. Currently pandas does not support using the win32com API to read Excel files. You're welcome to open a GitHub issue if you'd like.