Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/26521266/

Date: 2020-08-19 00:35:31  Source: igfitidea

Using Pandas to pd.read_excel() for multiple worksheets of the same workbook

Tags: python, excel, pandas, dataframe

Asked by HaPsantran

I have a large spreadsheet file (.xlsx) that I'm processing using python pandas. It happens that I need data from two tabs in that large file. One of the tabs has a ton of data and the other is just a few square cells.

When I use pd.read_excel() on any worksheet, it looks to me like the whole file is loaded (not just the worksheet I'm interested in). So when I use the method twice (once for each sheet), I effectively have to suffer the whole workbook being read in twice (even though we're only using the specified sheet).

Am I using it wrong or is it just limited in this way?

Thank you!

Accepted answer by Noah

Try pd.ExcelFile:

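import pandas as pd

# Open the workbook once and reuse the handle for each sheet
xls = pd.ExcelFile('path_to_file.xls')
df1 = pd.read_excel(xls, sheet_name='Sheet1')
df2 = pd.read_excel(xls, sheet_name='Sheet2')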

As noted by @HaPsantran, the entire Excel file is read in during the ExcelFile() call (there doesn't appear to be a way around this). This merely saves you from having to read the same file in each time you want to access a new sheet.
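As a minimal sketch of the same idea, the workbook can be opened once inside a `with` block and both sheets read from that single handle. The file name `demo.xlsx`, the sheet names, and the sample data are made up for illustration, and the example assumes the openpyxl package is installed so that pandas can write and read .xlsx files.

```python
import os
import tempfile

import pandas as pd

# Build a small two-sheet workbook so the example is self-contained.
# Assumption: openpyxl is installed (pandas needs it to read .xlsx files).
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")  # hypothetical file name
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"a": range(1000)}).to_excel(writer, sheet_name="Big", index=False)
    pd.DataFrame({"b": [1, 2]}).to_excel(writer, sheet_name="Small", index=False)

# Open the workbook once, then pull both sheets from the same handle.
# The with block closes the underlying file when it exits.
with pd.ExcelFile(path) as xls:
    big = pd.read_excel(xls, sheet_name="Big")
    small = pd.read_excel(xls, sheet_name="Small")

print(big.shape, small.shape)
```

The `with` form is a convenience added here, not part of the answer above; calling `pd.ExcelFile(path)` directly behaves the same way but leaves closing the file to you.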

Note that the sheet_name argument to pd.read_excel() can be the name of the sheet (as above), an integer specifying the sheet number (e.g. 0, 1, etc.), a list of sheet names or indices, or None. If a list is provided, it returns a dictionary where the keys are the sheet names/indices and the values are the data frames. The default is to simply return the first sheet (i.e. sheet_name=0).

If None is specified, all sheets are returned, as a {sheet_name: dataframe} dictionary.
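The forms of sheet_name described above can be compared side by side. This is only a sketch: the workbook, its sheet names ("First", "Second"), and its contents are invented for the example, and openpyxl is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-sheet workbook created just for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"x": [1, 2, 3]}).to_excel(writer, sheet_name="First", index=False)
    pd.DataFrame({"y": [4, 5]}).to_excel(writer, sheet_name="Second", index=False)

by_name = pd.read_excel(path, sheet_name="Second")   # a single DataFrame
by_index = pd.read_excel(path, sheet_name=1)         # same sheet, by position
some = pd.read_excel(path, sheet_name=["First", 1])  # dict keyed by the list items
everything = pd.read_excel(path, sheet_name=None)    # dict keyed by sheet name

print(type(by_name).__name__, sorted(map(str, some)), list(everything))
```

Note that when a list mixes names and indices, the dictionary keys are the items exactly as passed, so a sheet requested by index is stored under that integer rather than under its name.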

Answered by Elliott

You can also use the index for the sheet:

xls = pd.ExcelFile('path_to_file.xls')
sheet1 = xls.parse(0)

This gives the first worksheet. For the second worksheet:

sheet2 = xls.parse(1)

Answered by Mat0kan

You could also specify the sheet name as a parameter:

data_file = pd.read_excel('path_to_file.xls', sheet_name="sheet_name")

This returns only the sheet named "sheet_name".

Answered by Vikash Singh

There are several options:

Read all sheets directly into an ordered dictionary.

import pandas as pd

# for pandas version >= 0.21.0
sheet_to_df_map = pd.read_excel(file_name, sheet_name=None)

# for pandas version < 0.21.0
sheet_to_df_map = pd.read_excel(file_name, sheetname=None)

Thanks @ihightower for pointing it out and @toto_tico for pointing out the version issue.

Read the first sheet directly into a DataFrame.

df = pd.read_excel('excel_file_path.xls')
# this will read the first sheet into df

Read the Excel file and get a list of sheets. Then choose and load the sheets.

xls = pd.ExcelFile('excel_file_path.xls')

# Now you can list all sheets in the file
xls.sheet_names
# ['house', 'house_extra', ...]

# to read just one sheet to dataframe (use sheetname= on pandas < 0.21.0):
df = pd.read_excel(xls, sheet_name="house")

Read all sheets and store them in a dictionary. Same as the first option but more explicit.

# to read all sheets to a map
sheet_to_df_map = {}
for sheet_name in xls.sheet_names:
    sheet_to_df_map[sheet_name] = xls.parse(sheet_name)
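The explicit loop can also be written as a dictionary comprehension. The sketch below checks that it yields the same mapping as sheet_name=None. The workbook and its sheet names ('house', 'house_extra', borrowed from the listing above) are built here purely for illustration, and openpyxl is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical workbook with two sheets, created so the example can run.
path = os.path.join(tempfile.mkdtemp(), "houses.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"rooms": [3, 4]}).to_excel(writer, sheet_name="house", index=False)
    pd.DataFrame({"garage": [1]}).to_excel(writer, sheet_name="house_extra", index=False)

# Explicit version: open once, parse each listed sheet into a dict.
with pd.ExcelFile(path) as xls:
    sheet_to_df_map = {name: xls.parse(name) for name in xls.sheet_names}

# One-call version for comparison.
all_at_once = pd.read_excel(path, sheet_name=None)

same_keys = list(sheet_to_df_map) == list(all_at_once)
same_data = all(sheet_to_df_map[k].equals(all_at_once[k]) for k in all_at_once)
print(same_keys, same_data)
```

The explicit form is mainly useful when only some of the names in xls.sheet_names should be loaded, since the comprehension can filter them.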

Update: Thanks @toto_tico for pointing out the version issue.

sheetname : string, int, mixed list of strings/ints, or None, default 0
Deprecated since version 0.21.0: Use sheet_name instead.

Answered by citynorman

Yes, unfortunately it will always load the full file. If you do this repeatedly, it is probably best to extract the sheets to separate CSVs and then load them separately. You can automate that process with d6tstack, which also adds features such as checking whether all the columns are equal across all sheets or across multiple Excel files.

import d6tstack
c = d6tstack.convert_xls.XLStoCSVMultiSheet('multisheet.xlsx')
c.convert_all() # ['multisheet-Sheet1.csv','multisheet-Sheet2.csv']

See d6tstack Excel examples
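If adding a dependency is not desirable, the same extract-to-CSV idea can be sketched with pandas alone. This is not d6tstack's implementation. The workbook, the sheet names, and the CSV naming pattern are assumptions made for the example, and openpyxl is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical workbook standing in for the large multi-sheet file.
work_dir = tempfile.mkdtemp()
xlsx_path = os.path.join(work_dir, "multisheet.xlsx")
with pd.ExcelWriter(xlsx_path) as writer:
    pd.DataFrame({"a": range(100)}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"b": [1, 2]}).to_excel(writer, sheet_name="Sheet2", index=False)

# One-time conversion: read every sheet once and write one CSV per sheet.
csv_paths = {}
for sheet_name, frame in pd.read_excel(xlsx_path, sheet_name=None).items():
    csv_path = os.path.join(work_dir, f"multisheet-{sheet_name}.csv")
    frame.to_csv(csv_path, index=False)
    csv_paths[sheet_name] = csv_path

# Later runs load only the sheet they need, without touching the workbook.
small = pd.read_csv(csv_paths["Sheet2"])
print(small)
```

Keep in mind that CSV stores plain text only, so cell formatting is lost and column types are re-inferred when the file is read back.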

Answered by Ashu007

pd.read_excel('filename.xlsx')

reads the first sheet of the workbook by default.

pd.read_excel('filename.xlsx', sheet_name='sheetname')

reads the specified sheet of the workbook, and

pd.read_excel('filename.xlsx', sheet_name=None)

reads every worksheet into a dictionary (an OrderedDict in older pandas versions) whose keys are the sheet names and whose values are the corresponding DataFrames.

Answered by Nikita Agarwala

If you have saved the Excel file in the same folder as your Python program (relative addressing), you only need to pass the file name together with the sheet name or number. Syntax: pd.read_excel(Filename, SheetNo). Example:

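    import pandas as pd
    import matplotlib.pyplot as plt

    # Load the sheet named "Sheet2" from a workbook in the current directory
    data = pd.read_excel("wt_vs_ht.xlsx", "Sheet2")
    print(data)
    # Plot weight against height, one 'x' marker per row
    x = data.Height
    y = data.Weight
    plt.plot(x, y, 'x')
    plt.show()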