Reading an Excel file in Python using pandas

Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/17063458/

Date: 2020-08-19 00:20:14  Source: igfitidea

Tags: python, python-2.7, pandas

Asked by Rakesh Adhikesavan

I am trying to read an Excel file this way:

newFile = pd.ExcelFile(PATH\FileName.xlsx)
ParsedData = pd.io.parsers.ExcelFile.parse(newFile)

This throws an error saying that two arguments are expected. I don't know what the second argument is. What I am trying to achieve is to convert an Excel file to a DataFrame. Am I doing it the right way, or is there another way to do this using pandas?

Accepted answer by DSM

Close: first you call ExcelFile, but then you call the .parse method on the resulting object and pass it the sheet name.

>>> xl = pd.ExcelFile("dummydata.xlsx")
>>> xl.sheet_names
[u'Sheet1', u'Sheet2', u'Sheet3']
>>> df = xl.parse("Sheet1")
>>> df.head()
                  Tid  dummy1    dummy2    dummy3    dummy4    dummy5  \
0 2006-09-01 00:00:00       0  5.894611  0.605211  3.842871  8.265307   
1 2006-09-01 01:00:00       0  5.712107  0.605211  3.416617  8.301360   
2 2006-09-01 02:00:00       0  5.105300  0.605211  3.090865  8.335395   
3 2006-09-01 03:00:00       0  4.098209  0.605211  3.198452  8.170187   
4 2006-09-01 04:00:00       0  3.338196  0.605211  2.970015  7.765058   

     dummy6  dummy7    dummy8    dummy9  
0  0.623354       0  2.579108  2.681728  
1  0.554211       0  7.210000  3.028614  
2  0.567841       0  6.940000  3.644147  
3  0.581470       0  6.630000  4.016155  
4  0.595100       0  6.350000  3.974442  

What you're doing is calling the method which lives on the class itself, rather than the instance, which is okay (although not very idiomatic), but if you're doing that you would also need to pass the sheet name:

>>> parsed = pd.io.parsers.ExcelFile.parse(xl, "Sheet1")
>>> parsed.columns
Index([u'Tid', u'dummy1', u'dummy2', u'dummy3', u'dummy4', u'dummy5', u'dummy6', u'dummy7', u'dummy8', u'dummy9'], dtype=object)
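
For readers who want to try the accepted answer end to end, here is a minimal sketch. It assumes a recent pandas with an .xlsx engine such as openpyxl installed. The temporary workbook, its file name, and its contents are invented for illustration and are not the data shown above.

```python
import os
import tempfile

import pandas as pd

# Build a small two-sheet workbook so the example has something to read.
# The file name and the sheet contents are made up for this sketch.
path = os.path.join(tempfile.mkdtemp(), "dummydata.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"Tid": [1, 2], "dummy1": [0.5, 1.5]}).to_excel(
        writer, sheet_name="Sheet1", index=False
    )
    pd.DataFrame({"x": [10]}).to_excel(writer, sheet_name="Sheet2", index=False)

# Open the workbook once, list its sheets, then parse one of them by name.
with pd.ExcelFile(path) as xl:
    names = xl.sheet_names
    df = xl.parse("Sheet1")

print(names)
print(df.head())
```

Calling parse on the instance (xl.parse) is the idiomatic form; the instance already knows which file it wraps, so only the sheet name is needed.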

Answer by Dr Manhattan

I thought I should add that if you want to access rows or columns in order to loop through them, you can do this:

import pandas as pd

# open the file
xlsx = pd.ExcelFile(r"PATH\FileName.xlsx")

# get the first sheet as an object
sheet1 = xlsx.parse(0)

# get the first column as a list you can loop through
# change the 0 in the code below to the row or column number you want
column = sheet1.iloc[:, 0].tolist()  # originally sheet1.icol(0).real (icol is deprecated)

# get the first row as a list you can loop through
row = sheet1.iloc[0, :].tolist()  # originally sheet1.irow(0).real (irow is deprecated)


Edit:

The methods icol(i) and irow(i) are deprecated now. You can use sheet1.iloc[:, i] to get the i-th column and sheet1.iloc[i, :] to get the i-th row.
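
As a rough illustration of the iloc forms mentioned in the edit, the sketch below uses a small in-memory DataFrame as a stand-in for a parsed sheet. The column names a and b are invented for the example.

```python
import pandas as pd

# A stand-in for a parsed sheet; the column names are hypothetical.
sheet1 = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

first_col = sheet1.iloc[:, 0].tolist()  # column 0 as a plain Python list
first_row = sheet1.iloc[0, :].tolist()  # row 0 as a plain Python list

for value in first_col:
    print(value)

# itertuples is another common way to loop over rows one at a time.
for record in sheet1.itertuples(index=False):
    print(record.a, record.b)
```

Positions in iloc are zero-based, so index 0 is the first row or column.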

Answer by Murali Mopuru

This is a much simpler and easier way.

import pandas
df = pandas.read_excel(open('your_xls_xlsx_filename', 'rb'), sheet_name='Sheet 1')
# or using the zero-based sheet index
df = pandas.read_excel(open('your_xls_xlsx_filename', 'rb'), sheet_name=2)

Check out the documentation for full details: http://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.read_excel.html

FutureWarning: The sheetname keyword is deprecated in newer pandas versions; use sheet_name instead.
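
A short sketch of the sheet_name keyword follows, assuming a recent pandas with an .xlsx engine such as openpyxl installed. The workbook is a throwaway file created just for this example. Besides a sheet name or a zero-based index, sheet_name=None returns every sheet as a dictionary.

```python
import os
import tempfile

import pandas as pd

# Throwaway workbook with two sheets; names and contents are invented.
path = os.path.join(tempfile.mkdtemp(), "example.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"b": [3]}).to_excel(writer, sheet_name="Sheet2", index=False)

by_name = pd.read_excel(path, sheet_name="Sheet2")  # select by sheet name
by_index = pd.read_excel(path, sheet_name=0)        # select by zero-based position
all_sheets = pd.read_excel(path, sheet_name=None)   # dict: sheet name -> DataFrame

print(list(all_sheets))
```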

Answer by Ajay Sant

I think this should satisfy your need:

import pandas as pd

# Read the first Excel sheet into a pandas DataFrame
df = pd.read_excel(r"PATH\FileName.xlsx", sheet_name=0)

Answer by ted

You just need to pass the path to your file to pd.read_excel:

import pandas as pd

file_path = "./my_excel.xlsx"
data_frame = pd.read_excel(file_path)

Check out the documentation to explore parameters such as skiprows, which ignores rows when loading the Excel file.
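
To make the skiprows parameter concrete, here is a small sketch, again assuming a recent pandas with an .xlsx engine such as openpyxl installed. The workbook and its two note rows above the real header are fabricated for the example.

```python
import os
import tempfile

import pandas as pd

# Write a sheet whose first two rows are notes rather than data.
# Everything in this workbook is made up for illustration.
path = os.path.join(tempfile.mkdtemp(), "my_excel.xlsx")
raw = pd.DataFrame(
    [
        ["Exported by a hypothetical tool", None],
        ["Units: arbitrary", None],
        ["name", "value"],
        ["a", 1],
        ["b", 2],
    ]
)
raw.to_excel(path, header=False, index=False)

# Skip the two note rows so that the third row becomes the header.
data_frame = pd.read_excel(path, skiprows=2)
print(data_frame)
```

skiprows also accepts a list of zero-based row indices when the rows to ignore are not contiguous.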

Answer by Suthura Sudharaka

import pandas as pd

data = pd.read_excel(r'**YourPath**.xlsx')

print(data)

Answer by Quinn

Here is an updated method with syntax that is more common in Python code. It also prevents you from opening the same file multiple times.

import pandas as pd

sheet1, sheet2 = None, None
with pd.ExcelFile(r"PATH\FileName.xlsx") as reader:
    sheet1 = pd.read_excel(reader, sheet_name='Sheet1')
    sheet2 = pd.read_excel(reader, sheet_name='Sheet2')

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
