Reading an Excel file in Python using pandas
Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (StackOverflow) and follow the same license.
Original question: http://stackoverflow.com/questions/17063458/
Asked by Rakesh Adhikesavan
I am trying to read an Excel file this way:
newFile = pd.ExcelFile(PATH\FileName.xlsx)
ParsedData = pd.io.parsers.ExcelFile.parse(newFile)
This throws an error saying that two arguments are expected. I don't know what the second argument is. What I am trying to achieve is to convert an Excel file to a DataFrame. Am I doing it the right way, or is there another way to do this using pandas?
Accepted answer by DSM
Close: first you call ExcelFile, but then you call the .parse method and pass it the sheet name.
>>> xl = pd.ExcelFile("dummydata.xlsx")
>>> xl.sheet_names
[u'Sheet1', u'Sheet2', u'Sheet3']
>>> df = xl.parse("Sheet1")
>>> df.head()
Tid dummy1 dummy2 dummy3 dummy4 dummy5 \
0 2006-09-01 00:00:00 0 5.894611 0.605211 3.842871 8.265307
1 2006-09-01 01:00:00 0 5.712107 0.605211 3.416617 8.301360
2 2006-09-01 02:00:00 0 5.105300 0.605211 3.090865 8.335395
3 2006-09-01 03:00:00 0 4.098209 0.605211 3.198452 8.170187
4 2006-09-01 04:00:00 0 3.338196 0.605211 2.970015 7.765058
dummy6 dummy7 dummy8 dummy9
0 0.623354 0 2.579108 2.681728
1 0.554211 0 7.210000 3.028614
2 0.567841 0 6.940000 3.644147
3 0.581470 0 6.630000 4.016155
4 0.595100 0 6.350000 3.974442
What you're doing is calling the method which lives on the class itself, rather than the instance, which is okay (although not very idiomatic), but if you're doing that you would also need to pass the sheet name:
>>> parsed = pd.io.parsers.ExcelFile.parse(xl, "Sheet1")
>>> parsed.columns
Index([u'Tid', u'dummy1', u'dummy2', u'dummy3', u'dummy4', u'dummy5', u'dummy6', u'dummy7', u'dummy8', u'dummy9'], dtype=object)
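As a rough sketch of the instance-based approach described above, the following reads every sheet of a workbook into a dictionary by calling .parse on the ExcelFile instance. The workbook is generated on the fly so the example is self-contained; its file name, sheet names and values are made up, and it assumes an Excel engine such as openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

# Build a small hypothetical workbook so the example does not depend on a real file.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "dummydata.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"Tid": [1, 2], "dummy1": [0.5, 0.6]}).to_excel(
        writer, sheet_name="Sheet1", index=False
    )
    pd.DataFrame({"Tid": [3]}).to_excel(writer, sheet_name="Sheet2", index=False)

# Open the workbook once, then call .parse on the instance for each sheet name.
with pd.ExcelFile(path) as xl:
    frames = {name: xl.parse(name) for name in xl.sheet_names}

for name, df in frames.items():
    print(name, df.shape)
```

Calling .parse on the instance means the sheet name is the only argument you need to pass.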
Answer by Dr Manhattan
I thought I should add here that if you want to access rows or columns to loop through them, you can do this:
import pandas as pd
# open the file
xlsx = pd.ExcelFile(r"PATH\FileName.xlsx")
# get the first sheet as an object
sheet1 = xlsx.parse(0)
# get the first column as a list you can loop through
# where there is a 0 in the code below, change it to the row or column number you want
column = sheet1.icol(0).real
# get the first row as a list you can loop through
row = sheet1.irow(0).real
Edit:
The methods icol(i) and irow(i) are deprecated now. You can use sheet1.iloc[:,i] to get the i-th column and sheet1.iloc[i,:] to get the i-th row.
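A minimal sketch of the iloc approach from the edit, using a small in-memory DataFrame in place of a parsed sheet. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for a sheet returned by xlsx.parse(0)
sheet1 = pd.DataFrame({"name": ["a", "b", "c"], "value": [1, 2, 3]})

# First column as a plain Python list
column = sheet1.iloc[:, 0].tolist()

# First row as a plain Python list
row = sheet1.iloc[0, :].tolist()

# Loop through the column values
for item in column:
    print(item)
```

Change the 0 in either iloc call to pick a different column or row by position.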
Answer by Murali Mopuru
This is a much simpler and easier way.
import pandas
df = pandas.read_excel(open('your_xls_xlsx_filename','rb'), sheetname='Sheet 1')
# or using sheet index starting 0
df = pandas.read_excel(open('your_xls_xlsx_filename','rb'), sheetname=2)
Check out the documentation for full details: http://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.read_excel.html
FutureWarning: The sheetname keyword is deprecated in newer pandas versions; use sheet_name instead.
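For newer pandas versions, the same idea with the sheet_name keyword might look like the sketch below. It writes a small two-sheet workbook first so that it runs on its own; the file name, sheet names and data are hypothetical, and it assumes an Excel engine such as openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

# Build a small two-sheet workbook so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="Sheet 1", index=False)
    pd.DataFrame({"b": [3, 4]}).to_excel(writer, sheet_name="Sheet 2", index=False)

# Select a sheet by name
df_by_name = pd.read_excel(path, sheet_name="Sheet 1")

# Select a sheet by zero-based position (1 is the second sheet)
df_by_index = pd.read_excel(path, sheet_name=1)

print(df_by_name)
print(df_by_index)
```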
Answer by Ajay Sant
I think this should satisfy your need:
import pandas as pd
# Read the excel sheet to pandas dataframe
DataFrame = pd.read_excel(r"PATH\FileName.xlsx", sheet_name=0)
Answer by ted
You just need to pass the path to your file to pd.read_excel:
import pandas as pd
file_path = "./my_excel.xlsx"
data_frame = pd.read_excel(file_path)
Check out the documentation to explore parameters like skiprows, which ignores rows when loading the Excel file.
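As an illustration of skiprows, the sketch below writes a sheet whose first two rows are free-text notes and then skips them on load. The file contents are invented for the example, and it assumes an Excel engine such as openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

# Write a hypothetical sheet whose first two rows are notes, not data.
tmp_dir = tempfile.mkdtemp()
file_path = os.path.join(tmp_dir, "my_excel.xlsx")
raw = pd.DataFrame(
    [
        ["Exported report", None],
        ["Generated for illustration", None],
        ["id", "score"],
        [1, 90],
        [2, 85],
    ]
)
raw.to_excel(file_path, index=False, header=False)

# skiprows=2 skips the two note rows; the next row is then used as the header.
data_frame = pd.read_excel(file_path, skiprows=2)
print(data_frame)
```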
Answer by Suthura Sudharaka
import pandas as pd
data = pd.read_excel(r'**YourPath**.xlsx')
print(data)
Answer by Quinn
Here is an updated method with syntax that is more common in Python code. It also prevents you from opening the same file multiple times.
import pandas as pd
sheet1, sheet2 = None, None
with pd.ExcelFile(r"PATH\FileName.xlsx") as reader:
    sheet1 = pd.read_excel(reader, sheet_name='Sheet1')
    sheet2 = pd.read_excel(reader, sheet_name='Sheet2')
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html

