Reading an Excel file in Python using pandas

Question

Asked by Rakesh Adhikesavan

I am trying to read an Excel file this way:

newFile = pd.ExcelFile(PATH\FileName.xlsx)
ParsedData = pd.io.parsers.ExcelFile.parse(newFile)

This throws an error saying that two arguments are expected, and I don't know what the second argument is. What I am trying to achieve is to convert an Excel file to a DataFrame. Am I doing it the right way, or is there another way to do this using pandas?

Answer 1

Accepted answer by DSM

Close: first you call ExcelFile, but then you call the .parse method and pass it the sheet name.

>>> xl = pd.ExcelFile("dummydata.xlsx")
>>> xl.sheet_names
[u'Sheet1', u'Sheet2', u'Sheet3']
>>> df = xl.parse("Sheet1")
>>> df.head()
                  Tid  dummy1    dummy2    dummy3    dummy4    dummy5  \
0 2006-09-01 00:00:00       0  5.894611  0.605211  3.842871  8.265307   
1 2006-09-01 01:00:00       0  5.712107  0.605211  3.416617  8.301360   
2 2006-09-01 02:00:00       0  5.105300  0.605211  3.090865  8.335395   
3 2006-09-01 03:00:00       0  4.098209  0.605211  3.198452  8.170187   
4 2006-09-01 04:00:00       0  3.338196  0.605211  2.970015  7.765058   

     dummy6  dummy7    dummy8    dummy9  
0  0.623354       0  2.579108  2.681728  
1  0.554211       0  7.210000  3.028614  
2  0.567841       0  6.940000  3.644147  
3  0.581470       0  6.630000  4.016155  
4  0.595100       0  6.350000  3.974442

What you're doing is calling the method that lives on the class itself rather than on the instance. That is okay (although not very idiomatic), but if you do that you also need to pass the sheet name:

>>> parsed = pd.io.parsers.ExcelFile.parse(xl, "Sheet1")
>>> parsed.columns
Index([u'Tid', u'dummy1', u'dummy2', u'dummy3', u'dummy4', u'dummy5', u'dummy6', u'dummy7', u'dummy8', u'dummy9'], dtype=object)
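
The class-versus-instance point can be illustrated without pandas at all. The sketch below uses a hypothetical `Workbook` class (a made-up stand-in for `pd.ExcelFile`, not part of any library) to show why calling the method on the class needs the instance plus the sheet name, which is roughly where a "two arguments expected" error comes from.

```python
class Workbook:
    """Hypothetical stand-in for pd.ExcelFile, used only for illustration."""

    def __init__(self, sheets):
        self.sheets = sheets  # maps sheet name -> list of rows

    def parse(self, sheet_name):
        return self.sheets[sheet_name]


wb = Workbook({"Sheet1": [[1, 2], [3, 4]]})

# Idiomatic: call the method on the instance; `self` is supplied automatically.
via_instance = wb.parse("Sheet1")

# Equivalent but unusual: call it on the class and pass the instance yourself.
via_class = Workbook.parse(wb, "Sheet1")

# Passing only the instance leaves `sheet_name` unfilled and raises TypeError.
try:
    Workbook.parse(wb)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(via_instance == via_class)
print(error_message)
```

Both calls return the same rows; only the second form makes you spell out the instance explicitly.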

Answer 2

Answered by Dr Manhattan

I thought I should add here that if you want to access rows or columns to loop through them, you can do this:

import pandas as pd

# open the file
xlsx = pd.ExcelFile(r"PATH\FileName.xlsx")

# get the first sheet as an object
sheet1 = xlsx.parse(0)

# get the first column as a list you can loop through
# change the 0 in the code below to the row or column number you want
column = sheet1.icol(0).real

# get the first row as a list you can loop through
row = sheet1.irow(0).real

Edit:

The methods icol(i) and irow(i) are deprecated now. You can use sheet1.iloc[:,i] to get the i-th column and sheet1.iloc[i,:] to get the i-th row.
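
As a minimal sketch of the `iloc` approach, the snippet below builds a small DataFrame in memory as a stand-in for a parsed sheet (the column names and values are made up for illustration) and pulls out one column and one row by position.

```python
import pandas as pd

# Stand-in for a parsed sheet; the column names and values are made up.
sheet1 = pd.DataFrame({"Tid": [10, 20, 30], "dummy1": [1.5, 2.5, 3.5]})

first_column = sheet1.iloc[:, 0]  # Series holding column 0, indexed by row label
first_row = sheet1.iloc[0, :]     # Series holding row 0, indexed by column name

# Convert to plain Python lists if you want something simple to loop over.
column_values = first_column.tolist()
row_values = first_row.tolist()

for value in column_values:
    print(value)
```

Note that a row taken across columns of different numeric types is returned as a single Series, so its values may be upcast to a common type.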

Answer 3

Answered by Murali Mopuru

This is a much simpler and easier way.

import pandas

# read a sheet by name (newer pandas versions spell the keyword sheet_name)
df = pandas.read_excel('your_xls_xlsx_filename', sheet_name='Sheet 1')
# or by sheet index, starting at 0
df = pandas.read_excel('your_xls_xlsx_filename', sheet_name=2)

Check out the documentation for full details: http://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.read_excel.html

Note: the sheetname keyword is deprecated in newer pandas versions (it triggers a FutureWarning); use sheet_name instead.

Answer 4

Answered by Ajay Sant

I think this should satisfy your need:

import pandas as pd

# Read the first Excel sheet into a pandas DataFrame
# (raw string so the backslash in the Windows path is taken literally)
df = pd.read_excel(r"PATH\FileName.xlsx", sheet_name=0)

Answer 5

Answered by ted

You just need to pass the path to your file to pd.read_excel:

import pandas as pd

file_path = "./my_excel.xlsx"
data_frame = pd.read_excel(file_path)

Check out the documentation to explore parameters like skiprows, which ignores rows when loading the Excel file.
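
As a rough sketch of what `skiprows` does, the example below writes a small workbook to an in-memory buffer and reads it back while skipping a junk title row. The data is invented for illustration, and it assumes the `openpyxl` package is installed for reading and writing `.xlsx` content.

```python
import io

import pandas as pd

# Build a small workbook in memory so the sketch is self-contained.
# Writing and reading .xlsx here assumes the openpyxl package is installed.
raw = pd.DataFrame([
    ["Report generated by some tool", None],  # junk title row
    ["name", "score"],                        # real header row
    ["a", 1],
    ["b", 2],
])
buffer = io.BytesIO()
raw.to_excel(buffer, index=False, header=False, engine="openpyxl")
buffer.seek(0)

# skiprows=1 drops the first row, so the second row becomes the header.
data_frame = pd.read_excel(buffer, skiprows=1, engine="openpyxl")
print(data_frame)
```

With a real file you would pass the file path instead of the buffer; the `skiprows` argument works the same way.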

Answer 6

Answered by Suthura Sudharaka

import pandas as pd

# replace YourPath with the path to your own workbook
data = pd.read_excel(r'YourPath.xlsx')

print(data)

Answer 7

Answered by Quinn

Here is an updated method with syntax that is more common in Python code. It also avoids opening the same file multiple times.

import pandas as pd

sheet1, sheet2 = None, None
with pd.ExcelFile(r"PATH\FileName.xlsx") as reader:
    sheet1 = pd.read_excel(reader, sheet_name='Sheet1')
    sheet2 = pd.read_excel(reader, sheet_name='Sheet2')

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
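
If you want every sheet rather than a fixed pair, one option is to pass `sheet_name=None` to `pd.read_excel`, which returns a dict mapping sheet names to DataFrames. The sketch below is only an illustration: it builds a two-sheet workbook in memory with made-up data and assumes the `openpyxl` package is installed.

```python
import io

import pandas as pd

# In-memory workbook with two sheets (assumes openpyxl is installed).
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"x": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"y": [3, 4, 5]}).to_excel(writer, sheet_name="Sheet2", index=False)
buffer.seek(0)

# sheet_name=None asks read_excel for every sheet in a single pass.
sheets = pd.read_excel(buffer, sheet_name=None, engine="openpyxl")

for name, frame in sheets.items():
    print(name, frame.shape)
```

With a file on disk you would pass its path in place of the buffer; the workbook is still opened only once.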
