在 Pandas 中,read_excel() 中使用的 read_csv() 中的“nrows”相当于什么?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/35747476/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
In Pandas, whats the equivalent of 'nrows' from read_csv() to be used in read_excel()?
提问by Gabriel
Want to import only certain range of data from an excel spreadsheet (.xlsm format as it has macros) into a pandas dataframe. Was doing it this way:
只想将特定范围的数据从 excel 电子表格(.xlsm 格式,因为它具有宏)导入到 Pandas 数据框中。是这样做的:
data = pd.read_excel(filepath, header=0, skiprows=4, nrows= 20, parse_cols = "A:D")
But it seems that nrows works only with read_csv() ? What would be the equivalent for read_excel()?
但似乎 nrows 仅适用于 read_csv() ?read_excel() 的等价物是什么?
采纳答案by Erol
If you know the number of rows in your Excel sheet, you can use the skip_footerparameter to read the first n - skip_footerrows of your file, where nis the total number of rows.
如果您知道 Excel 工作表中的行数,则可以使用skip_footer参数读取文件的前n - skip_footer行,其中n是总行数。
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html
Usage:
用法:
data = pd.read_excel(filepath, header=0, parse_cols = "A:D", skip_footer=80)
Assuming your excel sheet has 100 rows, this line would parse the first 20 rows.
假设您的 Excel 表有 100 行,该行将解析前 20 行。
回答by MaxU
I'd like to make (extend) @Erol's answerbit more flexible.
我想让(扩展)@Erol 的回答更灵活一些。
Assuming that we DON'T know the total number of rows in the excel sheet:
假设我们不知道excel表中的总行数:
xl = pd.ExcelFile(filepath)
# parsing first (index: 0) sheet
total_rows = xl.book.sheet_by_index(0).nrows
skiprows = 4
nrows = 20
# calc number of footer rows
# (-1) - for the header row
skipfooter = total_rows - nrows - skiprows - 1
df = xl.parse(0, skiprows=skiprows, skipfooter=skipfooter, parse_cols="A:D") \
.dropna(axis=1, how='all')
.dropna(axis=1, how='all')
will drop all columns containing onlyNaN
's
.dropna(axis=1, how='all')
将删除所有只包含NaN
's 的列
回答by NathanielDavidChu
As noted in the documentation, as of pandas version 0.23, this is now a built-in option, and functions almost exactly as the OP stated.
正如文档中所述,从 pandas 版本 0.23 开始,这现在是一个内置选项,其功能几乎与 OP 所述完全相同。
The code
编码
data = pd.read_excel(filepath, header=0, skiprows=4, nrows= 20, use_cols = "A:D")
data = pd.read_excel(filepath, header=0, skiprows=4, nrows= 20, use_cols = "A:D")
will now read the excel file, take data from the first sheet (default), skip 4 rows of data, thentake the first line (i.e., the fifth line of the sheet) as the header, read the next 20 rows of data into the dataframe (lines 6-25), and only use the columns A:D. Note that use_cols is now the final option, as parse_cols is deprecated.
现在将读取excel文件,从第一张工作表中取数据(默认),跳过4行数据,然后以第一行(即工作表的第五行)为标题,将接下来的20行数据读入数据框(第 6-25 行),并且仅使用 A:D 列。请注意, use_cols 现在是最后一个选项,因为 parse_cols 已被弃用。