How to read certain columns from Excel using Pandas - Python
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/33655127/
Asked by Ana
I am reading from an Excel sheet and I want to read certain columns: column 0 because it is the row-index, and columns 22:37. Now here is what I do:
import pandas as pd
import numpy as np
file_loc = "path.xlsx"
df = pd.read_excel(file_loc, index_col=None, na_values=['NA'], parse_cols = 37)
df= pd.concat([df[df.columns[0]], df[df.columns[22:]]], axis=1)
But I would hope there is a better way to do that! I know I can do it with parse_cols=[0, 22,..,37], but writing out every column number does not make sense for large datasets.
I also did this:
s = pd.Series(0)
s[1]=22
for i in range(2,14):
s[i]=s[i-1]+1
df = pd.read_excel(file_loc, index_col=None, na_values=['NA'], parse_cols = s)
But it reads the first 15 columns, which is the length of s.
Answered by Martin Vseticka
You can use column indices (letters) like this:
import pandas as pd
import numpy as np
file_loc = "path.xlsx"
df = pd.read_excel(file_loc, index_col=None, na_values=['NA'], usecols = "A,C:AA")
print(df)
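The question asks for column 0 plus columns 22 through 37, whereas the string above selects A and C through AA. As a minimal sketch, the zero-based positions can be converted to Excel letters with a small helper. The names col_letter and usecols_spec are hypothetical and not part of pandas, and the sketch assumes the positions in the question are zero-based.

```python
def col_letter(index):
    """Convert a zero-based column position to an Excel column letter (0 -> 'A')."""
    letters = ""
    n = index + 1  # Excel letters act like a 1-based, bijective base-26 number
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def usecols_spec(single, start, stop):
    """Build a usecols string for one column plus an inclusive range of columns."""
    return f"{col_letter(single)},{col_letter(start)}:{col_letter(stop)}"


# Column 0 plus columns 22..37 from the question.
spec = usecols_spec(0, 22, 37)
print(spec)  # A,W:AL
```

The resulting string could then be passed as usecols=spec in the read_excel call above.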
Corresponding documentation:
usecols : int, str, list-like, or callable, default None
- If None, then parse all columns.
- If str, then indicates comma separated list of Excel column letters and column ranges (e.g. "A:E" or "A,C,E:F"). Ranges are inclusive of both sides.
- If list of int, then indicates list of column numbers to be parsed.
- If list of string, then indicates list of column names to be parsed. New in version 0.24.0.
- If callable, then evaluate each column name against it and parse the column if the callable returns True. New in version 0.24.0.
Returns a subset of the columns according to behavior above.
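The list-of-int and callable forms described above can be sketched as follows. This is an illustration under assumptions, not the answer author's code: read_selected, wanted_positions, and wanted_by_name are hypothetical names, and the column names "id" and "temp_" are made up for the callable example. The list is built with range() so that the column numbers do not have to be typed out by hand.

```python
# Zero-based positions: column 0 plus columns 22 through 37.
# range() excludes its stop value, so 38 is needed to include column 37.
wanted_positions = [0] + list(range(22, 38))


def wanted_by_name(column_name):
    """Callable form: keep a column when this returns True.

    Hypothetical rule: keep "id" and every column whose name starts with "temp_".
    """
    name = str(column_name)
    return name == "id" or name.startswith("temp_")


def read_selected(path, usecols):
    """Read only the selected columns; usecols may be the list or the callable."""
    # Imported here so the selectors above can be checked without pandas installed.
    import pandas as pd

    return pd.read_excel(path, index_col=None, na_values=["NA"], usecols=usecols)
```

A call such as read_selected("path.xlsx", wanted_positions) would then request only those 17 columns.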
Answered by Leoli
parse_cols is deprecated; use usecols instead.
That is:
df = pd.read_excel(file_loc, index_col=None, na_values=['NA'], usecols = "A,C:AA")
Answered by Uday Kiran
usecols should help: pass the column letters or ranges as they appear in the Excel worksheet (A, B, etc.). Some examples:
- Selected Columns
df = pd.read_excel(file_location,sheet_name='Sheet1', usecols="A,C,F")
- Range of Columns and selected column
df = pd.read_excel(file_location,sheet_name='Sheet1', usecols="A:F,H")
- Multiple Ranges
df = pd.read_excel(file_location,sheet_name='Sheet1', usecols="A:F,H,J:N")
- Range of columns
df = pd.read_excel(file_location,sheet_name='Sheet1', usecols="A:N")
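To double-check which columns a letter specification covers before reading a large file, the string can be expanded into zero-based positions. The sketch below is a hypothetical helper written for illustration. It is not how pandas parses usecols internally, and it assumes a well-formed string of letters, commas, and colons.

```python
def letter_to_index(letters):
    """Convert an Excel column letter to a zero-based position ('A' -> 0, 'AA' -> 26)."""
    index = 0
    for ch in letters.strip().upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def expand_usecols(spec):
    """Expand a string such as "A:F,H,J:N" into a list of zero-based positions."""
    columns = []
    for part in spec.split(","):
        if ":" in part:
            first, last = part.split(":")
            # Ranges are inclusive of both sides, hence the + 1.
            columns.extend(range(letter_to_index(first), letter_to_index(last) + 1))
        else:
            columns.append(letter_to_index(part))
    return columns


print(expand_usecols("A:F,H,J:N"))  # [0, 1, 2, 3, 4, 5, 7, 9, 10, 11, 12, 13]
```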