Note: this page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/33655127/

Date: 2020-08-19 13:42:30  Source: igfitidea

How to read certain columns from Excel using Pandas - Python

Tags: python, numpy, pandas, dataframe

Asked by Ana

I am reading from an Excel sheet and I want to read certain columns: column 0 because it is the row-index, and columns 22:37. Now here is what I do:

import pandas as pd
import numpy as np
file_loc = "path.xlsx"
df = pd.read_excel(file_loc, index_col=None, na_values=['NA'], parse_cols = 37)
df= pd.concat([df[df.columns[0]], df[df.columns[22:]]], axis=1)

But I hope there is a better way to do that! I know I can do it with parse_cols=[0, 22,..,37], but for large datasets this doesn't make sense.

I also did this:

s = pd.Series(0)
s[1]=22
for i in range(2,14):
    s[i]=s[i-1]+1
df = pd.read_excel(file_loc, index_col=None, na_values=['NA'], parse_cols = s)

But it reads the first 15 columns which is the length of s.

Answered by Martin Vseticka

You can use column indices (letters) like this:

import pandas as pd
import numpy as np
file_loc = "path.xlsx"
df = pd.read_excel(file_loc, index_col=None, na_values=['NA'], usecols = "A,C:AA")
print(df)

Corresponding documentation:

usecols : int, str, list-like, or callable, default None

  • If None, then parse all columns.
  • If str, then indicates comma separated list of Excel column letters and column ranges (e.g. “A:E” or “A,C,E:F”). Ranges are inclusive of both sides.
  • If list of int, then indicates list of column numbers to be parsed.
  • If list of string, then indicates list of column names to be parsed.

    New in version 0.24.0.

  • If callable, then evaluate each column name against it and parse the column if the callable returns True.

Returns a subset of the columns according to behavior above.

New in version 0.24.0.
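
The "list of int" form described above maps directly onto the columns in the question. The following is a minimal sketch, assuming that "22:37" in the question means columns 22 through 37 inclusive and that column 0 should become the row index. The names QUESTION_COLUMNS and read_question_columns are hypothetical, introduced only for illustration.

```python
# Zero-based positions: column 0 plus columns 22 through 37 inclusive
# (assumption: the question's "22:37" includes column 37).
QUESTION_COLUMNS = [0, *range(22, 38)]


def read_question_columns(file_loc):
    """Read only the selected columns from the Excel file at file_loc."""
    # Imported here so the column list above can be built without pandas.
    import pandas as pd

    return pd.read_excel(
        file_loc,
        usecols=QUESTION_COLUMNS,  # list of int: column numbers to parse
        index_col=0,               # use the first selected column as the row index
        na_values=["NA"],
    )
```

Calling read_question_columns("path.xlsx") should then return only the requested columns, with no need to write out every index by hand.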

Answered by Leoli

parse_cols is deprecated; use usecols instead.

That is:

df = pd.read_excel(file_loc, index_col=None, na_values=['NA'], usecols = "A,C:AA")

Answered by Uday Kiran

"usecols" should help. Specify the columns by their Excel worksheet letters (A, B, etc.). Examples are below:

  1. Selected columns
df = pd.read_excel(file_location, sheet_name='Sheet1', usecols="A,C,F")
  2. A range of columns plus a selected column
df = pd.read_excel(file_location, sheet_name='Sheet1', usecols="A:F,H")
  3. Multiple ranges
df = pd.read_excel(file_location, sheet_name='Sheet1', usecols="A:F,H,J:N")
  4. A range of columns
df = pd.read_excel(file_location, sheet_name='Sheet1', usecols="A:N")
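
If you only know the zero-based column numbers, as in the question, the letter strings used above can be generated instead of typed by hand. This is a rough sketch: column_letter and usecols_spec are hypothetical helpers, not part of pandas, and the example assumes the question's range 22 to 37 is inclusive.

```python
def column_letter(index: int) -> str:
    """Convert a zero-based column index to its Excel letter (0 -> A, 26 -> AA)."""
    if index < 0:
        raise ValueError("column index must be non-negative")
    letters = ""
    n = index + 1  # Excel letters behave like a 1-based, base-26 numbering
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def usecols_spec(single: int, first: int, last: int) -> str:
    """Build a usecols string for one column plus an inclusive range."""
    return f"{column_letter(single)},{column_letter(first)}:{column_letter(last)}"


spec = usecols_spec(0, 22, 37)
print(spec)  # prints A,W:AL
```

The resulting string can be passed straight through, for example pd.read_excel(file_location, sheet_name='Sheet1', usecols=spec).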