pandas: reading certain columns from Excel into a DataFrame

Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/54106525/

Time: 2020-09-14 06:15:41  Source: igfitidea

Read certain columns in Excel to a DataFrame

python pandas dataframe

Asked by Fadri

I want to read a certain column from an Excel file into a DataFrame, but I want to specify the column by its header name.

For example, I have an Excel file with two columns in Sheet 2: "number" in column A and "ForeignKey" in column B. I want to import "ForeignKey" into a DataFrame. I did this with the following script:

xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols=[0,1]) 

It shows the following in my xl_file:

       number ForeignKey
0       1        abc
1       2        def
2       3        ghi

With a small number of columns, I can get "ForeignKey" by specifying usecols=[1]. However, if I have many columns and know the column name pattern, it is easier to specify the column name. I tried the following code, but it gives an empty DataFrame.

xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols=['ForeignKey']) 

According to the discussion in the following link, the code above works well, but only for read_csv.

How to drop a specific column of csv file while reading it using pandas?

Is there a way to do this when reading an Excel file?

Thank you in advance.

Accepted answer by Frayal

There is a solution, but CSV files are not treated the same way as Excel files.

From the documentation, for csv:

usecols : list-like or callable, default None

For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz'].
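As a minimal sketch of the read_csv behaviour quoted above, the snippet below selects a column first by header name and then with a callable. The CSV text is a made-up stand-in for the question's data, and the prefix "Foreign" is only an assumed name pattern.

```python
import io

import pandas as pd

# Hypothetical CSV text mirroring the two columns from the question.
csv_text = "number,ForeignKey\n1,abc\n2,def\n3,ghi\n"

# A list of header names selects columns by label.
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["ForeignKey"])

# A callable is evaluated against each header name;
# columns for which it returns True are kept.
by_pattern = pd.read_csv(
    io.StringIO(csv_text),
    usecols=lambda name: name.startswith("Foreign"),
)

print(by_name.columns.tolist())
print(by_pattern["ForeignKey"].tolist())
```

The callable form is the one that matches the "column name pattern" case mentioned in the question.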

For Excel:

usecols : int or list, default None

  • If None then parse all columns,
  • If int then indicates last column to be parsed
  • If list of ints then indicates list of column numbers to be parsed
  • If string then indicates comma separated list of Excel column letters and column ranges (e.g. “A:E” or “A,C,E:F”). Ranges are inclusive of both sides
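Since the string form described above expects Excel column letters, it can help to translate zero-based positions into letters. The helpers below are hypothetical (they are not part of pandas) and only sketch how such a string could be built.

```python
def excel_column_letter(index: int) -> str:
    """Convert a zero-based column position to Excel letters (0 -> 'A', 26 -> 'AA')."""
    if index < 0:
        raise ValueError("index must be non-negative")
    letters = ""
    index += 1  # Excel letters form a one-based, bijective base-26 system
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def usecols_string(indices) -> str:
    """Build a comma-separated letter list such as 'A,C,E' from column positions."""
    return ",".join(excel_column_letter(i) for i in indices)


print(excel_column_letter(1))     # column B, where "ForeignKey" sits in the question
print(usecols_string([0, 2, 4]))
```

The resulting string could then be passed as usecols to read_excel, in the same way as the "A,C,E:F" example from the documentation excerpt.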

So you need to call it like this:

xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols='ForeignKey')

And if you also need 'number':

xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols='number,ForeignKey')

EDIT: you need to give the name of the Excel column (its letter), not the header name from the data. The other answer solves this. However, you do not need 'B:B'; 'B' will do the trick, but that is no improvement over usecols with numbers.

If you can load all the data in little time, the best way to solve this may be to parse all columns and then select the desired columns:

xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2')['ForeignKey']
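Once all columns are loaded, selecting by header name or by name pattern is ordinary DataFrame work. The sketch below uses an in-memory frame as a stand-in for the result of read_excel; the extra "ForeignName" column is invented here only to illustrate pattern matching.

```python
import pandas as pd

# Stand-in for the frame returned by pd.read_excel(...) with all columns parsed.
# "ForeignName" is a hypothetical extra column, not part of the question's data.
xl_file = pd.DataFrame(
    {
        "number": [1, 2, 3],
        "ForeignKey": ["abc", "def", "ghi"],
        "ForeignName": ["x", "y", "z"],
    }
)

# Double brackets keep the result a DataFrame instead of a Series.
one_column = xl_file[["ForeignKey"]]

# filter(regex=...) keeps every column whose name matches the pattern.
matching = xl_file.filter(regex=r"^Foreign")

print(one_column.columns.tolist())
print(matching.columns.tolist())
```

Note that indexing with a single pair of brackets, as in the line above, returns a Series rather than a DataFrame.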

Answer by meW

You need to pass the Excel column name (its letter), and in range format, e.g. colname:colname.

For instance, if ForeignKey appears in column B of Sheet 2 of your Excel file, then do:

xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols='B:B') 

Refer to the GitHub issue and the prescribed solution for the same.
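A possible way to combine both answers is to read only the header row first, resolve the wanted header names to positions, and then pass those positions as usecols. The helper name read_excel_by_header is hypothetical, the workbook is a temporary stand-in for TestDF.xlsx, and the sketch assumes an Excel engine such as openpyxl is installed. Newer pandas releases also document a list of header names or a callable for read_excel's usecols, so the documentation of the installed version is worth checking first.

```python
import os
import tempfile

import pandas as pd


def read_excel_by_header(path, sheet_name, wanted):
    """Hypothetical helper: resolve header names to positions, then read those columns."""
    # nrows=0 returns an empty frame that still carries the header names.
    header = pd.read_excel(path, sheet_name=sheet_name, nrows=0)
    positions = [header.columns.get_loc(name) for name in wanted]
    return pd.read_excel(path, sheet_name=sheet_name, usecols=positions)


# Build a small stand-in workbook (assumes an Excel engine such as openpyxl).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "TestDF.xlsx")
    pd.DataFrame(
        {"number": [1, 2, 3], "ForeignKey": ["abc", "def", "ghi"]}
    ).to_excel(path, sheet_name="Sheet 2", index=False)
    result = read_excel_by_header(path, "Sheet 2", ["ForeignKey"])

print(result)
```

This keeps the call site independent of where the column happens to sit in the sheet, at the cost of opening the file twice.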