pandas: read certain columns from Excel into a DataFrame

Question

Asked by Fadri

I want to read certain columns from an Excel file into a DataFrame, but I want to specify each column by its header name.

For example, I have an Excel file with two columns in Sheet 2: "number" in column A and "ForeignKey" in column B. I want to import "ForeignKey" into a DataFrame. I did this with the following script:

import pandas as pd
xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols=[0,1])

It shows the following in my xl_file:

   number ForeignKey
0       1        abc
1       2        def
2       3        ghi

With only a few columns, I can get "ForeignKey" by specifying usecols=[1]. However, if I have many columns and know the column name pattern, it would be easier to specify the column name. I tried the following code, but it gives an empty DataFrame.

xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols=['ForeignKey'])

According to the discussion at the following link, the code above works, but only for read_csv.

How to drop a specific column of csv file while reading it using pandas?

Is there a way to do this when reading an Excel file?

Thank you in advance.

Answer 1

Accepted answer by Frayal

There is a solution, but read_csv and read_excel do not treat usecols the same way.

From the documentation, for csv:

usecols : list-like or callable, default None
For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz'].

For Excel:

usecols : int or list, default None
If None then parse all columns,
If int then indicates last column to be parsed
If list of ints then indicates list of column numbers to be parsed
If string then indicates comma separated list of Excel column letters and column ranges (e.g. “A:E” or “A,C,E:F”). Ranges are inclusive of both sides

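To make the string form concrete, here is a minimal sketch of how a spec such as "A,C,E:F" maps onto zero-based column positions. It is an illustration of the documented behaviour only, not pandas' own parser, and the helper names excel_col_to_index and expand_usecols are made up for this example.

```python
from typing import List


def excel_col_to_index(letters: str) -> int:
    """Convert an Excel column letter such as 'A' or 'AB' to a 0-based index."""
    index = 0
    for ch in letters.strip().upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"invalid column letter: {letters!r}")
        # Column letters behave like a base-26 number without a zero digit.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def expand_usecols(spec: str) -> List[int]:
    """Expand a spec such as 'A,C,E:F' into 0-based column positions."""
    positions = []
    for part in spec.split(","):
        if ":" in part:
            first, last = part.split(":")
            # Ranges include both endpoints, as the documentation states.
            positions.extend(
                range(excel_col_to_index(first), excel_col_to_index(last) + 1)
            )
        else:
            positions.append(excel_col_to_index(part))
    return positions


print(expand_usecols("A,C,E:F"))  # [0, 2, 4, 5]
print(expand_usecols("B:B"))      # [1]
```

Under this reading, a header name such as "ForeignKey" passed as a string would be taken as one very long column letter that points far beyond the sheet, which is presumably why it selects nothing.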

So you need to call it with the Excel column letter ("ForeignKey" is in column B):

xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols='B')

And if you also need "number" (column A):

xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols='A,B')

EDIT: you need to pass the Excel column letter, not the header name of the data. The other answer covers this. However, you don't need 'B:B'; 'B' alone will do the trick. But that is no improvement over passing usecols with numbers.

If loading all the data takes next to no time, the best way to solve this may be to parse all columns and then select the desired ones:

xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2')['ForeignKey']
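Once everything is loaded, selecting by header name or by name pattern is ordinary DataFrame work. The sketch below uses an in-memory DataFrame as a stand-in for the result of read_excel; the extra column "ForeignKey2" is hypothetical and only there to show pattern matching.

```python
import pandas as pd

# Stand-in for pd.read_excel(...) called without usecols.
df = pd.DataFrame({
    "number": [1, 2, 3],
    "ForeignKey": ["abc", "def", "ghi"],
    "ForeignKey2": ["x", "y", "z"],  # hypothetical extra column
})

# Exact name: a list of names returns a DataFrame,
# while a bare string (df["ForeignKey"]) returns a Series.
exact = df[["ForeignKey"]]

# Name pattern: keep every column whose header contains "Key".
by_pattern = df.filter(like="Key")

# Regular expression: keep headers that start with "Foreign".
by_regex = df.filter(regex=r"^Foreign")

print(list(exact.columns))       # ['ForeignKey']
print(list(by_pattern.columns))  # ['ForeignKey', 'ForeignKey2']
```

Newer pandas releases also document list-of-names and callable forms of usecols for read_excel, so it is worth checking the documentation for the version you have installed before falling back to this approach.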

Answer 2

Answer by meW

You need to pass the Excel column letter, and in range format, e.g. colname:colname.

For instance, if ForeignKey appears in column B of your Excel sheet 2, then do:

xl_file = pd.read_excel('D:/SnapPython/TestDF.xlsx', sheet_name='Sheet 2', usecols='B:B')
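If you know the header names but not their letters, the letters can be derived from the header positions. This is only a sketch: index_to_excel_col and usecols_for are hypothetical helpers, the headers list is assumed to be known already (for example from reading just the header row), and the sheet's data is assumed to start in column A.

```python
def index_to_excel_col(index: int) -> str:
    """Convert a 0-based column position to an Excel letter (0 -> 'A', 26 -> 'AA')."""
    if index < 0:
        raise ValueError("column position must be non-negative")
    letters = ""
    index += 1  # switch to 1-based numbering, where 'A' is 1
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def usecols_for(headers, wanted):
    """Build a usecols string such as 'B,D' for the wanted header names."""
    return ",".join(index_to_excel_col(headers.index(name)) for name in wanted)


# Assumed header row of the sheet, in sheet order starting at column A.
headers = ["number", "ForeignKey"]

print(usecols_for(headers, ["ForeignKey"]))            # B
print(usecols_for(headers, ["number", "ForeignKey"]))  # A,B
```

The resulting string can then be passed as usecols in the call above.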

Refer to the GitHub issue and the solution prescribed there.
