Python: How to specify column names while reading an Excel file using Pandas?

Question

Asked by Rakesh Adhikesavan

I read an Excel sheet into a pandas DataFrame this way:

import pandas as pd

xl = pd.ExcelFile("Path + filename")
df = xl.parse("Sheet1")

The first cell's value in each column is used as the DataFrame's column name. I want to specify my own column names. How do I do this?

Answer 1

Accepted answer by falsetru

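Call .parse with the header=None keyword argument. The first row is then read as data rather than as the header, and the columns get default integer labels (0, 1, 2, ...), which you can then replace with your own names.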

df = xl.parse("Sheet1", header=None)
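A minimal sketch of what header=None does, followed by assigning your own names. So that it runs without an Excel file or an Excel engine such as openpyxl, it uses pd.read_csv on a small in-memory CSV as a stand-in; the assumption is that header=None treats the first row the same way in xl.parse / pd.read_excel. The sample data and column names are hypothetical.

```python
import io

import pandas as pd

# Stand-in for the sheet (hypothetical data): the first row holds
# header text we do not want to keep as column names.
raw = "old_a,old_b\n1,2\n3,4\n"

# header=None: every row, including the first, is read as data and the
# columns get default integer labels.
df = pd.read_csv(io.StringIO(raw), header=None)
print(list(df.columns))  # [0, 1]
print(len(df))           # 3

# Replace the integer labels with your own names.
df.columns = ["A", "B"]
print(df.iloc[0].tolist())  # ['old_a', 'old_b'] (old header kept as data)
```

With this approach a header row you don't need stays in the data; combining header=None with skiprows=1 drops it.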

Answer 2

Answer by Rutger Kassies

I think setting them afterwards is the only way in this case. For example, if you have four columns in your DataFrame:

df.columns = ['W','X','Y','Z']

If you know in advance what the headers in the Excel file are, it's probably better to rename them. This renames W to A, X to B, and so on:

df = df.rename(columns={'W': 'A', 'X': 'B'})  # add more 'old': 'new' pairs as needed; rename returns a new DataFrame
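Both approaches from this answer in one runnable sketch, on a hypothetical four-column DataFrame built in memory. Assigning to df.columns needs exactly one name per column, while rename only touches the keys you list and returns a new DataFrame unless you assign the result back.

```python
import pandas as pd

# Hypothetical DataFrame with four columns, as in the answer.
df = pd.DataFrame([[1, 2, 3, 4]], columns=["W", "X", "Y", "Z"])

# Approach 1: overwrite every name; the list length must match.
renamed_all = df.copy()
renamed_all.columns = ["A", "B", "C", "D"]

# Approach 2: map selected old names to new ones; unlisted columns keep
# their names. rename returns a new DataFrame and leaves df unchanged.
renamed_some = df.rename(columns={"W": "A", "X": "B"})

print(list(renamed_all.columns))   # ['A', 'B', 'C', 'D']
print(list(renamed_some.columns))  # ['A', 'B', 'Y', 'Z']
print(list(df.columns))            # ['W', 'X', 'Y', 'Z']

# A length mismatch is rejected with a ValueError.
try:
    df.copy().columns = ["A", "B"]
except ValueError as exc:
    print("ValueError:", exc)
```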

Answer 3

Answer by ram

This thread is 5 years old and now outdated, but it still shows up at the top of generic search results, so I am adding this note. Pandas now (v0.22) has a keyword, names, to specify column names when parsing Excel files. Use:

import pandas as pd
xl = pd.ExcelFile("Path + filename")
df = xl.parse("Sheet1", header=None, names=['A', 'B', 'C'])

If header=None is not set, pandas treats the first row as the header and drops it during parsing (the names you pass replace those header values). If there is a header but you don't want to use it, you have two choices: (1) use the names kwarg only; or (2) use names together with header=None and skiprows=1. I personally prefer the second option, since it clearly signals that the input file is not in the format I want and that I am doing something to work around it.
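A minimal sketch comparing these cases, using pd.read_csv on hypothetical in-memory text as a stand-in for an Excel sheet so it runs without an Excel engine. One caveat when doing this: read_excel's header defaults to 0, whereas read_csv's default ('infer') behaves like header=None once names is passed, so header=0 is passed explicitly below to mimic read_excel.

```python
import io

import pandas as pd

names = ["A", "B"]

# File without a header row: header=None keeps the first row as data.
no_header = pd.read_csv(io.StringIO("1,2\n3,4\n"), header=None, names=names)

# File with a header row we want to replace (hypothetical data).
with_header = "old_a,old_b\n1,2\n3,4\n"

# Option (1): names only. header=0 mirrors read_excel's default, so the
# first row is consumed as the header and replaced by `names`.
opt1 = pd.read_csv(io.StringIO(with_header), header=0, names=names)

# Option (2): header=None plus skiprows=1 skips the unwanted header row.
opt2 = pd.read_csv(io.StringIO(with_header), header=None, names=names, skiprows=1)

print(no_header.equals(opt1), opt1.equals(opt2))  # True True
print(opt1)
```

Both options end up with the same DataFrame; the choice between them is about how explicitly the code documents the file's layout.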

Answer 4

Answer by Loku

As Ram said, this post comes up at the top of search results and may be useful to some. In pandas 0.24.2 (and possibly earlier), read_excel itself can ignore the source headers and apply your own column names, along with a few other useful controls:

import pandas as pd
file1 = "Path + filename"  # path to your Excel file

DID = pd.read_excel(
    file1,
    sheet_name=0,            # first sheet in the workbook
    header=None,             # ignore the source headers
    usecols=[0, 1, 6],       # read only these column indexes
    names=['A', 'ID', 'B'],  # your own column names
    dtype={2: str},          # specify data types per column
    skiprows=10,             # skip this many rows from the top
)
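To see what each of these keywords does without an Excel file, here is a minimal sketch that applies the same options to a small in-memory CSV via pd.read_csv. It is a stand-in: read_excel is assumed to treat header, usecols, names, dtype and skiprows the same way. The data are hypothetical, and dtype is keyed by the new column name 'B' rather than by position, which keeps the mapping unambiguous when usecols and names are combined.

```python
import io

import pandas as pd

# Hypothetical sheet: two preamble rows, then four columns of data.
raw = (
    "report generated by some tool\n"
    "c0,c1,c2,c3\n"
    "1,2,3,4\n"
    "5,6,7,8\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    header=None,             # do not use any row as the header
    skiprows=2,              # drop the two preamble rows
    usecols=[0, 1, 3],       # keep only these column positions
    names=["A", "ID", "B"],  # one name per kept column
    dtype={"B": str},        # read column B as text
)
print(df)
```

When names has exactly as many entries as usecols, the names are applied to the selected columns in order.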
