How to specify column names while reading an Excel file using Pandas in Python?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/17335649/

Date: 2020-08-19 07:57:23  Source: igfitidea

Tags: python, pandas

Asked by Rakesh Adhikesavan

I read an Excel Sheet into a pandas DataFrame this way:

import pandas as pd

xl = pd.ExcelFile("Path + filename")
df = xl.parse("Sheet1")

The value of the first cell in each column is used as the column name for the DataFrame. I want to specify my own column names. How do I do this?

Accepted answer by falsetru

Call .parse with the header=None keyword argument.

df = xl.parse("Sheet1", header=None)
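
As a rough illustration of what header=None does, here is a minimal sketch. It builds a throwaway workbook in memory instead of reading a real file, and it assumes the openpyxl package is installed; the names buffer, default_labels, "first" and "second" are made up for the example.

```python
import io
import pandas as pd

# Build a tiny workbook in memory so the sketch needs no file on disk.
# Assumption: openpyxl is installed for writing and reading .xlsx data.
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame([["W", "X"], [1, 2], [3, 4]]).to_excel(
        writer, sheet_name="Sheet1", header=False, index=False
    )
buffer.seek(0)

xl = pd.ExcelFile(buffer)
df = xl.parse("Sheet1", header=None)
default_labels = list(df.columns)  # integer positions: 0, 1

# The first sheet row ("W", "X") stays in the data, so custom names
# can simply be assigned afterwards.
df.columns = ["first", "second"]
print(df)
```

Note that with header=None the original header row, if the sheet has one, ends up as the first data row.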

Answer by Rutger Kassies

I think setting them afterwards is the only way in this case. For example, if your DataFrame has four columns:

df.columns = ['W','X','Y','Z']

If you know in advance what the headers in the Excel file are, it's probably better to rename them. This would rename W to A, and so on:

df = df.rename(columns={'W': 'A', 'X': 'B'})  # add the remaining old: new pairs as needed
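
To make the difference between the two approaches concrete, here is a small sketch on a hand-built DataFrame (no Excel file involved). The variable names renamed and relabelled are only for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]], columns=["W", "X", "Y", "Z"])

# rename() returns a new DataFrame and leaves df untouched;
# labels that are missing from the mapping are kept as they are.
renamed = df.rename(columns={"W": "A", "X": "B"})

# Assigning to .columns replaces every label by position,
# so it needs exactly one name per column.
relabelled = df.copy()
relabelled.columns = ["A", "B", "C", "D"]

print(list(renamed.columns))
print(list(relabelled.columns))
```

rename is the safer choice when the column order in the file might change, since it matches on the old label rather than on position.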

Answer by ram

This thread is 5 years old and outdated now, but it still shows up at the top of the list from a generic search, so I am adding this note. Pandas now (v0.22) has a keyword to specify column names when parsing Excel files. Use:

import pandas as pd
xl = pd.ExcelFile("Path + filename")
df = xl.parse("Sheet 1", header=None, names=['A', 'B', 'C'])

If header=None is not set, pandas treats the first row as the header and drops it from the data during parsing. If there is indeed a header but you don't want to use it, you have two choices: (1) use the names kwarg only; or (2) use names together with header=None and skiprows=1. I personally prefer the second option, since it makes clear that the input file is not in the format I want and that I am doing something to work around it.
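
The two choices can be compared side by side. This is only a sketch: it writes a small workbook with a header row to an in-memory buffer (assuming openpyxl is installed), and the labels old_a, old_b, option1 and option2 are invented for the example.

```python
import io
import pandas as pd

# In-memory workbook whose first row is a header we do not want to keep.
# Assumption: openpyxl is installed for writing and reading .xlsx data.
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"old_a": [1, 2], "old_b": [3, 4]}).to_excel(
        writer, sheet_name="Sheet1", index=False
    )
buffer.seek(0)
xl = pd.ExcelFile(buffer)

names = ["A", "B"]

# Choice 1: names only. The first row is still consumed as a header,
# and its labels are replaced by the given names.
option1 = xl.parse("Sheet1", names=names)

# Choice 2: treat no row as a header and skip the old header row explicitly.
option2 = xl.parse("Sheet1", header=None, names=names, skiprows=1)

print(option1)
print(option2)
```

Both calls should give the same two data rows under the new names; the second one just spells out that a row is being discarded.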

Answer by Loku

As Ram said, this post comes up at the top and may be useful to some. In pandas 0.24.2 (maybe earlier as well), read_excel itself can ignore the source headers and take your own column names, along with a few other useful controls:

DID = pd.read_excel(file1, sheet_name=0, header=None, usecols=[0, 1, 6], names=['A', 'ID', 'B'], dtype={2:str}, skiprows=10)

# for example....
# usecols => read only specific col indexes
# dtype => specifying the data types
# skiprows => skip number of rows from the top.