Original question: http://stackoverflow.com/questions/17335649/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
How to specify column names while reading an Excel file using Pandas?
Asked by Rakesh Adhikesavan
I read an Excel sheet into a pandas DataFrame this way:
import pandas as pd
xl = pd.ExcelFile("Path + filename")
df = xl.parse("Sheet1")
The value of the first cell in each column is used as the column name for the DataFrame. I want to specify my own column names. How do I do this?
Accepted answer by falsetru
Call .parse with the header=None keyword argument.
df = xl.parse("Sheet1", header=None)
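To make the effect of header=None concrete, here is a minimal sketch. The workbook demo.xlsx, its contents, and the names A and B are made up for illustration, and writing the .xlsx file assumes an Excel engine such as openpyxl is installed. With header=None the original header row is kept as ordinary data and the columns get integer labels, so custom names still have to be assigned afterwards.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical demo workbook; writing .xlsx assumes openpyxl is installed.
    path = os.path.join(tmp, "demo.xlsx")
    pd.DataFrame({"W": [1, 2], "X": [3, 4]}).to_excel(
        path, sheet_name="Sheet1", index=False
    )

    with pd.ExcelFile(path) as xl:
        raw = xl.parse("Sheet1", header=None)

# The original header row ("W", "X") is now the first row of data,
# and the columns carry integer labels 0 and 1.
first_row = raw.iloc[0].tolist()

# Drop the old header row, then assign custom names.
df = raw.iloc[1:].reset_index(drop=True)
df.columns = ["A", "B"]
```

Because the header text was read as data, the remaining values may be left with a generic object dtype and might need converting.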
Answer by Rutger Kassies
I think setting them afterwards is the only way in this case. For example, if your DataFrame has four columns:
df.columns = ['W','X','Y','Z']
If you know in advance what the headers in the Excel file are, it's probably better to rename them. This would rename W to A, X to B, and so on:
df = df.rename(columns={'W': 'A', 'X': 'B'})  # add the remaining columns to the mapping
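A small sketch of the difference between the two approaches, using a made-up four-column DataFrame rather than a real Excel file. Note that rename returns a new DataFrame by default, so the result has to be assigned, and columns missing from the mapping keep their names.

```python
import pandas as pd

# Hypothetical data standing in for a sheet with headers W, X, Y, Z.
df = pd.DataFrame([[1, 2, 3, 4]], columns=["W", "X", "Y", "Z"])

# rename maps old names to new ones and leaves unmapped columns alone.
renamed = df.rename(columns={"W": "A", "X": "B"})

# Assigning to .columns replaces every name by position instead.
replaced = df.copy()
replaced.columns = ["A", "B", "C", "D"]
```

Renaming by mapping is tied to the existing header text, while assigning to .columns depends only on column order.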
Answer by ram
This thread is 5 years old and outdated now, but it still shows up at the top of the list in a generic search, so I am adding this note. Pandas now (v0.22) has a keyword to specify column names when parsing Excel files. Use:
import pandas as pd
xl = pd.ExcelFile("Path + filename")
df = xl.parse("Sheet 1", header=None, names=['A', 'B', 'C'])
If header=None is not set, pandas treats the first row as the header and removes it from the data during parsing. If there is indeed a header but you don't want to use it, you have two choices: (1) use the "names" kwarg only, or (2) use "names" with header=None and skiprows=1. I personally prefer the second option, since it makes clear that the input file is not in the format I want and that I am doing something to work around it.
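The two choices can be compared side by side in a minimal sketch. The workbook demo.xlsx and its contents are made up, and writing the file assumes an Excel engine such as openpyxl is installed. On a sheet with a single header row, both calls should discard that row and apply the custom names.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical demo workbook with one header row: W, X.
    path = os.path.join(tmp, "demo.xlsx")
    pd.DataFrame({"W": [1, 2], "X": [3, 4]}).to_excel(
        path, sheet_name="Sheet1", index=False
    )

    with pd.ExcelFile(path) as xl:
        # Option 1: the first row is consumed as the header and
        # its labels are replaced by names.
        names_only = xl.parse("Sheet1", names=["A", "B"])

        # Option 2: no row is treated as a header; the first row is
        # skipped explicitly and names labels the columns.
        explicit = xl.parse("Sheet1", header=None, skiprows=1, names=["A", "B"])
```

Option 2 is more verbose, but it states the skipped row in the call itself.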
Answer by Loku
As ram said, this post comes up at the top of search results and may be useful to some. In pandas 0.24.2 (possibly earlier as well), read_excel itself can ignore the source headers and apply your own column names, along with a few other useful controls:
DID = pd.read_excel(file1, sheet_name=0, header=None, usecols=[0, 1, 6], names=['A', 'ID', 'B'], dtype={2:str}, skiprows=10)
# For example:
# usecols  => read only the given column indexes
# dtype    => specify the data types of columns
# skiprows => number of rows to skip from the top
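A self-contained sketch of the same idea on a made-up workbook (demo.xlsx, its columns, and the names A and ID are all hypothetical, and writing the file assumes an Excel engine such as openpyxl is installed). Here dtype is keyed by the new column name rather than by position, which is an assumption made for clarity in this example.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical workbook: one header row, then two rows of data.
    path = os.path.join(tmp, "demo.xlsx")
    pd.DataFrame(
        {"c0": [1, 2], "c1": ["x", "y"], "c2": [101, 102], "c3": [0.5, 0.7]}
    ).to_excel(path, index=False)

    df = pd.read_excel(
        path,
        sheet_name=0,        # first sheet in the workbook
        header=None,         # do not treat any row as a header
        skiprows=1,          # skip the original header row
        usecols=[0, 2],      # keep only the first and third columns
        names=["A", "ID"],   # custom names for the kept columns
        dtype={"ID": str},   # read the ID column as text
    )
```

The length of names has to match the number of columns selected by usecols.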