pandas: difference between `header = None` and `header = 0`
Original source: http://stackoverflow.com/questions/51759122/
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
difference between `header = None` and `header = 0` in pandas
Asked by Sarvagya Gupta
I was writing code to read a CSV file using pandas and noticed some odd behavior in the package. My file has column names which I want to ignore, so I use `header = 0` or `'infer'` instead of `None`. But I see something weird.
When I use `None` and want to get a specific column, I just need to do `df[column_index]`, but when I use `0` or `'infer'`, I need to do `df.ix[:, column_index]` to get the column; otherwise, `df[column_index]` gives the following error:
Traceback (most recent call last):
  File "/home/sarvagya/anaconda3/envs/tf/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2525, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: column_index

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1, in
  File "/home/sarvagya/anaconda3/envs/tf/lib/python3.6/site-packages/pandas/core/frame.py", line 2139, in __getitem__
    return self._getitem_column(key)
  File "/home/sarvagya/anaconda3/envs/tf/lib/python3.6/site-packages/pandas/core/frame.py", line 2146, in _getitem_column
    return self._get_item_cache(key)
  File "/home/sarvagya/anaconda3/envs/tf/lib/python3.6/site-packages/pandas/core/generic.py", line 1842, in _get_item_cache
    values = self._data.get(item)
  File "/home/sarvagya/anaconda3/envs/tf/lib/python3.6/site-packages/pandas/core/internals.py", line 3843, in get
    loc = self.items.get_loc(item)
  File "/home/sarvagya/anaconda3/envs/tf/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2527, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: column_index
Can someone help with this? Why is this happening?
Answered by jezrael
It looks like you need two parameters, `header=None` and `skiprows=1`, if you want to ignore the original column names and get the default `RangeIndex` instead.
That is because, if you use only `header=None`, the original column names end up in the first row of the data.
And `header=0` reads the first row into the column names of the `DataFrame`.
Sample:
from io import StringIO  # pd.compat.StringIO is not available in newer pandas versions

import pandas as pd

temp = u"""a,b,c
1,2,3
4,5,6"""
# after testing, replace StringIO(temp) with 'filename.csv'
df = pd.read_csv(StringIO(temp), header=0)
print (df)
   a  b  c
0  1  2  3
1  4  5  6
Selecting by position:
print (df.iloc[:, 1])
0 2
1 5
Name: b, dtype: int64
Selecting by column name:
print (df['b'])
0 2
1 5
Name: b, dtype: int64
There is no column named `1`, so:
print (df[1])
KeyError: 1
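from io import StringIO  # repeated so this snippet runs on its own

# header=None alone: the original column names become the first data row
df = pd.read_csv(StringIO(temp), header=None)
print (df)
   0  1  2
0  a  b  c
1  1  2  3
2  4  5  6

# header=None with skiprows=1: the original column names are skipped
df = pd.read_csv(StringIO(temp), header=None, skiprows=1)
print (df)
   0  1  2
0  1  2  3
1  4  5  6

# the columns are now labeled 0, 1, 2, so df[1] works
print (df[1])
0    2
1    5
Name: 1, dtype: int64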
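The samples above can be condensed into one side-by-side check. This is only a minimal sketch: the variable names `csv_text`, `with_header` and `no_header` are made up for illustration, and it assumes a pandas version where `io.StringIO` is used in place of `pd.compat.StringIO`. It shows that `df[key]` looks the key up among the column labels, while `.iloc` selects by position and so works in both cases. Note that `.ix`, used in the question, was deprecated and later removed from pandas, so `.iloc` is the positional replacement.

```python
from io import StringIO

import pandas as pd

csv_text = "a,b,c\n1,2,3\n4,5,6"  # illustrative data, same shape as the sample

# header=0: the first line supplies the column labels, so they are strings.
with_header = pd.read_csv(StringIO(csv_text), header=0)

# header=None plus skiprows=1: the first line is skipped and pandas
# assigns default integer labels instead.
no_header = pd.read_csv(StringIO(csv_text), header=None, skiprows=1)

print(with_header.columns.tolist())
print(no_header.columns.tolist())

# df[key] is a lookup by label: 1 is a label only in no_header,
# which is why with_header[1] raises KeyError.
print(1 in with_header.columns)
print(1 in no_header.columns)

# .iloc selects by position and works for both frames.
second_a = with_header.iloc[:, 1]
second_b = no_header.iloc[:, 1]
print(second_a.tolist(), second_b.tolist())
```

If the goal is simply "give me the n-th column regardless of how the file was read", selecting with `.iloc[:, n]` avoids depending on what the labels happen to be.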
Answered by Muffler
The difference shows up when working with a file that has a header row, so let's say the data behind your DataFrame `df` has a header.
`header=None`: pandas treats the first row of the file (which holds the actual column names) as the first row of data, so your columns no longer have names and get default integer labels instead.
`header=0`: pandas uses the first row as the column names. If you also pass `names=[...]` while loading your file, pandas discards those original names and assigns the new ones: `read_csv(filepath, header=0, names=['....', '....', ...])`
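A small sketch of the `names` behavior described above may help. The replacement names `x`, `y`, `z` and the variable names are hypothetical and chosen only for illustration. It contrasts `header=0` combined with `names` (the file's header line is replaced) with `names` alone, where the default `header='infer'` behaves like `header=None` and the header line is kept as a data row.

```python
from io import StringIO

import pandas as pd

csv_text = "a,b,c\n1,2,3\n4,5,6"  # illustrative data with a header line
new_names = ["x", "y", "z"]        # hypothetical replacement names

# header=0 with names: the file's header line is consumed and replaced.
replaced = pd.read_csv(StringIO(csv_text), header=0, names=new_names)

# names without header: pandas behaves as if header=None, so the file's
# header line stays in the frame as the first data row.
kept = pd.read_csv(StringIO(csv_text), names=new_names)

print(replaced)
print(kept)
```

In the second case the leftover header row also makes every column hold strings, which is usually a sign that `header=0` or `skiprows=1` was intended.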
hope it helps!