pandas: difference between `header = None` and `header = 0`

Question

Asked by Sarvagya Gupta

I was writing code to read a CSV file using pandas and noticed some odd behavior. My file has column names that I want to ignore, so I use header = 0 or 'infer' instead of None. But I see something weird.

When I use None and want to get a specific column, I just need to do df[column_index], but when I use 0 or 'infer', I need to do df.ix[:,column_index] to get the column; otherwise, for df[column_index] I get the following error:

Traceback (most recent call last):
  File "/home/sarvagya/anaconda3/envs/tf/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2525, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: column_index

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1, in
  File "/home/sarvagya/anaconda3/envs/tf/lib/python3.6/site-packages/pandas/core/frame.py", line 2139, in __getitem__
    return self._getitem_column(key)
  File "/home/sarvagya/anaconda3/envs/tf/lib/python3.6/site-packages/pandas/core/frame.py", line 2146, in _getitem_column
    return self._get_item_cache(key)
  File "/home/sarvagya/anaconda3/envs/tf/lib/python3.6/site-packages/pandas/core/generic.py", line 1842, in _get_item_cache
    values = self._data.get(item)
  File "/home/sarvagya/anaconda3/envs/tf/lib/python3.6/site-packages/pandas/core/internals.py", line 3843, in get
    loc = self.items.get_loc(item)
  File "/home/sarvagya/anaconda3/envs/tf/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2527, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: column_index

Can someone help with this? Why is this happening?

Answer 1

Answered by jezrael

It looks like you need two parameters, header=None and skiprows=1, if you want to ignore the original column names and get the default RangeIndex column labels instead.

This is because if you use only header=None, the original column names end up in the first row of data.

And header=0 reads the first row as the column names of the DataFrame.

Sample:

import pandas as pd
from io import StringIO

temp = u"""a,b,c
1,2,3
4,5,6"""
# pd.compat.StringIO is not available in recent pandas versions; io.StringIO is used instead
# after testing, replace StringIO(temp) with 'filename.csv'
df = pd.read_csv(StringIO(temp), header=0)
print(df)
   a  b  c
0  1  2  3
1  4  5  6

Selecting by position:

print(df.iloc[:, 1])
0    2
1    5
Name: b, dtype: int64

Selecting by column name:

print(df['b'])
0    2
1    5
Name: b, dtype: int64

There is no column named 1, so:

print(df[1])
KeyError: 1

# header=None alone: the original column names land in the first data row
df = pd.read_csv(StringIO(temp), header=None)
print(df)
   0  1  2
0  a  b  c
1  1  2  3
2  4  5  6

# header=None with skiprows=1: the original names are skipped entirely
df = pd.read_csv(StringIO(temp), header=None, skiprows=1)
print(df)
   0  1  2
0  1  2  3
1  4  5  6

print(df[1])
0    2
1    5
Name: 1, dtype: int64
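
The question uses df.ix[:, column_index] for positional access. Since .ix was deprecated in pandas 0.20 and removed in 1.0, a minimal sketch of the same lookup with .iloc may be useful. It reuses the sample data from above; the variable names (csv_text, with_header, no_header) are illustrative assumptions, not part of the original answer.

```python
from io import StringIO

import pandas as pd

csv_text = "a,b,c\n1,2,3\n4,5,6"

# header=0: the first row becomes the column labels 'a', 'b', 'c'.
with_header = pd.read_csv(StringIO(csv_text), header=0)

# header=None plus skiprows=1: the header row is skipped, columns are labelled 0, 1, 2.
no_header = pd.read_csv(StringIO(csv_text), header=None, skiprows=1)

# .iloc selects by position, so the same call works on both frames.
second_a = with_header.iloc[:, 1]
second_b = no_header.iloc[:, 1]

print(second_a.tolist())             # [2, 5]
print(second_b.tolist())             # [2, 5]
print(second_a.name, second_b.name)  # b 1
```

The values are identical in both cases; only the column label differs, which is why df[1] works on one frame and raises KeyError on the other.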

Answer 2

Answered by Muffler

The difference shows up when the file has a header row, so let's say the data behind your DataFrame df has a header.

header=None: pandas treats the first row of the file (the actual column names) as the first data row, so your columns no longer have names and get default integer labels instead.
header=0: pandas uses the first row as the header. If you also pass names=[...] while loading the file, that header row is discarded and the new names are assigned instead: read_csv(filepath, header=0, names=['....', '....', ...])
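
The two cases above can be sketched on a small in-memory CSV. This is only an illustration: the sample data and the replacement names "x", "y", "z" are assumptions chosen for the example.

```python
from io import StringIO

import pandas as pd

csv_text = "a,b,c\n1,2,3\n4,5,6"

# header=None: the row "a,b,c" is kept as data and columns are labelled 0, 1, 2.
raw = pd.read_csv(StringIO(csv_text), header=None)

# header=0 with names=[...]: the first row is consumed as the header,
# then replaced by the names passed in.
renamed = pd.read_csv(StringIO(csv_text), header=0, names=["x", "y", "z"])

print(raw.shape)              # (3, 3)
print(raw.iloc[0].tolist())   # ['a', 'b', 'c']
print(list(renamed.columns))  # ['x', 'y', 'z']
print(renamed.shape)          # (2, 3)
```

Note that with header=None the numeric columns are read as strings, because the header text is mixed into the data.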

Hope it helps!
