Original source: http://stackoverflow.com/questions/40769691/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Prevent pandas read_csv treating first row as header of column names
Asked by R.M.
I'm reading in a pandas DataFrame using pd.read_csv. I want to keep the first row as data, but it keeps getting converted to column names.
- I tried header=False, but this just deleted the first row entirely.
(Note on my input data: I have a string (st = '\n'.join(lst)) that I convert to a file-like object (io.StringIO(st)), then build the csv from that file object.)
Answered by EdChum
You want header=None. The False gets type-promoted to int and becomes 0, so the first row is still used as the header. See the docs (emphasis mine):
header : int or list of ints, default 'infer'
Row number(s) to use as the column names, and the start of the data. Default behavior is as if set to 0 if no names passed, otherwise None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
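The interaction between header and names described in the quoted documentation can be sketched as follows. This is a minimal illustration, not part of the original answer; the sample text and the column names x, y, z are made up for the example.

```python
import io
import pandas as pd

csv_text = "a,b,c\n0,1,2\n3,4,5"   # sample data, assumed for illustration
new_names = ["x", "y", "z"]        # hypothetical replacement column names

# names passed, header omitted: behaves as header=None,
# so the "a,b,c" line is kept as the first data row.
kept = pd.read_csv(io.StringIO(csv_text), names=new_names)

# names passed together with header=0: the first line is consumed
# as the header and its labels are replaced by new_names.
replaced = pd.read_csv(io.StringIO(csv_text), names=new_names, header=0)

print(kept.shape)      # (3, 3)
print(replaced.shape)  # (2, 3)
```

In both calls the resulting columns are x, y, z; the only difference is whether the original first line survives as data.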
You can see the difference in behaviour, first with header=0:
In [95]:
import io
import pandas as pd
t = """a,b,c
0,1,2
3,4,5"""
pd.read_csv(io.StringIO(t), header=0)
Out[95]:
   a  b  c
0  0  1  2
1  3  4  5
Now with header=None:
In [96]:
pd.read_csv(io.StringIO(t), header=None)
Out[96]:
   0  1  2
0  a  b  c
1  0  1  2
2  3  4  5
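Applied to the setup described in the question (a list of lines joined into a string and wrapped in io.StringIO), the same fix might look like the sketch below. The contents of lst are an assumption, since the question does not show them.

```python
import io
import pandas as pd

# Hypothetical stand-in for the question's list of CSV lines.
lst = ["a,b,c", "0,1,2", "3,4,5"]
st = "\n".join(lst)

# header=None keeps every line as data and numbers the columns 0, 1, 2.
df = pd.read_csv(io.StringIO(st), header=None)

first_row = df.iloc[0].tolist()
print(first_row)
print(df.columns.tolist())
```

Because the first row now holds text, the values below it in the same columns are also read as strings (for example "0" rather than 0), which is worth keeping in mind if numeric columns are needed later.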
Note that as of version 0.19.1 (the latest at the time of writing), passing header=False raises a TypeError:
In [98]:
pd.read_csv(io.StringIO(t), header=False)
TypeError: Passing a bool to header is invalid. Use header=None for no header or header=int or list-like of ints to specify the row(s) making up the column names
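A small check of this behaviour, assuming pandas 0.19.1 or later is installed, could be written as below. The variable names raised and message are introduced only for this sketch.

```python
import io
import pandas as pd

t = "a,b,c\n0,1,2\n3,4,5"

# On pandas versions that validate the header argument, a bool is rejected.
try:
    pd.read_csv(io.StringIO(t), header=False)
    raised = False
    message = ""
except TypeError as exc:
    raised = True
    message = str(exc)

print(raised)
print(message)

# The supported way to say "there is no header row" is header=None.
df = pd.read_csv(io.StringIO(t), header=None)
print(df.shape)
```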