Original URL: http://stackoverflow.com/questions/32737137/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Old pre-0.17 pandas.read_csv behavior of `header=True` for inferring header row?
Question by Roman
How did old pre-0.17 versions of pandas `read_csv()` interpret a boolean `header=True`/`False` when inferring the header row?
I have CSV data with header:
col1;col2;col3
1.0;10.0;100.0
2.0;20.0;200.0
3.0;30.0;300.0
If I read it with `header=True`, i.e. `df = pandas.read_csv('test.csv', sep=';', header=True)`, I get the following DataFrame:
   1.0  10.0  100.0
0    2    20    200
1    3    30    300
This means pandas used the second row ("row 1") for the column names (the inferred names are '1.0', '10.0' and '100.0').
Whereas reading with `header=False`, i.e. `df = pandas.read_csv('test.csv', sep=';', header=False)`, gives the following:
   col1  col2  col3
0     1    10   100
1     2    20   200
2     3    30   300
This means pandas used the first row ("row 0") as the header, in spite of the fact that I explicitly said there is no header.
This behaviour is not intuitive to me. Can somebody explain what is happening?
Answer by EdChum
You are telling pandas which line is your header line. Passing `False` evaluates to `0`, which is why it reads the first line as the header, as expected. Passing `True` evaluates to `1`, so it reads the second line. If you pass `None`, it assumes there is no header row and auto-generates ordinal column labels (0, 1, 2, ...).
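A minimal plain-Python illustration of why the booleans acted as row numbers. This is not pandas' parser code, only the language rule the old behaviour relied on: `bool` is a subclass of `int`, so `False` indexes like `0` and `True` like `1`. The `lines` list is a hypothetical stand-in for the rows of the file.

```python
# bool is a subclass of int: False == 0 and True == 1,
# so a boolean works anywhere an integer row index is accepted.
# `lines` is a hypothetical stand-in for the CSV file's rows.
lines = [
    "col1;col2;col3",   # row 0
    "1.0;10.0;100.0",   # row 1
    "2.0;20.0;200.0",   # row 2
    "3.0;30.0;300.0",   # row 3
]

assert isinstance(True, int) and False == 0 and True == 1

print(lines[False])  # row 0: what old header=False picked as column names
print(lines[True])   # row 1: what old header=True picked as column names
```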
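import io
import pandas as pd

# Sample data from the question
t="""col1;col2;col3
1.0;10.0;100.0
2.0;20.0;200.0
3.0;30.0;300.0"""
# Pre-0.17 behaviour: False acted as header=0, True as header=1.
# On pandas >= 0.17.0 the first two calls raise TypeError.
print('False:\n', pd.read_csv(io.StringIO(t), sep=';', header=False))
print('\nTrue:\n', pd.read_csv(io.StringIO(t), sep=';', header=True))
print('\nNone:\n', pd.read_csv(io.StringIO(t), sep=';', header=None))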
False:
    col1  col2  col3
0     1    10   100
1     2    20   200
2     3    30   300

True:
    1.0  10.0  100.0
0     2    20    200
1     3    30    300

None:
       0     1      2
0  col1  col2   col3
1   1.0  10.0  100.0
2   2.0  20.0  200.0
3   3.0  30.0  300.0
UPDATE
Since version 0.17.0, passing a boolean (`True` or `False`) to `header` raises a `TypeError`.
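On pandas 0.17.0 and later, the old results can be reproduced by passing the integers the booleans used to stand for. A sketch, assuming a current pandas installation and reusing the sample data from the question: `header=0` replaces the old `header=False`, `header=1` replaces the old `header=True`, and `header=None` still means there is no header row.

```python
import io

import pandas as pd

# Same sample data as in the question.
t = """col1;col2;col3
1.0;10.0;100.0
2.0;20.0;200.0
3.0;30.0;300.0"""

# Integer equivalents of the old boolean behaviour.
df_first = pd.read_csv(io.StringIO(t), sep=';', header=0)    # old header=False
df_second = pd.read_csv(io.StringIO(t), sep=';', header=1)   # old header=True
df_none = pd.read_csv(io.StringIO(t), sep=';', header=None)  # no header row

print(list(df_first.columns))
print(list(df_second.columns))
print(list(df_none.columns))

# On pandas >= 0.17.0 a boolean header is rejected outright.
try:
    pd.read_csv(io.StringIO(t), sep=';', header=True)
    bool_rejected = False
except TypeError:
    bool_rejected = True
print(bool_rejected)
```

With `header=1`, the row above the header ("col1;col2;col3") is discarded, which is why only two data rows remain. For a file that does have a header line, `header=0` (what the default `header='infer'` resolves to when `names` is not given) is usually what you want, and `header=None` for a file without one.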