
Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/32737137/

Time: 2020-08-19 12:09:31  Source: igfitidea

Old pre-0.17 pandas.read_csv behavior of `header=True` for inferring header row?

python csv pandas header

Asked by Roman

How did old pre-0.17 versions of pandas read_csv() interpret a boolean header=True/False when inferring the header row?

I have CSV data with a header:

col1;col2;col3
1.0;10.0;100.0
2.0;20.0;200.0
3.0;30.0;300.0

If I read it with header=True, i.e.

df = pandas.read_csv('test.csv', sep=';', header=True)

that gives the following data frame:

   1.0  10.0  100.0
0    2    20    200
1    3    30    300

This means that pandas used the second row ("row 1") for the column names (the inferred names are '1.0', '10.0' and '100.0').

Whereas reading it with header=False

df = pandas.read_csv('test.csv', sep=';', header=False)

gives the following:


   col1  col2  col3
0     1    10   100
1     2    20   200
2     3    30   300

This means that pandas used the first row ("row 0") as the header, in spite of the fact that I explicitly specified that there is no header.

This behaviour is not intuitive to me. Can somebody explain what is happening?


Answered by EdChum

You are telling pandas which line is your header line. Passing False evaluates to 0, which is why it reads the first line as the header, as expected. Passing True evaluates to 1, so it reads the second line as the header. If you pass None, pandas assumes there is no header row and auto-generates ordinal column names.

import io
import pandas as pd

# Run on pandas < 0.17; from 0.17.0 on, a boolean header raises a TypeError
t = """col1;col2;col3
1.0;10.0;100.0
2.0;20.0;200.0
3.0;30.0;300.0"""

# False is treated as row 0, True as row 1, None as "no header row"
print('False:\n', pd.read_csv(io.StringIO(t), sep=';', header=False))
print('\nTrue:\n', pd.read_csv(io.StringIO(t), sep=';', header=True))
print('\nNone:\n', pd.read_csv(io.StringIO(t), sep=';', header=None))

False:
    col1  col2  col3
0     1    10   100
1     2    20   200
2     3    30   300

True:
    1.0  10.0  100.0
0    2    20    200
1    3    30    300

None:
       0     1      2
0  col1  col2   col3
1   1.0  10.0  100.0
2   2.0  20.0  200.0
3   3.0  30.0  300.0
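
The coercion described in this answer can be seen without pandas at all: in Python, bool is a subclass of int, so False and True act as 0 and 1 when used as a row index. Below is a minimal sketch using only the standard library. The helper split_header is hypothetical and only imitates the behavior described here; it is not how pandas is implemented.

```python
import csv
import io

DATA = """col1;col2;col3
1.0;10.0;100.0
2.0;20.0;200.0
3.0;30.0;300.0"""


def split_header(text, header):
    """Hypothetical helper: use row number `header` as the column names.

    Returns a (column_names, data_rows) tuple.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=";"))
    if header is None:
        # No header row: generate ordinal names and keep every row as data
        return list(range(len(rows[0]))), rows
    # Rows before the header row are dropped; rows after it are data
    return rows[header], rows[header + 1:]


# bool is a subclass of int, so False indexes like 0 and True like 1
print(isinstance(True, int), int(False), int(True))
print(split_header(DATA, False)[0])
print(split_header(DATA, True)[0])
print(split_header(DATA, None)[0])
```

Seen this way, header=False never meant "no header" to the old parser; it was simply another spelling of row 0.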

UPDATE


Since version 0.17.0, this raises a TypeError.
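
On pandas 0.17.0 or later, the same three cases can be spelled out with explicit values: header=0, header=1 and header=None. The sketch below assumes a reasonably recent pandas install. The read helper and the variable names are illustrative only and do not come from the original answer.

```python
import io

import pandas as pd

DATA = """col1;col2;col3
1.0;10.0;100.0
2.0;20.0;200.0
3.0;30.0;300.0"""


def read(header):
    # Build a fresh buffer on every call, since read_csv consumes it
    return pd.read_csv(io.StringIO(DATA), sep=";", header=header)


# header=0: the first line holds the column names (what header=False used to do)
first_line = read(0)
# header=1: the second line holds the column names (what header=True used to do)
second_line = read(1)
# header=None: no header row, so the columns get ordinal names
no_header = read(None)

print(list(first_line.columns))
print(list(second_line.columns))
print(list(no_header.columns))

# A boolean header is rejected on pandas 0.17.0 or later
try:
    read(True)
except TypeError as exc:
    print("TypeError:", exc)
```

Passing an explicit integer or None makes the intent visible at the call site, which avoids the confusion described in the question.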