pandas: unable to remove headers when using pd.read_csv

Note: this page reproduces a popular StackOverflow question with translation, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/34971477/

Date: 2020-09-14 00:33:44  Source: igfitidea

Having trouble removing headers when using pd.read_csv

Tags: python, python-2.7, pandas

Asked by Tiberius

I have a .csv that contains column headers, shown below. I need to suppress the column labeling when I ingest the file as a data frame.

date,color,id,zip,weight,height,locale
11/25/2013,Blue,122468,1417464,3546600,254,7

When I issue the following command:

 df = pd.read_csv('c:/temp1/test_csv.csv', usecols=[4,5], names = ["zip","weight"], header = 0, nrows=10)

I get:

zip               weight
0   1417464       3546600

I have tried various manipulations of header=True and header=0. If I don't use header=0, then the columns will all print out on top of the rows like so:

    zip           weight
    height        locale
0   1417464       3546600

I have tried skiprows=0 and skiprows=1, but neither removes the headers; the option works only in the sense that it skips the specified line.

I could really use some additional insight or a solution. Thanks in advance for any assistance you can provide.

Tiberius

Answered by jrovegno

Using the example of @jezrael, if you want to skip the header and suppress the column labeling:

import pandas as pd
import io

temp = u"""date,color,id,zip,weight,height,locale
11/25/2013,Blue,122468,1417464,3546600,254,7"""
# after testing, replace io.StringIO(temp) with the filename
# header=None: treat no line as column names; skiprows=1: drop the header line
df = pd.read_csv(io.StringIO(temp), usecols=[4,5], header=None, skiprows=1)
print(df)
         4    5
0  3546600  254
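
A DataFrame always carries column labels, so header=None only swaps the names for integer positions. If the goal is to display or export the values with no label row at all, one possible approach (a sketch that is not part of the original answer; the variable names are illustrative) is to drop the labels at output time:

```python
import io
import pandas as pd

csv_text = u"""date,color,id,zip,weight,height,locale
11/25/2013,Blue,122468,1417464,3546600,254,7"""

# Skip the header line and let pandas assign integer column labels.
df = pd.read_csv(io.StringIO(csv_text), usecols=[4, 5], header=None, skiprows=1)

# Render the values only: no column labels and no index.
text_without_labels = df.to_string(header=False, index=False)
print(text_without_labels)

# Serialize the values as CSV text without a header row.
csv_without_header = df.to_csv(header=False, index=False)
print(csv_without_header)

# The raw values as a NumPy array, with no labels attached.
values = df.values
```

Which of the three fits depends on whether the values are meant for display, for a file, or for further computation.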

Answered by π?δα? ?κ??

I'm not sure I entirely understand why you want to remove the headers, but you could comment out the header line as follows. This is only safe if no other row contains the character 'd': pandas drops a line entirely when it begins with the comment character, and ignores the remainder of a line when the character appears later in it.

>>> df = pd.read_csv('test.csv', usecols=[3,4], header=None, comment='d')  # comments out lines beginning with 'date,color' . . .
>>> df
         3        4
0  1417464  3546600

It would be better to comment out the line in the csv file with the hash character (#) and then use the same approach (again, as long as # does not appear anywhere else in the file):

>>> df = pd.read_csv('test.csv', usecols=[3,4], header=None, comment='#')   # comments out lines with #
>>> df
         3        4
0  1417464  3546600
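
The caveat about the comment character can be seen in a small sketch. The data below is hypothetical (it is not the asker's file): the header line is marked with '#', and one data row also has a '#' inside a field.

```python
import io
import pandas as pd

# Hypothetical data: a commented-out header plus a '#' inside the last row.
csv_text = u"""#date,color,id
11/25/2013,Blue,122468
11/26/2013,Red#1,122469"""

df = pd.read_csv(io.StringIO(csv_text), header=None, comment='#')
print(df)

# The header line is skipped entirely because it starts with '#'.
# The last row is cut off at the '#', so its third field is missing (NaN).
truncated_field = df.iloc[1, 1]
missing_field = df.iloc[1, 2]
```

The same truncation would apply to comment='d', which is why a character that cannot occur in the data is the safer choice.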

Answered by jezrael

I think you are right.

So you can change the column names to a and b:

import pandas as pd
import io

temp = u"""date,color,id,zip,weight,height,locale
11/25/2013,Blue,122468,1417464,3546600,254,7"""
# after testing, replace io.StringIO(temp) with the filename
# header=0 consumes the first line as the header; names then replaces it
df = pd.read_csv(io.StringIO(temp), usecols=[4,5], names=["a","b"], header=0, nrows=10)
print(df)
         a    b
0  3546600  254

Now these columns have new names instead of weight and height. Without names, the original labels are kept:

df = pd.read_csv(io.StringIO(temp), usecols=[4,5], header = 0 , nrows=10)
print(df)
    weight  height
0  3546600     254

You can check the read_csv docs (bold by me):

header: int, list of ints, default 'infer'

Row number(s) to use as the column names, and the start of the data. Defaults to 0 if no names passed, otherwise None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns, e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
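
The interaction between header and names described in the docs can be checked with a short sketch. It uses a trimmed-down, hypothetical two-column version of the file so that names covers every column; the variable names are illustrative.

```python
import io
import pandas as pd

# Hypothetical two-column excerpt of the asker's data.
csv_text = u"""weight,height
3546600,254"""

# Neither header nor names: header is inferred as 0, so line 1 becomes the labels.
inferred = pd.read_csv(io.StringIO(csv_text))

# names only: header defaults to None, so line 1 is read as a data row.
names_only = pd.read_csv(io.StringIO(csv_text), names=["a", "b"])

# names plus header=0: line 1 is consumed as the header and replaced by names.
replaced = pd.read_csv(io.StringIO(csv_text), names=["a", "b"], header=0)

print(inferred)
print(names_only)
print(replaced)
```

The names_only case is the one that produces a stray row of old labels above the data, which matches the symptom in the question when header=0 is left out.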