Having trouble removing headers when using pd.read_csv
Original question: http://stackoverflow.com/questions/34971477/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by Tiberius
I have a .csv file that contains column headers; its contents are shown below. I need to suppress the column labels when I read the file into a DataFrame.
date,color,id,zip,weight,height,locale
11/25/2013,Blue,122468,1417464,3546600,254,7
When I issue the following command:
df = pd.read_csv('c:/temp1/test_csv.csv', usecols=[4,5], names = ["zip","weight"], header = 0, nrows=10)
I get:
zip weight
0 1417464 3546600
I have tried various combinations of header=True and header=0. If I don't use header=0, the column names are all printed on top of the rows, like so:
zip weight
height locale
0 1417464 3546600
I have tried skiprows=0 and skiprows=1, but neither removes the headers; the command does skip the specified line, though.
I could really use some additional insight or a solution. Thanks in advance for any assistance you could provide.
Tiberius
Answer by jrovegno
Using @jezrael's example, if you want to skip the header and suppress the column labels:
import pandas as pd
import numpy as np
import io
temp=u"""date,color,id,zip,weight,height,locale
11/25/2013,Blue,122468,1417464,3546600,254,7"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), usecols=[4,5], header=None, skiprows=1)
print(df)
4 5
0 3546600 254
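A self-contained Python 3 version of this approach is sketched below; it assumes a reasonably recent pandas (DataFrame.to_numpy needs 0.24 or later). header=None stops pandas from using any line as column names, so the selected columns are labelled by their integer positions, and skiprows=1 drops the original header line so it is not read as data. If the goal is only to display or export the values without any labels, to_string(header=False, index=False) and to_numpy() are two possible options; treat this as a sketch, not as the answerer's own code.

```python
import io

import pandas as pd

temp = """date,color,id,zip,weight,height,locale
11/25/2013,Blue,122468,1417464,3546600,254,7"""

# header=None: no line is used for column names, so columns get integer labels.
# skiprows=1: skip the original header line so it is not parsed as a data row.
df = pd.read_csv(io.StringIO(temp), usecols=[4, 5], header=None, skiprows=1)
print(df.columns.tolist())  # positional labels of the selected columns

# Two ways to get at the values without any column labels:
text = df.to_string(header=False, index=False)  # plain text, no header or index
values = df.to_numpy()  # 2-D NumPy array, no labels at all
print(text)
print(values)
```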
Answer by πόδας ωκύς
I'm not sure I entirely understand why you want to remove the headers, but you could comment out the header line as follows, as long as the character 'd' does not appear anywhere else in the file (not just at the start of other rows):
>>> df = pd.read_csv('test.csv', usecols=[3,4], header=None, comment='d')  # the 'date,color,...' header line starts with 'd', so it is skipped
>>> df
3 4
0 1417464 3546600
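One caution with comment='d': pandas treats the comment character as the start of a comment wherever it appears in a line, not only at the beginning, so any 'd' inside a data field truncates that row. The sketch below uses a hypothetical second row whose color is Red to show the effect: the row is cut off after "Re" and the remaining columns come back as NaN.

```python
import io

import pandas as pd

# Hypothetical sample: the second data row has the color "Red", which contains a 'd'.
temp = """date,color,id,zip,weight,height,locale
11/25/2013,Blue,122468,1417464,3546600,254,7
11/26/2013,Red,122469,1417465,3546601,255,8"""

# The header line starts with 'd', so it is dropped entirely, as intended.
# In the "Red" row, everything after the 'd' is also treated as a comment,
# so that row keeps only "11/26/2013" and "Re"; the missing fields become NaN.
df = pd.read_csv(io.StringIO(temp), header=None, comment="d")
print(df)
```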
It would be better to comment out the header line in the CSV file with a hash character (#) and then use the same approach (again, as long as no other line starts with # and no data field contains a #):
>>> df = pd.read_csv('test.csv', usecols=[3,4], header=None, comment='#') # comments out lines with #
>>> df
3 4
0 1417464 3546600
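For a quick self-contained check of the # variant, the sketch below assumes the header line has already been prefixed with # and reads the data from an in-memory string instead of test.csv.

```python
import io

import pandas as pd

# Assumes the header line in the file has already been prefixed with '#'.
temp = """#date,color,id,zip,weight,height,locale
11/25/2013,Blue,122468,1417464,3546600,254,7"""

# Lines starting with '#' are ignored, so only the data row is parsed;
# header=None then labels the selected columns by position.
df = pd.read_csv(io.StringIO(temp), usecols=[3, 4], header=None, comment="#")
print(df)
```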
Answer by jezrael
I think you are right.
So you can change the column names to a and b:
import pandas as pd
import numpy as np
import io
temp=u"""date,color,id,zip,weight,height,locale
11/25/2013,Blue,122468,1417464,3546600,254,7"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), usecols=[4,5], names = ["a","b"], header = 0 , nrows=10)
print(df)
a b
0 3546600 254
Now these columns have new names instead of weight and height.
df = pd.read_csv(io.StringIO(temp), usecols=[4,5], header = 0 , nrows=10)
print(df)
weight height
0 3546600 254
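The difference between passing names with and without header=0 is likely what produced the extra header-like row described in the question. The sketch below reads the same two-line sample both ways; the labels a to g are hypothetical, one per column in the file. With header=0 the original header line is consumed and its labels are replaced; without it, header defaults to None because names were given, so the date,color,... line is parsed as an ordinary data row and the affected columns become strings.

```python
import io

import pandas as pd

temp = """date,color,id,zip,weight,height,locale
11/25/2013,Blue,122468,1417464,3546600,254,7"""

names = list("abcdefg")  # hypothetical labels, one per column in the file

# names + header=0: line 0 is treated as the header and its labels are replaced.
replaced = pd.read_csv(io.StringIO(temp), names=names, header=0)

# names without header: header defaults to None, so line 0 is read as data.
kept = pd.read_csv(io.StringIO(temp), names=names)

print(replaced[["e", "f"]])
print(kept[["e", "f"]])
```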
You can check the read_csv documentation:
header: int, list of ints, default 'infer'
Row number(s) to use as the column names, and the start of the data. Defaults to 0 if no names passed, otherwise None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns, e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
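The last sentence of that quote can be checked directly. In the sketch below, a hypothetical comment line and a blank line precede the real header; with comment='#' and the default skip_blank_lines=True, header=0 still picks the date,color,... line because those two lines are not counted.

```python
import io

import pandas as pd

# Hypothetical file: a comment line and a blank line come before the real header.
temp = """# exported by a hypothetical tool

date,color,id,zip,weight,height,locale
11/25/2013,Blue,122468,1417464,3546600,254,7"""

# The comment line and the blank line are not counted, so header=0 refers to
# the date,color,... line rather than to the first physical line of the text.
df = pd.read_csv(io.StringIO(temp), header=0, comment="#")
print(df.columns.tolist())
```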