Python pandas.read_csv: how to skip comment lines

Question

Asked by mathtick

I think I misunderstand the intent of read_csv. Suppose I have a file 'j' like this:

# notes
a,b,c
# more notes
1,2,3

How can I read this file with pandas.read_csv, skipping any lines commented with '#'? The help for 'comment' says that commenting out whole lines is not supported, but it indicates that an empty line should be returned. Instead, I get an error:

df = pandas.read_csv('j', comment='#')

CParserError: Error tokenizing data. C error: Expected 1 fields in line 2, saw 3

I'm currently on

In [15]: pandas.__version__
Out[15]: '0.12.0rc1'

On version '0.12.0-199-g4c8ad82':

In [43]: df = pandas.read_csv('j', comment='#', header=None)

CParserError: Error tokenizing data. C error: Expected 1 fields in line 2, saw 3

Answer 1

Accepted answer by hlin117

So I believe that in the latest releases of pandas (version 0.16.0), you can pass the comment='#' parameter to pd.read_csv, and this should skip commented-out lines.

These GitHub issues show that you can do this:

See the documentation on read_csv: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
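
As a minimal sketch of the call described above, the following parses the same contents as the file 'j' from the question. The in-memory string `data` is an assumption used only for illustration, and the behavior shown assumes a pandas release where fully commented lines are ignored (0.16.0 or later, per this answer).

```python
from io import StringIO

import pandas as pd

# Same contents as the file 'j' from the question (held in memory here).
data = "# notes\na,b,c\n# more notes\n1,2,3\n"

# comment='#' tells the parser to ignore everything after a '#';
# a line that starts with '#' is skipped entirely in recent releases.
df = pd.read_csv(StringIO(data), comment="#")
print(df)
```

With a real file, the same keyword applies to the path form, e.g. pd.read_csv('j', comment='#').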

Answer 2

Answer by Andy Hayden

One workaround is to specify skiprows to ignore the first few entries:

In [11]: s = '# notes\na,b,c\n# more notes\n1,2,3'

In [12]: pd.read_csv(StringIO(s), sep=',', comment='#', skiprows=1)
Out[12]: 
    a   b   c
0 NaN NaN NaN
1   1   2   3

Otherwise read_csv gets a little confused:

In [13]: pd.read_csv(StringIO(s), sep=',', comment='#')
Out[13]: 
        Unnamed: 0
a   b            c
NaN NaN        NaN
1   2            3

This seems to be the case in 0.12.0; I've filed a bug report.

As Viktor points out, you can use dropna to remove the NaN rows after the fact (there is a recent open issue to have commented lines ignored completely):

In [14]: pd.read_csv(StringIO(s2), comment='#', sep=',').dropna(how='all')
Out[14]: 
   a  b  c
1  1  2  3

Note: the default index will "give away" the fact there was missing data.
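
If the gap in the index matters, one option is to renumber the rows after dropping them. The sketch below is only an illustration: the DataFrame is built by hand, with an all-NaN first row standing in (as an assumption) for the row a comment line produced in 0.12.0.

```python
import pandas as pd

nan = float("nan")

# Assumption: row 0 mimics the all-NaN row left behind by a comment line.
df = pd.DataFrame({"a": [nan, 1.0], "b": [nan, 2.0], "c": [nan, 3.0]})

# Drop rows where every column is NaN; the surviving row keeps index 1.
cleaned = df.dropna(how="all")

# Discard the old index so numbering restarts at 0.
renumbered = cleaned.reset_index(drop=True)
print(renumbered)
```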

Answer 3

Answer by Finn Årup Nielsen

I am on pandas version 0.13.1 and this comments-in-CSV problem still bothers me.

Here is my present workaround:

def read_csv(filename, comment='#', sep=','):
    # Drop lines that start with the comment marker, then parse the rest.
    with open(filename) as f:
        lines = "".join(line for line in f if not line.startswith(comment))
    return pd.read_csv(StringIO(lines), sep=sep)

Otherwise, with pd.read_csv(filename, comment='#'), I get:

pandas.parser.CParserError: Error tokenizing data. C error: Expected 1 fields in line 16, saw 3.
