Python: Skipping comment lines marked with # in csv.DictReader

Question

Asked by Dan Stowell

Processing CSV files with csv.DictReader is great, but I have CSV files with comment lines in them (indicated by a hash at the start of a line), for example:

# step size=1.61853
val0,val1,val2,hybridisation,temp,smattr
0.206895,0.797923,0.202077,0.631199,0.368801,0.311052,0.688948,0.597237,0.402763
-169.32,1,1.61853,2.04069e-92,1,0.000906546,0.999093,0.241356,0.758644,0.202382
# adaptation finished

The csv module doesn't include any way to skip such lines.

I could easily do something hacky, but I imagine there's a nice way to wrap a csv.DictReader around some other iterator object that preprocesses the input to discard those lines.
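
The wrapping idea works because csv.reader and csv.DictReader accept any iterable that yields strings, not only file objects. The sketch below uses a plain list of made-up lines in place of the file: it first shows what happens without preprocessing (the comment becomes the header row), then puts a generator in front of the parser. The skip_comments name is a hypothetical helper, not part of the csv module.

```python
import csv

# Made-up lines standing in for the question's file
lines = ["# step size=1.61853", "val0,val1", "0.2,0.8", "# adaptation finished"]

# Without preprocessing, the comment line is taken as the header row
unfiltered = list(csv.DictReader(lines))
print(unfiltered[0])

def skip_comments(lines):
    # Hypothetical helper: pass through only lines that do not start with '#'
    for line in lines:
        if not line.startswith("#"):
            yield line

# The generator sits between the raw lines and the parser
filtered = list(csv.DictReader(skip_comments(lines)))
print(filtered)
```

Any iterator of that shape can be handed to the reader, whether it is a generator function, a generator expression, or a call to filter().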

我可以轻松地做一些 hacky 的事情，但我想有一种很好的方法可以将 csv.DicReader 包装在其他一些迭代器对象周围，该对象进行预处理以丢弃行。

Answer 1

Accepted answer by Dan Stowell

Actually this works nicely with filter:

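import csv

# Skip lines whose first character is '#'; the with block closes the file automatically
with open('samples.csv', newline='') as fp:
    rdr = csv.DictReader(filter(lambda row: not row.startswith('#'), fp))
    for row in rdr:
        print(row)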
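
One limitation of this filter is that a comment indented with whitespace (for example "   # note") is still passed to the parser. A small variation, sketched here on made-up in-memory data via io.StringIO rather than samples.csv, strips leading whitespace before testing for '#'. A '#' later in a line, including one inside a quoted field, is left alone. The is_data_line name is a hypothetical helper.

```python
import csv
import io

# Made-up sample data standing in for samples.csv
data = (
    "# step size=1.61853\n"
    "val0,val1,note\n"
    "   # indented comment\n"
    '0.2,0.8,"tag #1"\n'
    "# adaptation finished\n"
)

def is_data_line(line):
    # Hypothetical helper: a line is a comment if its first non-blank character is '#'
    return not line.lstrip().startswith("#")

fp = io.StringIO(data)
rows = list(csv.DictReader(filter(is_data_line, fp)))
print(rows)
```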

Answer 2

Answered by sigvaldm

Good question, and a good example of how Python's CSV library lacks important functionality, such as handling basic comments (not uncommon at the top of CSV files). While Dan Stowell's solution works for the specific case of the OP, it is limited in that # must appear as the first character of the line. A more generic solution would be:

import csv

def decomment(csvfile):
    # Drop everything after the first '#' and skip lines that end up empty
    for row in csvfile:
        raw = row.split('#')[0].strip()
        if raw:
            yield raw

with open('dummy.csv', newline='') as csvfile:
    reader = csv.reader(decomment(csvfile))
    for row in reader:
        print(row)

As an example, the following dummy.csv file:

# comment
 # comment
a,b,c # comment
1,2,3
10,20,30
# comment

returns

['a', 'b', 'c']
['1', '2', '3']
['10', '20', '30']

Of course, this works just as well with csv.DictReader().
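
To illustrate the DictReader case, here is a sketch that feeds the dummy.csv contents from the example above (held in an io.StringIO instead of a file) through the same decomment generator and into csv.DictReader. The first non-comment line, stripped of its inline comment, becomes the header.

```python
import csv
import io

def decomment(csvfile):
    # Same generator as above: drop text after '#', skip lines left empty
    for row in csvfile:
        raw = row.split('#')[0].strip()
        if raw:
            yield raw

# Contents of dummy.csv from the example above
dummy = "# comment\n # comment\na,b,c # comment\n1,2,3\n10,20,30\n# comment\n"

records = list(csv.DictReader(decomment(io.StringIO(dummy))))
for record in records:
    print(record)
```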

当然，这也适用于csv.DictReader().

Answer 3

Answered by Granny Aching

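Another way to read a CSV file is with pandas, whose read_csv function has a comment parameter that tells it to ignore the rest of a line after a given character.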

Here's some sample code:

import pandas as pd

df = pd.read_csv('test.csv',
                 sep=',',              # field separator
                 comment='#',          # ignore the rest of a line after '#'
                 index_col=0,          # number or label of index column
                 skipinitialspace=True,
                 skip_blank_lines=True,
                 on_bad_lines='warn',  # pandas >= 1.3; replaces error_bad_lines/warn_bad_lines
                 ).sort_index()
print(df)
df.fillna('no value', inplace=True) # replace NaN with 'no value'
print(df)

For this csv file:

a,b,c,d,e
1,,16,,55#,,65##77
8,77,77,,16#86,18#
#This is a comment
13,19,25,28,82

we will get this output:

       b   c     d   e
a                     
1    NaN  16   NaN  55
8   77.0  77   NaN  16
13  19.0  25  28.0  82
           b   c         d   e
a                             
1   no value  16  no value  55
8         77  77  no value  16
13        19  25        28  82

Answer 4

Answered by Thibault Reuille

Just posting a bugfix for @sigvaldm's solution.

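import csv

def decomment(csvfile):
    # Use the '#'-split only to test whether the line is blank or comment-only;
    # yield the original line so '#' inside quoted fields survives
    for row in csvfile:
        raw = row.split('#')[0].strip()
        if raw:
            yield row

with open('dummy.csv', newline='') as csvfile:
    reader = csv.reader(decomment(csvfile))
    for row in reader:
        print(row)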

A CSV line can legitimately contain "#" characters inside quoted strings. The previous solution cut such lines off at the first "#"; this version only uses the split to decide whether to skip a line, and yields the original line unchanged.
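
Note that yielding the unmodified line means inline comments such as "a,b,c # comment" are no longer removed; they end up inside the last field ('c # comment'). If both behaviours are wanted, one option is to cut each line at the first "#" that lies outside a quoted field. The sketch below does that with a hypothetical strip_unquoted_comment helper. It assumes '"' is the quote character and that quoted fields never span multiple lines, since the quote state is reset for every line.

```python
import csv
import io

def strip_unquoted_comment(line, quotechar='"'):
    # Hypothetical helper: return the part of the line before the first '#'
    # that is not inside a quoted field. A doubled quote ("") toggles the
    # state twice, so escaped quotes inside a field are handled correctly.
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == quotechar:
            in_quotes = not in_quotes
        elif ch == '#' and not in_quotes:
            return line[:i]
    return line

def decomment(csvfile):
    # Skip blank or comment-only lines; strip inline comments; keep quoted '#'
    for row in csvfile:
        raw = strip_unquoted_comment(row).strip()
        if raw:
            yield raw

sample = '# comment\n # comment\na,b,c # comment\n"x#y",2,3\n1,2,3\n'
rows = list(csv.reader(decomment(io.StringIO(sample))))
print(rows)
```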

CSV 行可以在带引号的字符串中包含“#”字符并且完全有效。以前的解决方案是切断包含“#”字符的字符串。