Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must likewise follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/14158868/

Date: 2020-08-18 10:38:03  Source: igfitidea

Python: skip comment lines marked with # in csv.DictReader

Tags: python, csv, comments

Asked by Dan Stowell

Processing CSV files with csv.DictReader is great, but I have CSV files with comment lines in them (indicated by a hash at the start of a line), for example:

# step size=1.61853
val0,val1,val2,hybridisation,temp,smattr
0.206895,0.797923,0.202077,0.631199,0.368801,0.311052,0.688948,0.597237,0.402763
-169.32,1,1.61853,2.04069e-92,1,0.000906546,0.999093,0.241356,0.758644,0.202382
# adaptation finished

The csv module doesn't include any way to skip such lines.

I could easily do something hacky, but I imagine there's a nice way to wrap a csv.DictReader around some other iterator object that preprocesses the lines and discards the comments.

Accepted answer by Dan Stowell

Actually, this works nicely with filter:

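import csv

# Open with newline='' as the csv module documentation recommends.
with open('samples.csv', newline='') as fp:
    # filter() passes through only the lines that do not start with '#'.
    rdr = csv.DictReader(filter(lambda line: not line.startswith('#'), fp))
    for row in rdr:
        print(row)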
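For comparison, here is a minimal sketch of the same idea written as a named generator instead of a lambda, which is easier to extend. The helper name skip_comment_lines is hypothetical, and an io.StringIO holding the question's sample data stands in for samples.csv. Unlike the filter above, it also skips comments indented with leading whitespace.

```python
import csv
import io

# Stand-in for samples.csv: the sample data from the question.
SAMPLE = """\
# step size=1.61853
val0,val1,val2,hybridisation,temp,smattr
0.206895,0.797923,0.202077,0.631199,0.368801,0.311052,0.688948,0.597237,0.402763
-169.32,1,1.61853,2.04069e-92,1,0.000906546,0.999093,0.241356,0.758644,0.202382
# adaptation finished
"""


def skip_comment_lines(lines):
    """Hypothetical helper: yield lines that are not blank and whose
    first non-whitespace character is not '#'."""
    for line in lines:
        stripped = line.lstrip()
        if stripped and not stripped.startswith('#'):
            yield line


reader = csv.DictReader(skip_comment_lines(io.StringIO(SAMPLE)))
rows = list(reader)
print(reader.fieldnames)  # header comes from the first non-comment line
print(len(rows))          # 2 data rows; both comment lines were dropped
```

Note that the question's sample rows carry more values than the header has names; csv.DictReader collects the surplus values in a list stored under its restkey, which defaults to None.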

Answer by sigvaldm

Good question, and a good example of how Python's CSV library lacks important functionality, such as handling basic comments (not uncommon at the top of CSV files). While Dan Stowell's solution works for the OP's specific case, it is limited in that # must be the very first character of the line. A more generic solution, which also handles indented and trailing comments, is:

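import csv

def decomment(csvfile):
    for row in csvfile:
        # Drop everything from the first '#' onward, then surrounding whitespace.
        raw = row.split('#')[0].strip()
        # Skip lines that are empty once the comment is removed.
        if raw:
            yield raw

with open('dummy.csv', newline='') as csvfile:
    reader = csv.reader(decomment(csvfile))
    for row in reader:
        print(row)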

As an example, the following dummy.csv file:

# comment
 # comment
a,b,c # comment
1,2,3
10,20,30
# comment

prints:

['a', 'b', 'c']
['1', '2', '3']
['10', '20', '30']

Of course, this works just as well with csv.DictReader().
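A minimal sketch of that combination, reusing the decomment() generator from this answer with csv.DictReader. Here an io.StringIO holding the dummy.csv contents stands in for the file, so the example runs without creating it.

```python
import csv
import io


def decomment(csvfile):
    # Same generator as above: strip '#' comments and skip empty results.
    for row in csvfile:
        raw = row.split('#')[0].strip()
        if raw:
            yield raw


# Contents of dummy.csv from the example above.
DUMMY = "# comment\n # comment\na,b,c # comment\n1,2,3\n10,20,30\n# comment\n"

reader = csv.DictReader(decomment(io.StringIO(DUMMY)))
records = list(reader)
for record in records:
    print(record)
# On Python 3.8+ this prints:
# {'a': '1', 'b': '2', 'c': '3'}
# {'a': '10', 'b': '20', 'c': '30'}
```

The header is taken from "a,b,c # comment" after its trailing comment is removed, so the field names come out clean.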

Answer by Granny Aching

Another way to read a CSV file is to use pandas.

Here's some sample code:

import pandas as pd

df = pd.read_csv('test.csv',
                 sep=',',             # field separator
                 comment='#',         # ignore the rest of a line after '#'
                 index_col=0,         # number or label of index column
                 skipinitialspace=True,
                 skip_blank_lines=True,
                 on_bad_lines='warn'  # pandas >= 1.3; replaces error_bad_lines/warn_bad_lines
                 ).sort_index()
print(df)
df.fillna('no value', inplace=True)  # replace NaN with 'no value'
print(df)

For this CSV file:

a,b,c,d,e
1,,16,,55#,,65##77
8,77,77,,16#86,18#
#This is a comment
13,19,25,28,82

we will get this output:

       b   c     d   e
a                     
1    NaN  16   NaN  55
8   77.0  77   NaN  16
13  19.0  25  28.0  82
           b   c         d   e
a                             
1   no value  16  no value  55
8         77  77  no value  16
13        19  25        28  82

Answer by Thibault Reuille

Just posting a bugfix for @sigvaldm's solution.

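import csv

def decomment(csvfile):
    for row in csvfile:
        # Use the text before any '#' only to detect blank or comment-only lines.
        raw = row.split('#')[0].strip()
        if raw:
            # Yield the original line untouched so quoted '#' characters survive.
            yield row

with open('dummy.csv', newline='') as csvfile:
    reader = csv.reader(decomment(csvfile))
    for row in reader:
        print(row)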

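A CSV line may legitimately contain "#" characters inside quoted fields, but the previous solution truncated such fields at the first '#'. This version uses the split only to decide whether a line is blank or entirely a comment, and yields every other line unchanged. The trade-off is that trailing comments such as "a,b,c # comment" are no longer removed; they end up inside the last field.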
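If you want both behaviours, stripping trailing comments while leaving # inside quoted fields alone, one option is to scan each line and track whether you are inside quotes. This is only a sketch: strip_comment and decomment_quoted are hypothetical helper names, it assumes the default double-quote quotechar, and it assumes no quoted field spans more than one line (the quote state restarts on every line).

```python
import csv
import io


def strip_comment(line, quotechar='"'):
    """Hypothetical helper: cut line at the first '#' outside quotes.

    A doubled quote ("") flips the state twice, so escaped quotes
    inside a quoted field are handled correctly.
    """
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == quotechar:
            in_quotes = not in_quotes
        elif ch == '#' and not in_quotes:
            return line[:i]
    return line


def decomment_quoted(lines, quotechar='"'):
    """Hypothetical helper: yield non-empty lines with unquoted comments removed."""
    for line in lines:
        raw = strip_comment(line, quotechar).strip()
        if raw:
            yield raw


data = io.StringIO(
    '# full-line comment\n'
    'name,tag\n'
    '"issue #42",bug # trailing comment\n'
    '"say ""#hi""",ok\n'
)
rows = list(csv.reader(decomment_quoted(data)))
for row in rows:
    print(row)
# ['name', 'tag']
# ['issue #42', 'bug']
# ['say "#hi"', 'ok']
```

Because the same generator feeds csv.reader or csv.DictReader, it can be swapped in for decomment() in either answer above.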
