Python: skip comment lines marked with # in csv.DictReader
Original URL: http://stackoverflow.com/questions/14158868/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by Dan Stowell
Processing CSV files with csv.DictReader is great, but I have CSV files with comment lines in them (indicated by a hash at the start of a line), for example:
# step size=1.61853
val0,val1,val2,hybridisation,temp,smattr
0.206895,0.797923,0.202077,0.631199,0.368801,0.311052,0.688948,0.597237,0.402763
-169.32,1,1.61853,2.04069e-92,1,0.000906546,0.999093,0.241356,0.758644,0.202382
# adaptation finished
The csv module doesn't include any way to skip such lines.
I could easily do something hacky, but I imagine there's a nice way to wrap a csv.DictReader around some other iterator object that preprocesses the input to discard those lines.
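To see the problem concretely, here is a small sketch (the sample data is made up, and an in-memory io.StringIO object stands in for a real file). Without any preprocessing, csv.DictReader takes the first comment line as the header row and treats the trailing comment as an ordinary data row; surplus fields end up under the default restkey, None.

```python
import csv
import io

# Hypothetical sample: a comment line, a header, one data row, a trailing comment
SAMPLE = "# step size=1.61853\nval0,val1\n0.2,0.8\n# adaptation finished\n"

reader = csv.DictReader(io.StringIO(SAMPLE))
rows = list(reader)

# The comment line has been consumed as the (single-column) header
print(reader.fieldnames)
# Every following line, including the closing comment, comes back as data
for row in rows:
    print(row)
```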
Accepted answer by Dan Stowell
Actually this works nicely with filter:
import csv

with open('samples.csv', newline='') as fp:
    # Drop lines whose first character is '#' before DictReader sees them
    rdr = csv.DictReader(filter(lambda row: row[0] != '#', fp))
    for row in rdr:
        print(row)
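The filter approach works because csv.DictReader accepts any iterable that yields strings, not just a file object. A minimal sketch that can be run without samples.csv, assuming an io.StringIO object with made-up values stands in for the file (skip_comments is a hypothetical helper name):

```python
import csv
import io

def skip_comments(lines):
    """Return an iterator over the lines that do not start with '#'."""
    return filter(lambda line: not line.startswith('#'), lines)

# Hypothetical stand-in for samples.csv
data = io.StringIO(
    "# step size=1.61853\n"
    "val0,val1,val2\n"
    "0.2,0.8,0.5\n"
    "# adaptation finished\n"
)

rows = list(csv.DictReader(skip_comments(data)))
for row in rows:
    print(row)
```

Because filter is lazy, lines are still read one at a time, so this scales to large files just like iterating the file directly.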
Answer by sigvaldm
Good question, and a good example of how Python's CSV library lacks important functionality, such as handling basic comments (not uncommon at the top of CSV files). While Dan Stowell's solution works for the specific case of the OP, it is limited in that # must be the first character on the line. A more generic solution would be:
import csv

def decomment(csvfile):
    for row in csvfile:
        # Keep only the part of the line before the first '#'
        raw = row.split('#')[0].strip()
        if raw:
            yield raw

with open('dummy.csv', newline='') as csvfile:
    reader = csv.reader(decomment(csvfile))
    for row in reader:
        print(row)
As an example, the following dummy.csv file:
# comment
# comment
a,b,c # comment
1,2,3
10,20,30
# comment
prints
['a', 'b', 'c']
['1', '2', '3']
['10', '20', '30']
Of course, this works just as well with csv.DictReader().
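As a sketch of the csv.DictReader case, the block below repeats the decomment generator so it runs on its own and feeds it an in-memory copy of the dummy.csv example via io.StringIO. The first non-comment line becomes the header. The expected output in the comments assumes Python 3.8 or later, where DictReader yields plain dicts rather than OrderedDicts.

```python
import csv
import io

def decomment(csvfile):
    # Same approach as above: drop everything after '#', skip lines left empty
    for row in csvfile:
        raw = row.split('#')[0].strip()
        if raw:
            yield raw

# In-memory copy of the dummy.csv example
dummy = io.StringIO(
    "# comment\n"
    "# comment\n"
    "a,b,c # comment\n"
    "1,2,3\n"
    "10,20,30\n"
    "# comment\n"
)

rows = list(csv.DictReader(decomment(dummy)))
for row in rows:
    print(row)
# {'a': '1', 'b': '2', 'c': '3'}
# {'a': '10', 'b': '20', 'c': '30'}
```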
Answer by Granny Aching
Another way to read a CSV file is to use pandas.
Here's some sample code:
import pandas as pd

df = pd.read_csv('test.csv',
                 sep=',',              # field separator
                 comment='#',          # ignore the rest of a line after '#'
                 index_col=0,          # number or label of index column
                 skipinitialspace=True,
                 skip_blank_lines=True,
                 on_bad_lines='warn',  # pandas >= 1.3; replaces error_bad_lines/warn_bad_lines
                 ).sort_index()
print(df)
df = df.fillna('no value')  # replace NaN with 'no value'
print(df)
For this csv file:
a,b,c,d,e
1,,16,,55#,,65##77
8,77,77,,16#86,18#
#This is a comment
13,19,25,28,82
we will get this output:
       b   c     d   e
a
1    NaN  16   NaN  55
8   77.0  77   NaN  16
13  19.0  25  28.0  82
           b   c         d   e
a
1   no value  16  no value  55
8         77  77  no value  16
13        19  25        28  82
Answer by Thibault Reuille
Just posting a bug fix for @sigvaldm's solution.
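import csv

def decomment(csvfile):
    for row in csvfile:
        raw = row.split('#')[0].strip()
        # Skip only lines that are empty once the comment is removed;
        # any other line is passed through untouched
        if raw:
            yield row

with open('dummy.csv', newline='') as csvfile:
    reader = csv.reader(decomment(csvfile))
    for row in reader:
        print(row)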
A CSV line can legitimately contain "#" characters inside quoted strings. The previous solution cut every line off at the first '#', so it truncated any quoted field containing one.
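One trade-off of this fix: because it yields the original line, a trailing comment such as the one in "a,b,c # comment" from the dummy.csv example is no longer removed, and csv.reader returns ['a', 'b', 'c # comment'] for it. If you need both behaviours, one option (not from either answer) is a small quote-aware scanner that treats '#' as a comment only when it appears outside double quotes. This is a sketch that assumes the default '"' quote character and that no quoted field spans more than one physical line; strip_comment is a hypothetical helper name.

```python
import csv
import io

def strip_comment(line, quotechar='"'):
    """Return line with any '#' comment removed, ignoring '#' inside quotes."""
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == quotechar:
            # An escaped quote ("") toggles twice, so the state stays correct
            in_quotes = not in_quotes
        elif ch == '#' and not in_quotes:
            return line[:i]
    return line

def decomment(csvfile):
    # Assumes each CSV record sits on a single physical line
    for row in csvfile:
        raw = strip_comment(row).strip()
        if raw:
            yield raw

# Hypothetical sample with '#' both inside quotes and as a real comment
sample = io.StringIO(
    "# comment\n"
    "name,tag\n"
    'widget,"#blue" # trailing comment\n'
    '"say ""hi#""",plain\n'
)

rows = list(csv.reader(decomment(sample)))
for row in rows:
    print(row)
```

The scanner only tracks whether it is currently inside quotes, so it stays linear in the line length; records with embedded newlines would need a real CSV-aware tokenizer instead.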

