Python numpy loadtxt: skip first row

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/17151210/

Date: 2020-08-19 00:37:16  Source: igfitidea

numpy loadtxt skip first row

python bash csv numpy import-from-csv

Asked by astromax

I have a small issue when I'm trying to import data from CSV files with numpy's loadtxt function. Here's a sample of the type of data files I have.

Call it 'datafile1.csv':

# Comment 1
# Comment 2
x,y,z 
1,2,3
4,5,6
7,8,9
...
...
# End of File Comment

The script that I thought would work for this situation looks like:

import numpy as np
FH = np.loadtxt('datafile1.csv',comments='#',delimiter=',',skiprows=1)

But, I'm getting an error:

ValueError: could not convert string to float: x

This tells me that the kwarg 'skiprows' is not skipping the header; it's skipping the first row of comments. I could simply set skiprows=3, but the complication is that I have a very large number of files, which don't necessarily all have the same number of comment lines at the top. How can I make sure that loadtxt reads only the actual data in a situation like this?
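The behaviour described above can be reproduced without touching the disk. The following is a minimal sketch, assuming an in-memory copy of the sample file (with the "..." rows left out) and a hypothetical helper named try_load; it shows that skiprows counts raw lines, comment lines included.

```python
import io
import numpy as np

# In-memory copy of the sample file from the question (the "..." rows are omitted).
SAMPLE = """# Comment 1
# Comment 2
x,y,z
1,2,3
4,5,6
7,8,9
# End of File Comment
"""

def try_load(skiprows):
    """Hypothetical helper: return the loaded array, or the ValueError raised."""
    try:
        return np.loadtxt(io.StringIO(SAMPLE), comments='#',
                          delimiter=',', skiprows=skiprows)
    except ValueError as exc:
        return exc

# skiprows=1 drops only "# Comment 1"; the header "x,y,z" is still parsed.
failed = try_load(1)
# skiprows=3 drops both comment lines and the header.
loaded = try_load(3)

print(type(failed).__name__)
print(loaded.shape)
```

Hard-coding skiprows=3 only works while every file has exactly two comment lines before the header, which is the limitation the question raises.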

P.S. - I'm open to bash solutions, too.

Accepted answer by falsetru

Skip the comment lines manually using a generator expression, so that skiprows=1 then removes only the header:

import numpy as np

with open('datafile1.csv') as f:
    lines = (line for line in f if not line.startswith('#'))
    FH = np.loadtxt(lines, delimiter=',', skiprows=1)
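To check that this approach does not depend on how many comment lines precede the header, here is a small sketch that applies the same generator expression to two in-memory samples. The function name load_without_comments and the sample strings are assumptions made for illustration; io.StringIO stands in for the opened file.

```python
import io
import numpy as np

def load_without_comments(text):
    """Hypothetical wrapper around the generator-expression approach."""
    f = io.StringIO(text)  # stands in for open('datafile1.csv')
    # Drop every line that starts with '#', wherever it appears in the file.
    lines = (line for line in f if not line.startswith('#'))
    # After filtering, the header is the first remaining line.
    return np.loadtxt(lines, delimiter=',', skiprows=1)

two_comments = "# Comment 1\n# Comment 2\nx,y,z\n1,2,3\n4,5,6\n# End\n"
four_comments = "# a\n# b\n# c\n# d\nx,y,z\n1,2,3\n4,5,6\n"

first = load_without_comments(two_comments)
second = load_without_comments(four_comments)
print(first.shape, second.shape)
```

Note that startswith('#') only matches comments that begin in the first column; a comment line with leading whitespace would not be filtered before skiprows is applied.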

Answer by Jon Clements

Create your own custom filter function, such as:

import numpy as np

def skipper(fname):
    with open(fname) as fin:
        # drop comment lines, including ones indented with whitespace
        no_comments = (line for line in fin if not line.lstrip().startswith('#'))
        next(no_comments, None) # skip header
        for row in no_comments:
            yield row

a = np.loadtxt(skipper('your_file'), delimiter=',')
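Since the question mentions a large number of files, the filter function can be combined with glob to load them in one pass. This is only a sketch: the helper load_all, the file names a.csv and b.csv, and the temporary directory are assumptions made for the demonstration, not part of the original answer.

```python
import glob
import os
import tempfile
import numpy as np

def skipper(fname):
    """Yield data lines of fname, without comment lines or the header."""
    with open(fname) as fin:
        no_comments = (line for line in fin if not line.lstrip().startswith('#'))
        next(no_comments, None)  # skip header
        for row in no_comments:
            yield row

def load_all(pattern):
    """Hypothetical helper: map each matching file path to its data array."""
    return {path: np.loadtxt(skipper(path), delimiter=',')
            for path in sorted(glob.glob(pattern))}

# Demo with two temporary files that have different numbers of comment lines.
with tempfile.TemporaryDirectory() as tmp:
    for name, n_comments in [('a.csv', 1), ('b.csv', 3)]:
        with open(os.path.join(tmp, name), 'w') as out:
            out.write('# comment\n' * n_comments)
            out.write('x,y,z\n1,2,3\n4,5,6\n')
    arrays = load_all(os.path.join(tmp, '*.csv'))

for path, data in arrays.items():
    print(os.path.basename(path), data.shape)
```

Each file is read lazily line by line, so only the parsed arrays are held in memory, not the raw text.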

Answer by JeffZheng

import numpy as np

def skipper(fname, header=False):
    with open(fname) as fin:
        no_comments = (line for line in fin if not line.lstrip().startswith('#'))
        if header:
            next(no_comments, None) # skip header
        for row in no_comments:
            yield row

# header defaults to False; pass header=True if the file has a header row
a = np.loadtxt(skipper('your_file'), delimiter=',')

This is just a small modification of @Jon Clements's answer that adds an optional parameter "header", because in some cases the CSV file has comment lines (starting with #) but no header row.
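The effect of the optional flag can be seen on two small inputs, one with a header row and one without. The sketch below assumes a variant named skip_comments that takes any iterable of lines instead of a file name, so it can run on in-memory lists; that variant and the sample data are illustrative, not part of the original answer.

```python
import numpy as np

def skip_comments(lines, header=False):
    """Yield non-comment lines; drop the first of them when header=True."""
    no_comments = (line for line in lines if not line.lstrip().startswith('#'))
    if header:
        next(no_comments, None)  # skip header
    yield from no_comments

# One input with a header row, one with comments only (one of them indented).
with_header = ['# note\n', 'x,y,z\n', '1,2,3\n', '4,5,6\n']
without_header = ['# note\n', '  # indented note\n', '1,2,3\n', '4,5,6\n']

a = np.loadtxt(skip_comments(with_header, header=True), delimiter=',')
b = np.loadtxt(skip_comments(without_header), delimiter=',')
print(a.shape, b.shape)
```

Leaving header at its default on a file that does have a header row would pass "x,y,z" to loadtxt and raise the same ValueError as in the question, so the flag has to match the file.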
