Python numpy loadtxt: skipping the first row

Question

Asked by astromax

I have a small issue when I'm trying to import data from CSV files with numpy's loadtxt function. Here's a sample of the type of data files I have.

Call it 'datafile1.csv':

# Comment 1
# Comment 2
x,y,z 
1,2,3
4,5,6
7,8,9
...
...
# End of File Comment

The script that I thought would work for this situation looks like:

import numpy as np
FH = np.loadtxt('datafile1.csv',comments='#',delimiter=',',skiprows=1)

But, I'm getting an error:

ValueError: could not convert string to float: x

This tells me that the kwarg 'skiprows' is not skipping the header; it's skipping the first row of comments. I could simply set skiprows=3, but the complication is that I have a very large number of files, and they don't all necessarily have the same number of commented lines at the top. How can I make sure that when I use loadtxt I'm only getting the actual data in a situation like this?
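
The behaviour described above can be reproduced with a minimal sketch. It assumes an in-memory copy of the sample file (the elided "..." rows are left out) and relies on the documented rule that skiprows counts comment lines too; the names sample, header_error and data are only illustrative.

```python
import io
import numpy as np

# In-memory stand-in for 'datafile1.csv'; the elided rows are omitted.
sample = """# Comment 1
# Comment 2
x,y,z
1,2,3
4,5,6
7,8,9
# End of File Comment
"""

# skiprows=1 drops only '# Comment 1'; the header 'x,y,z' is still parsed as data.
try:
    np.loadtxt(io.StringIO(sample), comments='#', delimiter=',', skiprows=1)
    header_error = None
except ValueError as exc:
    header_error = exc

# skiprows=3 drops both comment lines and the header, so only numbers remain.
data = np.loadtxt(io.StringIO(sample), comments='#', delimiter=',', skiprows=3)
print(header_error)
print(data)
```

A fixed skiprows=3 only works here because this particular sample has exactly two leading comment lines.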

P.S. - I'm open to bash solutions, too.

Answer 1

Accepted answer by falsetru

Skip the comment lines manually using a generator expression:

import numpy as np

with open('datafile1.csv') as f:
    lines = (line for line in f if not line.startswith('#'))
    FH = np.loadtxt(lines, delimiter=',', skiprows=1)
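
To try this without a file on disk, here is a small self-contained sketch of the same idea. It assumes the sample data is held in an io.StringIO object (a stand-in for 'datafile1.csv', with the elided rows omitted); the names sample and data are only illustrative.

```python
import io
import numpy as np

# In-memory stand-in for 'datafile1.csv'; the elided rows are omitted.
sample = """# Comment 1
# Comment 2
x,y,z
1,2,3
4,5,6
7,8,9
# End of File Comment
"""

with io.StringIO(sample) as f:
    # Drop every comment line first, so the header becomes the first remaining line.
    lines = (line for line in f if not line.startswith('#'))
    # skiprows=1 now removes the header row rather than a comment line.
    data = np.loadtxt(lines, delimiter=',', skiprows=1)

print(data.shape)
```

Because the comments are filtered out before loadtxt sees the lines, the number of comment lines at the top of each file no longer matters.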

Answer 2

Answer by Jon Clements

Create your own custom filter function, such as:

import numpy as np

def skipper(fname):
    with open(fname) as fin:
        no_comments = (line for line in fin if not line.lstrip().startswith('#'))
        next(no_comments, None) # skip header
        for row in no_comments:
            yield row

a = np.loadtxt(skipper('your_file'), delimiter=',')

Answer 3

Answer by JeffZheng

import numpy as np

def skipper(fname, header=False):
    with open(fname) as fin:
        no_comments = (line for line in fin if not line.lstrip().startswith('#'))
        if header:
            next(no_comments, None) # skip header
        for row in no_comments:
            yield row

# header defaults to False; pass header=True if the file has a header row
a = np.loadtxt(skipper('your_file'), delimiter=',')

This is a small modification of @Jon Clements's answer that adds an optional parameter "header", since in some cases the CSV file has comment lines (starting with #) but no header row.
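
As a rough usage sketch, the two cases might look like this. The temporary files and the write_temp helper are hypothetical and exist only to make the example runnable; skipper is repeated from above so the block stands alone.

```python
import os
import tempfile
import numpy as np

def skipper(fname, header=False):
    """Yield the non-comment lines of fname, optionally dropping the first as a header."""
    with open(fname) as fin:
        no_comments = (line for line in fin if not line.lstrip().startswith('#'))
        if header:
            next(no_comments, None)  # skip header
        for row in no_comments:
            yield row

def write_temp(text):
    # Hypothetical helper: write text to a temporary CSV file and return its path.
    fd, path = tempfile.mkstemp(suffix='.csv')
    with os.fdopen(fd, 'w') as fh:
        fh.write(text)
    return path

with_header = write_temp("# Comment 1\nx,y,z\n1,2,3\n4,5,6\n")
without_header = write_temp("# Comment 1\n# Comment 2\n1,2,3\n4,5,6\n")

# File with a header row: ask skipper to drop it.
a = np.loadtxt(skipper(with_header, header=True), delimiter=',')
# File with comments only: the default header=False keeps every data row.
b = np.loadtxt(skipper(without_header), delimiter=',')

os.remove(with_header)
os.remove(without_header)
print(a)
print(b)
```

Calling skipper with header=True on a file that has no header would silently drop the first data row, so the flag has to match the file.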
