Can pandas read_csv read only specific lines?

Question

Asked by user1412286

I have a csv file that looks like this:

TEST  
2012-05-01 00:00:00.203 ON 1  
2012-05-01 00:00:11.203 OFF 0  
2012-05-01 00:00:22.203 ON 1  
2012-05-01 00:00:33.203 OFF 0  
2012-05-01 00:00:44.203 OFF 0  
TEST  
2012-05-02 00:00:00.203 OFF 0  
2012-05-02 00:00:11.203 OFF 0  
2012-05-02 00:00:22.203 OFF 0  
2012-05-02 00:00:33.203 OFF 0  
2012-05-02 00:00:44.203 ON 1  
2012-05-02 00:00:55.203 OFF 0

and I cannot get rid of the "TEST" lines.

Is it possible to check whether a line starts with a date and read only those that do?

Answer 1

Accepted answer by eumiro

from io import StringIO
import pandas

# Copy every line except the "TEST" markers into an in-memory buffer
s = StringIO()
with open('file.csv') as f:
    for line in f:
        if not line.startswith('TEST'):
            s.write(line)
s.seek(0)  # "rewind" to the beginning of the StringIO object

pandas.read_csv(s)  # with further parameters…
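
To keep only lines that begin with a date, as the question asks, the same buffering idea can test each line against a pattern instead of a fixed prefix. This is a minimal sketch, assuming whitespace-separated fields and no header row; the helper name `read_dated_lines`, the regular expression, and the column names are illustrative and not part of the original answer.

```python
import re
from io import StringIO

import pandas as pd

# Assumed pattern: a line counts as data if it begins with YYYY-MM-DD
DATE_PREFIX = re.compile(r"\d{4}-\d{2}-\d{2}\s")


def read_dated_lines(lines):
    """Build a DataFrame from the lines that start with a date (hypothetical helper)."""
    kept = "".join(line for line in lines if DATE_PREFIX.match(line))
    return pd.read_csv(
        StringIO(kept),
        sep=r"\s+",      # fields are separated by whitespace
        header=None,     # the file has no header row
        names=["date", "time", "state", "value"],  # assumed column names
    )


sample = [
    "TEST\n",
    "2012-05-01 00:00:00.203 ON 1\n",
    "2012-05-01 00:00:11.203 OFF 0\n",
    "TEST\n",
    "2012-05-02 00:00:00.203 OFF 0\n",
]
df = read_dated_lines(sample)
print(df)
```

Because an open file is also an iterable of lines, the helper can be called as `read_dated_lines(f)` inside a `with open('file.csv') as f:` block.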

Answer 2

Answer by pepr

When you get the row from the csv.reader, and you can be sure that the first element is a string, you can use

if not row[0].startswith('TEST'):
    process(row)
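
For context, here is a fuller sketch of that check inside a csv.reader loop. It assumes space-delimited fields; `process` is a stand-in that simply collects rows, and the guard against empty rows is an addition to the snippet above.

```python
import csv
from io import StringIO

# Stand-in for an open file; in practice use open('file.csv', newline='')
data = StringIO(
    "TEST\n"
    "2012-05-01 00:00:00.203 ON 1\n"
    "2012-05-01 00:00:11.203 OFF 0\n"
    "TEST\n"
    "2012-05-02 00:00:00.203 OFF 0\n"
)

rows = []


def process(row):
    """Hypothetical handler: collect the row for later use."""
    rows.append(row)


for row in csv.reader(data, delimiter=" "):
    # Skip blank lines (csv.reader yields an empty list) and "TEST" marker rows
    if row and not row[0].startswith("TEST"):
        process(row)

print(rows)
```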

Answer 3

Answer by Maxim Egorushkin

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html?highlight=read_csv#pandas.io.parsers.read_csv

skiprows : list-like or integer. Row numbers to skip (0-indexed) or number of rows to skip (int).

Pass [0, 6] to skip the rows containing "TEST".
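
A small sketch of that call on the sample data from the question, where "TEST" sits on lines 0 and 6. The whitespace separator and the column names are assumptions about the file layout.

```python
from io import StringIO

import pandas as pd

raw = """TEST
2012-05-01 00:00:00.203 ON 1
2012-05-01 00:00:11.203 OFF 0
2012-05-01 00:00:22.203 ON 1
2012-05-01 00:00:33.203 OFF 0
2012-05-01 00:00:44.203 OFF 0
TEST
2012-05-02 00:00:00.203 OFF 0
2012-05-02 00:00:11.203 OFF 0
2012-05-02 00:00:22.203 OFF 0
2012-05-02 00:00:33.203 OFF 0
2012-05-02 00:00:44.203 ON 1
2012-05-02 00:00:55.203 OFF 0
"""

df = pd.read_csv(
    StringIO(raw),
    skiprows=[0, 6],  # 0-indexed line numbers of the "TEST" rows
    sep=r"\s+",       # fields are separated by whitespace
    header=None,      # the data has no header row
    names=["date", "time", "state", "value"],  # assumed column names
)
print(len(df))
```

This only helps when the positions of the marker lines are known before the file is read.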

Answer 4

Answer by Dougal

Another option, since I just ran into this problem also:

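import subprocess
import pandas as pd

filename = 'file.csv'
# grep -n prefixes each match with its 1-based line number, e.g. "7:TEST"
# text=True (Python 3.7+) makes check_output return str instead of bytes
grep = subprocess.check_output(['grep', '-n', '^TEST', filename], text=True).splitlines()
bad_lines = [int(s[:s.index(':')]) - 1 for s in grep]  # convert to 0-indexed
df = pd.read_csv(filename, skiprows=bad_lines)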

It's less portable than @eumiro's (read: probably doesn't work on Windows) and requires reading the file twice, but has the advantage that you don't have to store the entire file contents in memory.

You could of course do the same thing as the grep in Python, but it'd probably be slower.
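
For reference, a sketch of that pure-Python variant: one pass collects the 0-indexed numbers of the marker lines, and read_csv then skips them on a second pass. The helper name `find_marker_lines`, the "TEST" prefix, and the column layout are assumptions, and the sketch has not been timed against grep.

```python
import os
import tempfile

import pandas as pd


def find_marker_lines(path, prefix="TEST"):
    """Return 0-indexed numbers of lines starting with prefix (hypothetical helper)."""
    with open(path) as f:
        # Iterating the file keeps only one line in memory at a time
        return [i for i, line in enumerate(f) if line.startswith(prefix)]


# Write a small sample file so the sketch is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(
        "TEST\n"
        "2012-05-01 00:00:00.203 ON 1\n"
        "2012-05-01 00:00:11.203 OFF 0\n"
        "TEST\n"
        "2012-05-02 00:00:00.203 OFF 0\n"
    )
    filename = tmp.name

bad_lines = find_marker_lines(filename)
df = pd.read_csv(
    filename,
    skiprows=bad_lines,
    sep=r"\s+",
    header=None,
    names=["date", "time", "state", "value"],  # assumed column names
)
os.remove(filename)  # clean up the temporary sample file
print(bad_lines, len(df))
```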
