pandas: Is it possible to use read_csv to read only specific lines?
Note: this page reproduces a popular Stack Overflow question and its answers, licensed under CC BY-SA 4.0. If you reuse it, you must follow the same license, cite the original URL, and attribute the content to the original authors.
Original question: http://stackoverflow.com/questions/10717504/
Asked by user1412286
I have a csv file that looks like this:
TEST
2012-05-01 00:00:00.203 ON 1
2012-05-01 00:00:11.203 OFF 0
2012-05-01 00:00:22.203 ON 1
2012-05-01 00:00:33.203 OFF 0
2012-05-01 00:00:44.203 OFF 0
TEST
2012-05-02 00:00:00.203 OFF 0
2012-05-02 00:00:11.203 OFF 0
2012-05-02 00:00:22.203 OFF 0
2012-05-02 00:00:33.203 OFF 0
2012-05-02 00:00:44.203 ON 1
2012-05-02 00:00:55.203 OFF 0
and I cannot get rid of the "TEST" string.
Is it possible to check whether a line starts with a date and read only those that do?
Accepted answer by eumiro
from io import StringIO
import pandas

s = StringIO()
with open('file.csv') as f:
    for line in f:
        # copy every line except the "TEST" marker lines
        if not line.startswith('TEST'):
            s.write(line)
s.seek(0)  # "rewind" to the beginning of the StringIO object
pandas.read_csv(s)  # with further parameters…
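The question also asks whether lines can be kept only if they start with a date. A minimal sketch of that variant follows, assuming the file is space-separated and has no header row. The column names and the shortened sample text are hypothetical, chosen only for illustration.

```python
import re
from io import StringIO

import pandas as pd

# Shortened sample mirroring the layout shown in the question.
raw = """TEST
2012-05-01 00:00:00.203 ON 1
2012-05-01 00:00:11.203 OFF 0
TEST
2012-05-02 00:00:00.203 OFF 0
2012-05-02 00:00:44.203 ON 1
"""

# Keep only lines that begin with a YYYY-MM-DD date followed by a space.
date_prefix = re.compile(r"\d{4}-\d{2}-\d{2} ")
kept = "".join(line for line in StringIO(raw) if date_prefix.match(line))

# Column names are assumptions; the file itself carries no header.
df = pd.read_csv(
    StringIO(kept),
    sep=" ",
    header=None,
    names=["date", "time", "state", "value"],
)
```

Matching on a date prefix rather than on 'TEST' also drops any other non-data lines, which may or may not be what you want.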
Answer by pepr
When you get the row from the csv.reader, and you can be sure that the first element is a string, you can use
if not row[0].startswith('TEST'):
    process(row)
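For context, the check above might sit inside a loop like the following sketch. The space delimiter is an assumption based on the sample data in the question, and collecting rows in a list stands in for the unspecified process(row) call.

```python
import csv
from io import StringIO

# Shortened sample mirroring the layout shown in the question.
raw = """TEST
2012-05-01 00:00:00.203 ON 1
TEST
2012-05-02 00:00:00.203 OFF 0
"""

rows = []
# delimiter=" " is an assumption; adjust it to match the real file.
for row in csv.reader(StringIO(raw), delimiter=" "):
    # Skip empty rows and the "TEST" marker rows.
    if row and not row[0].startswith("TEST"):
        rows.append(row)
```

The extra `row and` guard avoids an IndexError on blank lines, for which csv.reader yields an empty list.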
Answer by Maxim Egorushkin
skiprows : list-like or integer
    Row numbers to skip (0-indexed) or number of rows to skip (int)
Pass skiprows=[0, 6] to skip the rows with "TEST".
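Applied to the sample data from the question, this could look like the sketch below. The separator, the missing header, and the column names are assumptions made for illustration.

```python
from io import StringIO

import pandas as pd

# The sample file from the question; "TEST" sits on lines 0 and 6 (0-indexed).
raw = """TEST
2012-05-01 00:00:00.203 ON 1
2012-05-01 00:00:11.203 OFF 0
2012-05-01 00:00:22.203 ON 1
2012-05-01 00:00:33.203 OFF 0
2012-05-01 00:00:44.203 OFF 0
TEST
2012-05-02 00:00:00.203 OFF 0
2012-05-02 00:00:11.203 OFF 0
2012-05-02 00:00:22.203 OFF 0
2012-05-02 00:00:33.203 OFF 0
2012-05-02 00:00:44.203 ON 1
2012-05-02 00:00:55.203 OFF 0
"""

# Column names are hypothetical; the file has no header row.
df = pd.read_csv(
    StringIO(raw),
    skiprows=[0, 6],
    sep=" ",
    header=None,
    names=["date", "time", "state", "value"],
)
```

This only works when the positions of the marker lines are known in advance; if they vary from file to file, they have to be located first.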
Answer by Dougal
Another option, since I just ran into this problem also:
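import subprocess

import pandas as pd

filename = 'file.csv'  # path to the input file
# grep -n prints "N:line" for each match, with 1-based line numbers;
# '^TITLE' is the marker in this answer's file, use '^TEST' for the question's data
grep = subprocess.check_output(['grep', '-n', '^TITLE', filename], text=True).splitlines()
bad_lines = [int(s[:s.index(':')]) - 1 for s in grep]  # convert to 0-based row numbers
df = pd.read_csv(filename, skiprows=bad_lines)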
It's less portable than @eumiro's (read: probably doesn't work on Windows) and requires reading the file twice, but has the advantage that you don't have to store the entire file contents in memory.
You could of course do the same thing as the grep in Python, but it'd probably be slower.
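A pure-Python version of the grep step might look like the sketch below. The helper name marker_line_numbers, the 'TEST' marker, and the temporary demo file are all assumptions made for illustration; no claim is made here about its speed relative to grep.

```python
import os
import tempfile

import pandas as pd


def marker_line_numbers(path, marker="TEST"):
    """Return the 0-based indices of lines that start with `marker`."""
    with open(path) as f:
        return [i for i, line in enumerate(f) if line.startswith(marker)]


# Write a small demo file mirroring the layout shown in the question.
sample = """TEST
2012-05-01 00:00:00.203 ON 1
2012-05-01 00:00:11.203 OFF 0
TEST
2012-05-02 00:00:00.203 OFF 0
2012-05-02 00:00:44.203 ON 1
"""
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

# First pass finds the marker lines, second pass lets pandas skip them.
bad_lines = marker_line_numbers(path)
df = pd.read_csv(path, skiprows=bad_lines, sep=" ", header=None)
os.remove(path)
```

Like the grep approach, this reads the file twice but holds only the list of line numbers in memory, and it avoids the dependency on an external grep binary.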

