pandas: Reading a part of a csv file
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/46355419/
Reading a part of a csv file
Asked by John Constantine
I have a really large csv file, about 10GB. Whenever I try to read it into an IPython notebook using
data = pd.read_csv("data.csv")
my laptop gets stuck. Is it possible to read just, say, 10,000 rows or 500 MB of the csv file?
Answered by miradulo
It is possible. You can create an iterator that yields chunks of your csv of a certain size at a time as DataFrames, by passing iterator=True with your desired chunksize to read_csv.
import pandas as pd

df_iter = pd.read_csv('data.csv', chunksize=10000, iterator=True)
for iter_num, chunk in enumerate(df_iter, 1):
    print(f'Processing iteration {iter_num}')
    # do things with chunk
Or more briefly:
for chunk in pd.read_csv('data.csv', chunksize=10000):
    # do things with chunk
    pass
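If the end goal is a single, smaller DataFrame, a common pattern is to reduce each chunk as it arrives and concatenate the survivors. A minimal sketch, assuming a hypothetical numeric column named 'value' to filter on:

import pandas as pd

# Keep only the filtered rows from each chunk, so the full 10GB file
# is never held in memory at once. 'value' and the threshold are
# placeholders for whatever reduction you actually need.
pieces = []
for chunk in pd.read_csv('data.csv', chunksize=10000):
    pieces.append(chunk[chunk['value'] > 0])
result = pd.concat(pieces, ignore_index=True)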
Alternatively, if there is just a specific part of the csv you want to read, you can use the skiprows and nrows options to start at a particular line and subsequently read n rows, as the naming suggests.
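For example, here is a sketch that skips the first million data rows and reads the next 10,000 (the row counts are arbitrary; header=0 keeps the column names, since a list-like skiprows is applied before the header row is located):

import pandas as pd

# Skip data rows 1..1,000,000 (row 0 is the header line) and read
# the following 10,000 rows.
df_part = pd.read_csv('data.csv', header=0,
                      skiprows=range(1, 1000001), nrows=10000)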
Answered by user3212593
Likely a memory issue. On read_csv you can set chunksize (where you specify the number of rows per chunk).
Alternatively, if you don't need all the columns, you can set usecols on read_csv to import only the columns you need.
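For instance, assuming the file has columns named 'timestamp' and 'price' (hypothetical names for illustration), loading just those two can cut memory use substantially on a wide csv:

import pandas as pd

# Parse only the named columns; all other columns are skipped at read time.
df = pd.read_csv('data.csv', usecols=['timestamp', 'price'])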