Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/23853553/

Time: 2020-08-19 03:32:21  Source: igfitidea

Python Pandas: How to read only first n rows of CSV files in?

Tags: python, pandas, csv, file-io

Asked by bensw

I have a very large data set and I can't afford to read the entire data set in. So, I'm thinking of reading only one chunk of it to train but I have no idea how to do it. Any thought will be appreciated.

Accepted answer by smci

If you only want to read the first 999,999 (non-header) rows:

read_csv(..., nrows=999999)
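
A minimal, self-contained sketch of nrows, assuming a small in-memory CSV (built with io.StringIO) stands in for the large file; the column names x and y are hypothetical:

```python
import io

import pandas as pd

# Hypothetical stand-in for a large file: a header line plus 10 data rows.
csv_text = "x,y\n" + "".join(f"{i},{i * i}\n" for i in range(10))

# Read only the first 3 data rows; the header line is not counted by nrows.
head = pd.read_csv(io.StringIO(csv_text), nrows=3)
print(head)
```

With a real file you would pass its path instead of the StringIO object.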

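If you only want to read rows 1,000,000 ... 1,999,999 (counting data rows from 0) and keep the header:

read_csv(..., skiprows=range(1, 1000001), nrows=1000000)

An integer such as skiprows=1000000 skips that many lines from the very start of the file, header included, so the next data line would be parsed as the column names. A list-like of 0-indexed line numbers that starts at 1 leaves line 0 (the header) in place.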
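
A small sketch of reading a slice from the middle of a file while keeping the header, again assuming an in-memory CSV. Passing a list-like skiprows (here a range) skips lines by their 0-indexed file line number but leaves line 0, the header, untouched; start and count are hypothetical names used only for illustration:

```python
import io

import pandas as pd

# Hypothetical file: header "x,y" followed by 10 data rows (x = 0..9).
csv_text = "x,y\n" + "".join(f"{i},{i * i}\n" for i in range(10))

start = 4  # first data row to keep (0-indexed, header not counted)
count = 3  # number of data rows to read

# File line 0 is the header; data row k sits on file line k + 1.
# Skipping lines 1..start drops data rows 0..start-1 but keeps the header.
part = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=range(1, start + 1),
    nrows=count,
)
print(part)
```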

nrows: int, default None. Number of rows of file to read. Useful for reading pieces of large files.

skiprows: list-like or integer. Row numbers to skip (0-indexed), or number of rows to skip (int) at the start of the file.
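
A short sketch contrasting the two forms of skiprows, assuming the same kind of in-memory CSV with hypothetical columns x and y. The integer form counts raw lines from the top of the file (header included), while the list-like form names specific 0-indexed lines:

```python
import io

import pandas as pd

# Lines: 0 "x,y", 1 "0,0", 2 "1,1", 3 "2,4", 4 "3,9", ...
csv_text = "x,y\n" + "".join(f"{i},{i * i}\n" for i in range(10))

# Integer: skip the first 3 lines of the file, header included,
# so line 3 ("2,4") is parsed as the column names.
by_count = pd.read_csv(io.StringIO(csv_text), skiprows=3)

# List-like: skip lines 1, 2 and 3 only; line 0 (the header) is kept.
by_index = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2, 3])

print(list(by_count.columns))
print(list(by_index.columns))
```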

And for large files, you'll probably also want to use chunksize:

chunksize: int, default None. Return a TextFileReader object for iteration.
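
A minimal sketch of processing a file chunk by chunk, assuming an in-memory CSV again. The running total is only a hypothetical placeholder for whatever per-chunk work you need (for example a training step). Iterating over the returned TextFileReader yields one DataFrame per chunk, so only about chunksize rows are materialized at a time:

```python
import io

import pandas as pd

# Hypothetical file: header "x,y" followed by 10 data rows (y = x * x).
csv_text = "x,y\n" + "".join(f"{i},{i * i}\n" for i in range(10))

chunk_sizes = []
total_y = 0

# Each iteration yields a DataFrame with at most 4 rows; the last may be shorter.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    chunk_sizes.append(len(chunk))
    total_y += int(chunk["y"].sum())  # placeholder per-chunk work

print(chunk_sizes, total_y)
```

In recent pandas versions (1.2 and later, as far as I know) the reader can also be used as a context manager, which closes the underlying file when the with block ends.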

pandas.io.parsers.read_csv documentation