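Python Pandas: How to read only the first n rows of a CSV file?
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/23853553/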
Asked by bensw
I have a very large data set and can't afford to read all of it in. So I'm thinking of reading only one chunk of it for training, but I have no idea how to do it. Any thoughts would be appreciated.
Accepted answer by smci
If you only want to read the first 999,999 (non-header) rows:
read_csv(..., nrows=999999)
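As a minimal runnable sketch of the nrows call above: the CSV text, column names, and row counts here are made up for illustration, and an in-memory buffer stands in for the large file.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large CSV file: one header line, ten data rows.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(1, 11))

# nrows counts data rows only; the header line is not part of the count.
first_rows = pd.read_csv(io.StringIO(csv_text), nrows=3)
print(first_rows)
```

With a real file, the io.StringIO(...) argument would simply be replaced by the file path.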
If you only want to read rows 1,000,000 ... 1,999,999:
read_csv(..., skiprows=1000000, nrows=999999)
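One thing to watch for when combining skiprows with nrows: an integer skiprows counts lines from the very top of the file, so the header line is skipped as well. The sketch below shows two ways of handling that on a tiny made-up CSV (the data and column names are assumptions for illustration): either pass the column names explicitly, or pass a list-like skiprows that leaves line 0 in place.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large CSV file: one header line, ten data rows.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(1, 11))

# Integer skiprows: the first 4 lines (header + 3 data rows) are skipped,
# so the column names have to be supplied by hand.
by_count = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=4,
    nrows=3,
    header=None,
    names=["id", "value"],
)

# List-like skiprows: skip line numbers 1..3 (0-indexed) and keep line 0,
# so the header is still read from the file.
by_index = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 4), nrows=3)

print(by_count)
print(by_index)
```

Both calls return the same three data rows (the 4th to 6th); which form is more convenient depends on whether the file has a header at all.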
nrows: int, default None. Number of rows of file to read. Useful for reading pieces of large files.
skiprows: list-like or integer. Row numbers to skip (0-indexed) or number of rows to skip (int) at the start of the file.
and for large files, you'll probably also want to use chunksize:
chunksize: int, default None. Return TextFileReader object for iteration.
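To illustrate how chunksize might be used, here is a small sketch that iterates over the reader and processes one chunk at a time. The CSV contents, the chunk size of 4, and the running-sum computation are all made up for the example; in practice the chunk size would be far larger.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large CSV file: one header line, ten data rows.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(1, 11))

chunk_lengths = []
total = 0

# With chunksize set, read_csv returns an iterator of DataFrames
# instead of loading the whole file at once.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    chunk_lengths.append(len(chunk))
    total += chunk["value"].sum()

print(chunk_lengths)
print(total)
```

Only one chunk is held in memory at a time, so this pattern suits aggregations or incremental training where the full data set never needs to be loaded together.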