Notice: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/15644859/

Time: 2020-08-18 20:36:45  Source: igfitidea

How to read specific part of large file in Python

Tags: python, parsing

Asked by Cerin

Given a large file (hundreds of MB), how would I use Python to quickly read the content between a specific start and end index within the file?

Essentially, I'm looking for a more efficient way of doing:

open(filename).read()[start_index:end_index]

Accepted answer by Dan Lecocq

You can seek into the file and then read a certain amount from there. seek moves to a specific offset within the file, and you can then limit the read to only the number of bytes in that range.

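# Binary mode, so that the offsets are byte positions (in Python 3 text mode,
# seeking to an arbitrary offset is not reliable).
with open(filename, 'rb') as fin:
    fin.seek(start_index)
    data = fin.read(end_index - start_index)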

That reads only the data you're looking for.
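
As a minimal sketch, the seek-then-read approach can be wrapped in a small function. The name read_range is hypothetical, and the sketch assumes that start_index and end_index are byte offsets, which is why the file is opened in binary mode:

```python
def read_range(filename, start_index, end_index):
    """Return the bytes in [start_index, end_index) of the file.

    Hypothetical helper; the offsets are assumed to be byte positions.
    """
    if start_index < 0 or end_index < start_index:
        raise ValueError("expected 0 <= start_index <= end_index")
    with open(filename, "rb") as fin:
        # Jump straight to the offset without reading what precedes it.
        fin.seek(start_index)
        # Read at most this many bytes; fewer come back at end of file.
        return fin.read(end_index - start_index)
```

If the text uses a variable-width encoding such as UTF-8, a byte range can begin or end in the middle of a character, so the result may need offsets that fall on character boundaries before it can be decoded cleanly.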

Answer by Will Leeney

This is my solution for files with a variable-width encoding. My CSV file contains a dictionary where each row is a new item.

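import csv

def get_stuff(filename, count, start_index):
    # Yield up to `count` rows, starting at row `start_index`
    # (treated as 1-based, since the comparison uses start_index - 1).
    with open(filename, 'r', newline='') as infile:
        reader = csv.reader(infile)
        num = 0
        for idx, row in enumerate(reader):
            if idx >= start_index - 1:
                if num >= count:
                    return  # enough rows yielded; stop reading
                yield row
                num += 1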
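
The same row-range selection can also be sketched with itertools.islice from the standard library, which does the skipping and counting for you. This is an alternative technique, not the answer author's code. The name get_rows is hypothetical, and start_index is assumed to be 1-based and at least 1:

```python
import csv
from itertools import islice

def get_rows(filename, count, start_index):
    """Yield up to `count` CSV rows, starting at 1-based row `start_index`.

    Hypothetical helper; assumes start_index >= 1.
    """
    with open(filename, "r", newline="") as infile:
        reader = csv.reader(infile)
        first = start_index - 1  # convert to a 0-based row position
        # islice consumes and discards the rows before `first`,
        # then stops after `count` rows have been yielded.
        yield from islice(reader, first, first + count)
```

Like the loop version, this still reads the file from the beginning, because with variable-width rows the byte offset of a given row is not known in advance. It only avoids holding the whole file in memory.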