Python读取大文本文件(几GB)的最快方法

Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/14944183/


Python fastest way to read a large text file (several GB)

python, performance, optimization, line, chunking

Asked by Gianni Spear

I have a large text file (~7 GB). I am looking for the fastest way to read it. I have been reading about several approaches, such as reading the file chunk by chunk, in order to speed up the process.


For example, effbot suggests:


# File: readline-example-3.py

file = open("sample.txt")

while 1:
    lines = file.readlines(100000)  # read lines totalling roughly 100,000 bytes
    if not lines:
        break
    for line in lines:
        pass  # do something

in order to process 96,900 lines of text per second. Other authors suggest using islice():


from itertools import islice

with open(...) as f:
    while True:
        next_n_lines = list(islice(f, n))
        if not next_n_lines:
            break
        # process next_n_lines

list(islice(f, n)) will return a list of the next n lines of the file f. Using this inside a loop will give you the file in chunks of n lines.

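A rough usage sketch of this islice() pattern, assuming a hypothetical chunk size n, input file name, and per-chunk process() function:

from itertools import islice

def process(lines):
    pass  # hypothetical placeholder for the real per-chunk work

n = 100000  # assumed chunk size, in lines
with open("large_file.txt") as f:  # "large_file.txt" is an assumed name
    while True:
        next_n_lines = list(islice(f, n))
        if not next_n_lines:
            break
        process(next_n_lines)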

Answered by Morten Larsen

with open(<FILE>) as FileObj:
    for line in FileObj:
        print(line)  # or do some other thing with the line...

will read one line at a time into memory, and close the file when done...

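As a concrete illustration of this answer, plain iteration over the file object is buffered internally, so only one line is held in memory at a time. A minimal sketch, with an assumed file name and an illustrative counting task:

line_count = 0
char_count = 0
with open("big.txt") as f:    # "big.txt" is an assumed file name
    for line in f:            # the file object yields one line at a time
        line_count += 1
        char_count += len(line)
print(line_count, char_count)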