Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/17749484/

Time: 2020-08-19 09:05:10  Source: igfitidea

python script to concatenate all the files in the directory into one file

Tags: python, file, copy

Asked by user1629366

I have written the following script to concatenate all the files in the directory into one single file.

Can this be optimized in terms of:

  1. idiomatic Python

  2. time

Here is the snippet:

import time, glob

outfilename = 'all_' + str((int(time.time()))) + ".txt"

filenames = glob.glob('*.txt')

with open(outfilename, 'wb') as outfile:
    for fname in filenames:
        with open(fname, 'r') as readfile:
            infile = readfile.read()
            for line in infile:
                outfile.write(line)
            outfile.write("\n\n")

Accepted answer by Martijn Pieters

Use shutil.copyfileobj to copy data:

import glob
import shutil

with open(outfilename, 'wb') as outfile:
    for filename in glob.glob('*.txt'):
        if filename == outfilename:
            # don't want to copy the output into the output
            continue
        with open(filename, 'rb') as readfile:
            shutil.copyfileobj(readfile, outfile)

shutil reads from the readfile object in chunks, writing them to the outfile file object directly. Do not use readline() or an iteration buffer, since you do not need the overhead of finding line endings.

Use the same mode for both reading and writing; this is especially important in Python 3. I've used binary mode for both here.

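The approach above can be wrapped in a small reusable function. This is only a sketch and not part of the original answer: the name concatenate_files, its two parameters, the sorted() call, and the absolute-path comparison are all assumptions made for illustration.

```python
import glob
import os
import shutil


def concatenate_files(pattern, outfilename):
    """Append every file matching `pattern` to `outfilename`, byte for byte.

    Hypothetical helper: the name and signature are not from the answer.
    """
    out_path = os.path.abspath(outfilename)
    with open(outfilename, 'wb') as outfile:
        # sorted() gives a predictable order; glob does not promise one
        for filename in sorted(glob.glob(pattern)):
            if os.path.abspath(filename) == out_path:
                # don't want to copy the output into the output
                continue
            with open(filename, 'rb') as readfile:
                # copies chunk by chunk, so whole files are never held in memory
                shutil.copyfileobj(readfile, outfile)
```

A call such as concatenate_files('*.txt', 'all.txt') would then merge the text files in the current directory. Both files are opened in binary mode, matching the advice to use the same mode on each side.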

Answer by Brendan Long

You can iterate over the lines of a file object directly, without reading the whole thing into memory:

with open(fname, 'r') as readfile:
    for line in readfile:
        outfile.write(line)
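
The fragment above relies on fname and outfile from the question. A self-contained version might look like the following sketch; the function name append_lines is hypothetical and only serves to make the snippet runnable on its own.

```python
def append_lines(fname, outfile):
    """Copy `fname` into the already-open text file `outfile`, line by line.

    Hypothetical helper: not part of the original answer.
    """
    with open(fname, 'r') as readfile:
        # iterating over a file object yields one line at a time,
        # so the whole file is never loaded into memory
        for line in readfile:
            outfile.write(line)
```

It could be used as `with open('all.txt', 'w') as outfile: append_lines('a.txt', outfile)`, where both file names are placeholders.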

Answer by MGP

No need to use that many variables.

with open(outfilename, 'w') as outfile:
    for fname in filenames:
        with open(fname, 'r') as readfile:
            outfile.write(readfile.read() + "\n\n")

Answer by iruvar

The fileinput module provides a natural way to iterate over multiple files:

for line in fileinput.input(glob.glob("*.txt")):
    outfile.write(line)
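
The two lines above assume that outfile is already open and that the modules are imported. A fuller sketch, written for Python 3, is shown below; the function name concatenate_with_fileinput and its parameters are assumptions, not part of the answer. One detail worth guarding against: when fileinput.input() receives an empty list it falls back to reading standard input, so the sketch returns early in that case.

```python
import fileinput
import glob


def concatenate_with_fileinput(pattern, outfilename):
    """Write the lines of all files matching `pattern` to `outfilename`.

    Hypothetical helper: the name and signature are not from the answer.
    """
    # build the list first and leave out the output file itself
    filenames = [f for f in sorted(glob.glob(pattern)) if f != outfilename]
    with open(outfilename, 'w') as outfile:
        if not filenames:
            # an empty list would make fileinput read from standard input
            return
        # fileinput chains the files into one stream of lines
        with fileinput.input(files=filenames) as lines:
            for line in lines:
                outfile.write(line)
```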

Answer by Stephen Miller

Using Python 2.7, I did some "benchmark" testing of

outfile.write(infile.read())

vs

shutil.copyfileobj(readfile, outfile)

I iterated over 20 .txt files ranging in size from 63 MB to 313 MB, with a combined size of about 2.6 GB. With both methods, normal read mode performed better than binary read mode, and shutil.copyfileobj was generally faster than outfile.write.

When comparing the worst combination (outfile.write, binary mode) with the best combination (shutil.copyfileobj, normal read mode), the difference was quite significant:

outfile.write, binary mode: 43 seconds, on average.

shutil.copyfileobj, normal mode: 27 seconds, on average.

The outfile had a final size of 2620 MB in normal read mode vs 2578 MB in binary read mode.

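The answer does not say why the two output sizes differ. One possible cause, offered here purely as an assumption, is newline translation: on Windows, text mode writes each "\n" as "\r\n", while binary mode copies bytes unchanged. The sketch below forces that translation with Python 3's newline argument so the effect shows up on any platform; the sample data and file names are made up.

```python
import os
import tempfile

data = b"line one\nline two\nline three\n"  # made-up sample with 3 newlines

workdir = tempfile.mkdtemp()
binary_path = os.path.join(workdir, "binary.txt")
text_path = os.path.join(workdir, "text.txt")

# binary mode: the bytes are written unchanged
with open(binary_path, "wb") as f:
    f.write(data)

# text mode with newline="\r\n" mimics the translation Windows applies by default
with open(text_path, "w", newline="\r\n") as f:
    f.write(data.decode("ascii"))

binary_size = os.path.getsize(binary_path)
text_size = os.path.getsize(text_path)
# the text-mode file grows by one byte per line
print(binary_size, text_size)
```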

Answer by Ravi Kumar Gupta

I was curious about performance, so I tested the answers by Martijn Pieters and Stephen Miller.

I tried binary and text modes, with and without shutil, merging 270 files.

Text mode -

def using_shutil_text(outfilename):
    with open(outfilename, 'w') as outfile:
        for filename in glob.glob('*.txt'):
            if filename == outfilename:
                # don't want to copy the output into the output
                continue
            with open(filename, 'r') as readfile:
                shutil.copyfileobj(readfile, outfile)

def without_shutil_text(outfilename):
    with open(outfilename, 'w') as outfile:
        for filename in glob.glob('*.txt'):
            if filename == outfilename:
                # don't want to copy the output into the output
                continue
            with open(filename, 'r') as readfile:
                outfile.write(readfile.read())

Binary mode -

def using_shutil_binary(outfilename):
    with open(outfilename, 'wb') as outfile:
        for filename in glob.glob('*.txt'):
            if filename == outfilename:
                # don't want to copy the output into the output
                continue
            with open(filename, 'rb') as readfile:
                shutil.copyfileobj(readfile, outfile)

def without_shutil_binary(outfilename):
    with open(outfilename, 'wb') as outfile:
        for filename in glob.glob('*.txt'):
            if filename == outfilename:
                # don't want to copy the output into the output
                continue
            with open(filename, 'rb') as readfile:
                outfile.write(readfile.read())

Running times for binary mode -

Shutil - 20.161773920059204
Normal - 17.327500820159912

Running times for text mode -

Shutil - 20.47757601737976
Normal - 13.718038082122803

It looks like shutil performs about the same in both modes, while the plain read()/write() approach ("Normal" above) is faster in text mode than in binary mode.

OS: Mac OS 10.14 Mojave. Macbook Air 2017.

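The answer does not show how the running times were measured. One way such timings could be taken is sketched below; the helper name time_call and its repeat parameter are hypothetical. Taking the best of several runs is a common way to reduce noise, though disk caching can still make later runs look faster than the first.

```python
import time


def time_call(func, *args, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` calls.

    Hypothetical helper: not part of the original answer.
    """
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best
```

With the functions defined earlier in this answer, a measurement might read `print(time_call(using_shutil_text, 'all.txt'))`, where 'all.txt' is a placeholder output name.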