Python script to concatenate all the files in a directory into one file

Question

Question by user1629366

I have written the following script to concatenate all the files in the directory into one single file.

Can this be optimized in terms of:

idiomatic Python
time

Here is the snippet:

import time, glob

outfilename = 'all_' + str((int(time.time()))) + ".txt"

filenames = glob.glob('*.txt')

with open(outfilename, 'wb') as outfile:
    for fname in filenames:
        with open(fname, 'r') as readfile:
            infile = readfile.read()
            for line in infile:
                outfile.write(line)
            outfile.write("\n\n")

Answer 1

Accepted answer by Martijn Pieters

Use shutil.copyfileobj to copy data:

import glob
import shutil
import time

outfilename = 'all_' + str(int(time.time())) + ".txt"  # output name as in the question

with open(outfilename, 'wb') as outfile:
    for filename in glob.glob('*.txt'):
        if filename == outfilename:
            # don't want to copy the output into the output
            continue
        with open(filename, 'rb') as readfile:
            shutil.copyfileobj(readfile, outfile)

shutil reads from the readfile object in chunks, writing them directly to the outfile file object. Do not use readline() or an iteration buffer, since you do not need the overhead of finding line endings.
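
The chunked copy can be pictured with the hand-written loop below. It is a minimal sketch of the idea, not shutil's actual source; the helper name copy_in_chunks and the 64 KiB default chunk size are assumptions for illustration, and io.BytesIO objects stand in for real files opened in binary mode.

```python
import io


def copy_in_chunks(src, dst, chunk_size=64 * 1024):
    """Copy file object src to dst one fixed-size chunk at a time (hypothetical helper)."""
    while True:
        chunk = src.read(chunk_size)  # at most chunk_size bytes per call
        if not chunk:  # an empty bytes object means end of file
            break
        dst.write(chunk)  # written as-is; no line endings are searched for


src = io.BytesIO(b"first line\nsecond line\n")
dst = io.BytesIO()
copy_in_chunks(src, dst, chunk_size=8)
print(dst.getvalue() == src.getvalue())  # True
```

The chunk boundaries fall wherever the byte count says, in the middle of a line if need be, which is exactly why no time is spent looking for newlines.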

Use the same mode for both reading and writing; this is especially important in Python 3. I've used binary mode for both here.
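
In Python 3 the reason is concrete: a file opened in binary mode accepts only bytes, and one opened in text mode accepts only str, so the question's combination of open(fname, 'r') for reading and open(outfilename, 'wb') for writing raises a TypeError on the first write. A small demonstration, assuming an in-memory io.BytesIO as a stand-in for a file opened with 'wb':

```python
import io

binary_out = io.BytesIO()  # stands in for a file opened with 'wb'

try:
    binary_out.write("a str, as returned by a file opened with 'r'\n")
    mixed_modes_ok = True
except TypeError:  # binary streams accept only bytes-like objects
    mixed_modes_ok = False

binary_out.write(b"bytes, as returned by a file opened with 'rb'\n")
print(mixed_modes_ok)  # False
```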

Answer 2

Answer by Brendan Long

You can iterate over the lines of a file object directly, without reading the whole thing into memory:

with open(fname, 'r') as readfile:
    for line in readfile:
        outfile.write(line)
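
Combined with the question's outer loop, a complete line-streaming version might look like the sketch below. It keeps the question's blank-line separator between files. The function name concat_by_lines is hypothetical, text mode with the platform's default encoding is assumed for both input and output, and the inputs are sorted because glob does not guarantee any particular order.

```python
import glob


def concat_by_lines(pattern, outfilename, separator="\n\n"):
    """Stream matching files into outfilename line by line (hypothetical helper)."""
    # Collect inputs first and skip the output file if it already matches.
    filenames = sorted(f for f in glob.glob(pattern) if f != outfilename)
    with open(outfilename, 'w') as outfile:
        for fname in filenames:
            with open(fname, 'r') as readfile:
                for line in readfile:  # lines are read lazily, one at a time
                    outfile.write(line)
            outfile.write(separator)  # the question's blank line between files
    return filenames
```

For example, concat_by_lines('*.txt', 'combined.out') writes every .txt file in the current directory to combined.out.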

Answer 3

Answer by MGP

No need to use that many variables.

with open(outfilename, 'w') as outfile:
    for fname in filenames:
        with open(fname, 'r') as readfile:
            outfile.write(readfile.read() + "\n\n")

Answer 4

Answer by iruvar

The fileinput module provides a natural way to iterate over multiple files:

import fileinput, glob
for line in fileinput.input(glob.glob("*.txt")):
    outfile.write(line)  # outfile: the already-open output file
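
A self-contained version of the fileinput approach is sketched below; the function name concat_with_fileinput is hypothetical. Two details matter: the list of inputs is collected before the output file is opened, so an output file whose name matches the pattern cannot end up reading itself, and the empty case is guarded because fileinput.input() falls back to reading standard input when it is given no files.

```python
import fileinput
import glob


def concat_with_fileinput(pattern, outfilename):
    """Concatenate matching files via fileinput (hypothetical helper)."""
    # Glob first: once outfilename exists it could match the pattern too.
    filenames = sorted(f for f in glob.glob(pattern) if f != outfilename)
    with open(outfilename, 'w') as outfile:
        if not filenames:
            return  # fileinput.input([]) would read stdin instead
        with fileinput.input(filenames) as lines:  # closes the current file on exit
            for line in lines:
                outfile.write(line)
```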

Answer 5

Answer by Stephen Miller

Using Python 2.7, I did some "benchmark" testing of

outfile.write(infile.read())

vs

shutil.copyfileobj(readfile, outfile)

I iterated over 20 .txt files ranging in size from 63 MB to 313 MB, with a combined size of ~2.6 GB. With both methods, normal (text) read mode performed better than binary read mode, and shutil.copyfileobj was generally faster than outfile.write.

When comparing the worst combination (outfile.write, binary mode) with the best combination (shutil.copyfileobj, normal read mode), the difference was quite significant:

outfile.write, binary mode: 43 seconds, on average.

shutil.copyfileobj, normal mode: 27 seconds, on average.

The outfile had a final size of 2620 MB in normal read mode vs 2578 MB in binary read mode.
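
A comparison like this can be rerun on other machines with a small timing harness. The sketch below, which assumes Python 3 for time.perf_counter, times both approaches in both modes and records the output size as well; the function names and the repeat count are assumptions. It deliberately reports numbers instead of declaring a winner, because the result depends on the platform, Python version and file sizes. Keeping the best of several runs reduces, but does not remove, the influence of the operating system's file cache.

```python
import glob
import os
import shutil
import time


def concat(filenames, outfilename, use_shutil, binary):
    """Concatenate filenames into outfilename with one method/mode combination."""
    read_mode, write_mode = ('rb', 'wb') if binary else ('r', 'w')
    with open(outfilename, write_mode) as outfile:
        for filename in filenames:
            with open(filename, read_mode) as readfile:
                if use_shutil:
                    shutil.copyfileobj(readfile, outfile)  # chunked copy
                else:
                    outfile.write(readfile.read())  # whole file in memory


def benchmark(pattern, outfilename, repeats=3):
    """Map each combination to (best time in seconds, output size in bytes)."""
    filenames = [f for f in glob.glob(pattern) if f != outfilename]
    results = {}
    for use_shutil in (True, False):
        for binary in (True, False):
            best = float('inf')
            for _ in range(repeats):
                start = time.perf_counter()
                concat(filenames, outfilename, use_shutil, binary)
                best = min(best, time.perf_counter() - start)
            label = ('shutil' if use_shutil else 'write') + (', binary' if binary else ', text')
            results[label] = (best, os.path.getsize(outfilename))
    return results
```

Usage: for label, (secs, size) in benchmark('*.txt', 'bench.out').items(): print(label, secs, size).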

Answer 6

Answer by Ravi Kumar Gupta

I was curious to look further into performance, so I tried the answers of Martijn Pieters and Stephen Miller.

I tried binary and text modes, both with and without shutil, merging 270 files.

Text mode -

import glob
import shutil

def using_shutil_text(outfilename):
    with open(outfilename, 'w') as outfile:
        for filename in glob.glob('*.txt'):
            if filename == outfilename:
                # don't want to copy the output into the output
                continue
            with open(filename, 'r') as readfile:
                shutil.copyfileobj(readfile, outfile)

def without_shutil_text(outfilename):
    with open(outfilename, 'w') as outfile:
        for filename in glob.glob('*.txt'):
            if filename == outfilename:
                # don't want to copy the output into the output
                continue
            with open(filename, 'r') as readfile:
                outfile.write(readfile.read())

Binary mode -

# binary-mode variants, named so they don't overwrite the text-mode functions
def using_shutil_binary(outfilename):
    with open(outfilename, 'wb') as outfile:
        for filename in glob.glob('*.txt'):
            if filename == outfilename:
                # don't want to copy the output into the output
                continue
            with open(filename, 'rb') as readfile:
                shutil.copyfileobj(readfile, outfile)

def without_shutil_binary(outfilename):
    with open(outfilename, 'wb') as outfile:
        for filename in glob.glob('*.txt'):
            if filename == outfilename:
                # don't want to copy the output into the output
                continue
            with open(filename, 'rb') as readfile:
                outfile.write(readfile.read())

Running times for binary mode -

Shutil - 20.161773920059204
Normal - 17.327500820159912

Running times for text mode -

Shutil - 20.47757601737976
Normal - 13.718038082122803

It looks like shutil performs about the same in both modes, while the plain read()/write() version is faster in text mode than in binary mode (and faster than shutil in both).
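
One knob worth trying before concluding that shutil is slower: shutil.copyfileobj accepts an optional third argument, length, the size of each chunk it reads. The sketch below exposes it; the helper name concat_with_buffer and the 1 MiB default are assumptions, not a recommendation, and whether a larger buffer actually helps on a given machine is something to measure.

```python
import glob
import shutil


def concat_with_buffer(pattern, outfilename, length=1024 * 1024):
    """Binary concatenation with an explicit copyfileobj chunk size (hypothetical helper)."""
    # Inputs are collected before the output file is created.
    filenames = sorted(f for f in glob.glob(pattern) if f != outfilename)
    with open(outfilename, 'wb') as outfile:
        for filename in filenames:
            with open(filename, 'rb') as readfile:
                # length = number of bytes read per chunk
                shutil.copyfileobj(readfile, outfile, length)
    return len(filenames)
```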

OS: macOS 10.14 Mojave, on a 2017 MacBook Air.
