Python script to concatenate all the files in the directory into one file
Original question: http://stackoverflow.com/questions/17749484/
Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute the content to the original authors on StackOverflow.
Asked by user1629366
I have written the following script to concatenate all the files in the directory into one single file.
Can this be optimized in terms of idiomatic Python and running time?
Here is the snippet:
import time, glob

outfilename = 'all_' + str((int(time.time()))) + ".txt"

filenames = glob.glob('*.txt')

with open(outfilename, 'wb') as outfile:
    for fname in filenames:
        with open(fname, 'r') as readfile:
            infile = readfile.read()
            for line in infile:
                outfile.write(line)
            outfile.write("\n\n")
Accepted answer by Martijn Pieters
Use shutil.copyfileobj to copy data:
import shutil

with open(outfilename, 'wb') as outfile:
    for filename in glob.glob('*.txt'):
        if filename == outfilename:
            # don't want to copy the output into the output
            continue
        with open(filename, 'rb') as readfile:
            shutil.copyfileobj(readfile, outfile)
shutil.copyfileobj reads from the readfile object in chunks, writing them to the outfile file object directly. Do not use readline() or an iteration buffer, since you do not need the overhead of finding line endings.
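To make the chunked copy concrete, here is a minimal sketch of a loop that behaves conceptually like shutil.copyfileobj. The helper name copy_in_chunks and the 64 KiB default chunk size are assumptions made for illustration, not the library's actual implementation, and in-memory streams stand in for real files.

```python
import io


def copy_in_chunks(src, dst, chunk_size=64 * 1024):
    """Copy from one file object to another in fixed-size chunks.

    Hypothetical helper for illustration; chunk_size is an assumed default.
    """
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read means end of file
            break
        dst.write(chunk)


# In-memory binary streams stand in for real files here.
source = io.BytesIO(b"line one\nline two\n")
target = io.BytesIO()
copy_in_chunks(source, target, chunk_size=4)
print(target.getvalue())
```

No step in the loop looks for line endings, which is the overhead the answer suggests avoiding.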
Use the same mode for both reading and writing; this is especially important when using Python 3; I've used binary mode for both here.
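The point about matching modes can be illustrated with a small sketch, assuming Python 3. Reading in text mode yields str, while a file opened for binary writing accepts only bytes-like data, so mixing the two fails. The file names and the temporary directory below are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    src_path = os.path.join(tmpdir, "in.txt")    # hypothetical input file
    dst_path = os.path.join(tmpdir, "out.txt")   # hypothetical output file
    with open(src_path, "wb") as f:
        f.write(b"hello\n")

    # Mismatched modes: the text-mode read returns str, the binary writer wants bytes.
    mismatch_error = None
    with open(src_path, "r") as readfile, open(dst_path, "wb") as outfile:
        try:
            outfile.write(readfile.read())
        except TypeError as exc:
            mismatch_error = exc

    # Matching modes: bytes in, bytes out.
    with open(src_path, "rb") as readfile, open(dst_path, "wb") as outfile:
        outfile.write(readfile.read())
    with open(dst_path, "rb") as f:
        copied = f.read()

print(type(mismatch_error).__name__)
print(copied)
```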
Answer by Brendan Long
You can iterate over the lines of a file object directly, without reading the whole thing into memory:
with open(fname, 'r') as readfile:
    for line in readfile:
        outfile.write(line)
Answer by MGP
No need to use that many variables.
with open(outfilename, 'w') as outfile:
    for fname in filenames:
        with open(fname, 'r') as readfile:
            outfile.write(readfile.read() + "\n\n")
Answer by iruvar
The fileinput module provides a natural way to iterate over multiple files:
for line in fileinput.input(glob.glob("*.txt")):
    outfile.write(line)
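A self-contained version of this idea might look like the sketch below, assuming Python 3. The function name concatenate_with_fileinput, the sorted file order, and the temporary files are all assumptions added for illustration. The early return matters because fileinput.input falls back to the command-line arguments or standard input when given an empty list.

```python
import fileinput
import glob
import os
import tempfile


def concatenate_with_fileinput(pattern, outfilename):
    """Write every line of the files matching pattern into outfilename."""
    out_abs = os.path.abspath(outfilename)
    # Skip the output file in case it matches the pattern itself.
    filenames = sorted(
        name for name in glob.glob(pattern) if os.path.abspath(name) != out_abs
    )
    with open(outfilename, "w") as outfile:
        if not filenames:
            return  # avoid fileinput's fallback to sys.argv / stdin
        with fileinput.input(files=filenames) as lines:
            for line in lines:
                outfile.write(line)


# Usage with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmpdir:
    for name, text in (("a.txt", "first\n"), ("b.txt", "second\n")):
        with open(os.path.join(tmpdir, name), "w") as f:
            f.write(text)
    out_path = os.path.join(tmpdir, "combined.txt")
    concatenate_with_fileinput(os.path.join(tmpdir, "*.txt"), out_path)
    with open(out_path) as f:
        result = f.read()

print(repr(result))
```

This approach iterates line by line, so it fits text files; it does not avoid the line-splitting overhead mentioned in the accepted answer.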
Answer by Stephen Miller
Using Python 2.7, I did some "benchmark" testing of
outfile.write(infile.read())
vs
shutil.copyfileobj(readfile, outfile)
I iterated over 20 .txt files ranging in size from 63 MB to 313 MB, with a combined size of about 2.6 GB. With both methods, normal (text) read mode performed better than binary read mode, and shutil.copyfileobj was generally faster than outfile.write.
When comparing the worst combination (outfile.write, binary mode) with the best combination (shutil.copyfileobj, normal read mode), the difference was quite significant:
outfile.write, binary mode: 43 seconds, on average.
shutil.copyfileobj, normal mode: 27 seconds, on average.
The outfile had a final size of 2620 MB in normal read mode vs 2578 MB in binary read mode.
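The answer does not show how the timings were taken. One way such a comparison could be set up is sketched below, assuming Python 3 rather than the Python 2.7 used above. The function names, the time_call helper, and the tiny generated input files are all hypothetical; a meaningful benchmark would need large files and repeated runs, and the results will vary by platform.

```python
import os
import shutil
import tempfile
import time


def concat_with_read(filenames, outfilename, binary=True):
    """Concatenate by reading each file fully, then writing it out."""
    read_mode, write_mode = ("rb", "wb") if binary else ("r", "w")
    with open(outfilename, write_mode) as outfile:
        for filename in filenames:
            with open(filename, read_mode) as readfile:
                outfile.write(readfile.read())


def concat_with_copyfileobj(filenames, outfilename, binary=True):
    """Concatenate by streaming each file with shutil.copyfileobj."""
    read_mode, write_mode = ("rb", "wb") if binary else ("r", "w")
    with open(outfilename, write_mode) as outfile:
        for filename in filenames:
            with open(filename, read_mode) as readfile:
                shutil.copyfileobj(readfile, outfile)


def time_call(func, *args, **kwargs):
    """Return the wall-clock seconds taken by one call to func."""
    start = time.perf_counter()
    func(*args, **kwargs)
    return time.perf_counter() - start


timings = {}
sizes = {}
with tempfile.TemporaryDirectory() as tmpdir:
    # Tiny stand-in inputs, far smaller than the files in the answer.
    inputs = []
    for i in range(3):
        path = os.path.join(tmpdir, "part%d.txt" % i)
        with open(path, "w") as f:
            f.write(("data %d\n" % i) * 1000)
        inputs.append(path)

    out_path = os.path.join(tmpdir, "out.dat")
    for func in (concat_with_read, concat_with_copyfileobj):
        for binary in (True, False):
            mode = "binary" if binary else "text"
            label = "%s, %s mode" % (func.__name__, mode)
            timings[label] = time_call(func, inputs, out_path, binary=binary)
            sizes[label] = os.path.getsize(out_path)  # sanity check on the output

for label, seconds in timings.items():
    print("%s: %.6f s (%d bytes)" % (label, seconds, sizes[label]))
```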
Answer by Ravi Kumar Gupta
I was curious about performance, so I tested the approaches from the answers by Martijn Pieters and Stephen Miller.
I tried binary and text modes, with shutil and without shutil, merging 270 files.
Text mode:
def using_shutil_text(outfilename):
    with open(outfilename, 'w') as outfile:
        for filename in glob.glob('*.txt'):
            if filename == outfilename:
                # don't want to copy the output into the output
                continue
            with open(filename, 'r') as readfile:
                shutil.copyfileobj(readfile, outfile)

def without_shutil_text(outfilename):
    with open(outfilename, 'w') as outfile:
        for filename in glob.glob('*.txt'):
            if filename == outfilename:
                # don't want to copy the output into the output
                continue
            with open(filename, 'r') as readfile:
                outfile.write(readfile.read())
Binary mode:
# Renamed from the *_text names so these do not shadow the text-mode functions.
def using_shutil_binary(outfilename):
    with open(outfilename, 'wb') as outfile:
        for filename in glob.glob('*.txt'):
            if filename == outfilename:
                # don't want to copy the output into the output
                continue
            with open(filename, 'rb') as readfile:
                shutil.copyfileobj(readfile, outfile)

def without_shutil_binary(outfilename):
    with open(outfilename, 'wb') as outfile:
        for filename in glob.glob('*.txt'):
            if filename == outfilename:
                # don't want to copy the output into the output
                continue
            with open(filename, 'rb') as readfile:
                outfile.write(readfile.read())
Running times for binary mode:
Shutil - 20.161773920059204
Normal - 17.327500820159912
Running times for text mode:
Shutil - 20.47757601737976
Normal - 13.718038082122803
It looks like shutil performs about the same in both modes, while the plain read()/write() approach is faster in text mode than in binary mode.
OS: macOS 10.14 Mojave, on a 2017 MacBook Air.