How to read a list of txt files in a folder in Python
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/35672809/
Asked by hhs
I am new to Python. I wrote an algorithm to read 10 txt files in a folder and then write the first line of each of them to a single txt outfile, but it doesn't work: after I run it, I get neither an error nor the outfile.
def MergePerFolder(path):
    path1=listdir_fullpath(path)
    for i in path1:
        infile=open(i)
        outfile=open('F:// merge1.txt', 'w')
        a=infile.readline().split('.')
        for k in range (len(a)):
            print(a[0], file=outfile, end='')
        infile.close()
        outfile.close
    print("done")
Answered by hb X
In this example, maybe you should close the outfile inside the loop, because the code tries to open it many times without closing the previous one.
Answered by Eddo Hintoso
NOTE: I do write the function(s) at the end of my answer, so feel free to jump to that, but I still wanted to run through the code part by part for the sake of better understanding.
Example scenario that will be used for explanation
Say you have 12 files in a folder called test, 10 of which are .txt files:
.../
    test/
        01.txt
        02.txt
        03.txt
        04.txt
        05.txt
        06.txt
        07.txt
        08.txt
        09.txt
        10.txt
        random_file.py
        this_shouldnt_be_here.sh
Each .txt file has its corresponding number as its first line, like:
- 01.txt contains the first line 01
- 02.txt contains the first line 02
- etc...
List all text files in the designated directory
You can do this in two ways:
Method 1: os module
You can import the os module and use its listdir function to list all the files in that directory. It is important to note that the names in the returned list are bare filenames relative to that directory, not full paths:
>>> import os
>>> all_files = os.listdir("test/") # imagine you're one directory above test dir
>>> print(all_files) # won't necessarily be sorted
['08.txt', '02.txt', '09.txt', '04.txt', '05.txt', '06.txt', '07.txt', '03.txt', '01.txt', 'this_shouldnt_be_here.sh', '10.txt', 'random_file.py']
Now, you only want the .txt files, so with a bit of functional programming using the filter function and an anonymous function, you can easily pick them out without writing a standard for loop:
>>> txt_files = list(filter(lambda x: x[-4:] == '.txt', all_files))  # list() is needed in Python 3, where filter is lazy
>>> print(txt_files) # only text files
['08.txt', '02.txt', '09.txt', '04.txt', '05.txt', '06.txt', '07.txt', '03.txt', '01.txt', '10.txt']
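A list comprehension with str.endswith is a common alternative to filter plus a lambda, and it produces a list directly (in Python 3, filter returns a lazy iterator, so printing it does not show the names). A minimal sketch, assuming a hard-coded all_files list that merely stands in for the os.listdir output:

```python
# Stand-in for os.listdir("test/"); these names are illustrative only.
all_files = ['08.txt', '02.txt', 'this_shouldnt_be_here.sh', '01.txt', 'random_file.py']

# Keep only names ending in ".txt"; the result is already a list.
txt_files = [name for name in all_files if name.endswith('.txt')]

# The filter-based version gives the same names once wrapped in list().
filtered = list(filter(lambda x: x[-4:] == '.txt', all_files))

print(txt_files)  # ['08.txt', '02.txt', '01.txt']
```

Both forms keep the original order of all_files; which one reads better is largely a matter of taste.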
Method 2: glob module
Similarly, you can use the glob.glob function from the glob module to list all text files in the directory without any of the functional programming above! The only difference is that glob returns each path with whatever prefix you passed in the pattern.
>>> import glob
>>> txt_files = glob.glob("test/*.txt")
>>> print(txt_files)
['test/08.txt', 'test/02.txt', 'test/09.txt', 'test/04.txt', 'test/05.txt', 'test/06.txt', 'test/07.txt', 'test/03.txt', 'test/01.txt', 'test/10.txt']
In other words, glob echoes back whatever relative or full path you put in the pattern. For example, if you were in the test directory and called glob.glob('./*.txt'), you would get a list like:
>>> glob.glob('./*.txt')
['./08.txt', './02.txt', './09.txt', ... ]
By the way, ./ means the current directory. Alternatively, you can leave off the ./ prefix, and the returned strings change accordingly:
>>> glob.glob("*.txt") # already in directory containing the text files
['08.txt', '02.txt', '09.txt', ... ]
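To make the difference between the two methods concrete, here is a small sketch that builds a throwaway folder and lists it both ways. The folder comes from tempfile and the file names are made up for the demonstration; neither is part of the original scenario.

```python
import glob
import os
import tempfile

# Build a scratch folder with two .txt files and one other file (hypothetical names).
with tempfile.TemporaryDirectory() as folder:
    for name in ("01.txt", "02.txt", "random_file.py"):
        with open(os.path.join(folder, name), "w") as fd:
            fd.write("placeholder\n")

    # os.listdir returns bare names, so the filtering is done by hand.
    bare_names = sorted(n for n in os.listdir(folder) if n.endswith(".txt"))

    # glob.glob returns the pattern's folder prefix plus each matching name.
    full_paths = sorted(glob.glob(os.path.join(folder, "*.txt")))

print(bare_names)
print(full_paths)
```

The glob results can be passed straight to open() from any working directory, while the os.listdir names first need to be joined with the folder path.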
Doing something with a file using file context managers
Alright, now the problem with your code is that you open all these files without reliably closing them: outfile is reopened on every pass through the loop, and outfile.close without parentheses never actually calls the method. Generally, the procedure for working with a file in Python is this:
fd = open(filename, mode)
fd.method # could be write(), read(), readline(), etc...
fd.close()
Now, the problem with this is that if something goes wrong on the second line, where you call a method on the file, the close() call is skipped, so the file may stay open and buffered writes may never reach the disk.
To prevent this, we use what is called a file context manager in Python, via the with keyword. This ensures the file is closed whether or not a failure occurs.
with open(filename, mode) as fd:
    fd.method
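As a quick check of that guarantee, the sketch below raises an error on purpose inside a with block and then inspects the file object. The scratch path and the simulated RuntimeError are assumptions made purely for the demonstration.

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary folder.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

try:
    with open(path, "w") as fd:
        fd.write("first line\n")
        raise RuntimeError("simulated failure inside the with block")
except RuntimeError:
    pass

# The context manager closed (and flushed) the file even though the block raised.
print(fd.closed)  # True
```

Because the file was closed on the way out, the text written before the failure is already on disk and can be read back.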
Reading the first line of a file with readline()
As you probably know already, to extract the first line of a file, you simply open it and call the readline() method. We want to do this for every text file listed in txt_files, and once again the functional-programming map function can do it, except this time we won't write an anonymous function (for readability):
>>> def read_first_line(file):
...     with open(file, 'rt') as fd:
...         first_line = fd.readline()
...     return first_line
...
>>> output_strings = list(map(read_first_line, txt_files)) # apply read_first_line to every text file
>>> print(output_strings)
['08\n', '02\n', '09\n', '04\n', '05\n', '06\n', '07\n', '03\n', '01\n', '10\n']
If you want output_strings to be sorted, either sort txt_files beforehand or sort output_strings itself. Both work:
output_strings = list(map(read_first_line, sorted(txt_files)))
output_strings = sorted(map(read_first_line, txt_files))
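One detail worth knowing here: in Python 3, map returns a lazy iterator rather than a list, so the files are only read once something consumes it (list, sorted, join, a for loop), and it can only be consumed once. A small sketch, assuming two made-up files in a temporary folder:

```python
import os
import tempfile

def read_first_line(path):
    """Return the first line of the file at `path`, newline included."""
    with open(path, "rt") as fd:
        return fd.readline()

# Hypothetical scratch folder with two files, deliberately listed out of order.
folder = tempfile.mkdtemp()
names = ("02.txt", "01.txt")
for name in names:
    with open(os.path.join(folder, name), "w") as fd:
        fd.write(name[:2] + "\nsecond line\n")
paths = [os.path.join(folder, name) for name in names]

lazy = map(read_first_line, paths)   # Python 3: nothing has been read yet
first_lines = list(lazy)             # the files are read here, in input order
sorted_lines = sorted(first_lines)   # sort the results afterwards
leftover = list(lazy)                # the iterator is now exhausted

print(first_lines, sorted_lines, leftover)
```

If the result is needed more than once, keeping the list (not the map object) avoids surprises.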
Concatenate the output strings and write them to an output file
So now you have a list of output strings, and the last thing you want to do is combine them:
>>> output_content = "".join(sorted(output_strings)) # sort, then join the output strings without separators
>>> output_content # as a string
'01\n02\n03\n04\n05\n06\n07\n08\n09\n10\n'
>>> print(output_content) # print as formatted
01
02
03
04
05
06
07
08
09
10
Now it's just a matter of writing this giant string to an output file! Let's call it outfile.txt:
>>> with open('outfile.txt', 'wt') as fd:
...     fd.write(output_content)
...
Then you're done! Let's confirm it:
>>> with open('outfile.txt', 'rt') as fd:
...     print(fd.readlines())
...
['01\n', '02\n', '03\n', '04\n', '05\n', '06\n', '07\n', '08\n', '09\n', '10\n']
All of the above in a function
I'll use the glob module so that every returned path already carries the folder prefix, which avoids the hassle of building full paths by hand with the os module.
import glob

def read_first_line(file):
    """Gets the first line from a file.

    Returns
    -------
    str
        the first line text of the input file
    """
    with open(file, 'rt') as fd:
        first_line = fd.readline()
    return first_line

def merge_per_folder(folder_path, output_filename):
    """Merges first lines of text files in one folder, and
    writes combined lines into new output file

    Parameters
    ----------
    folder_path : str
        String representation of the folder path containing the text files.
    output_filename : str
        Name of the output file the merged lines will be written to.
    """
    # make sure there's a slash to the folder path
    folder_path += "" if folder_path[-1] == "/" else "/"
    # get all text files
    txt_files = glob.glob(folder_path + "*.txt")
    # get first lines; map to each text file (sorted)
    output_strings = map(read_first_line, sorted(txt_files))
    output_content = "".join(output_strings)
    # write to file
    with open(folder_path + output_filename, 'wt') as outfile:
        outfile.write(output_content)
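One thing to watch with a function like this: if the output file has a .txt name and lives in the same folder as the inputs, a second run will pick it up as an input too. The sketch below is a variant, not the function above: it takes a full output path (so the result can live elsewhere) and joins paths with os.path.join. The name merge_first_lines, the temporary folders and the sample files are all assumptions for illustration.

```python
import glob
import os
import tempfile

def merge_first_lines(folder_path, output_path):
    """Write the first line of every .txt file in folder_path to output_path."""
    txt_files = sorted(glob.glob(os.path.join(folder_path, "*.txt")))
    with open(output_path, "wt") as outfile:
        for txt_file in txt_files:
            with open(txt_file, "rt") as infile:
                outfile.write(infile.readline())
    return txt_files

# Hypothetical scratch folders standing in for test/ and an output location.
source_dir = tempfile.mkdtemp()
output_dir = tempfile.mkdtemp()
for number in ("02", "01", "03"):
    with open(os.path.join(source_dir, number + ".txt"), "wt") as fd:
        fd.write(number + "\nrest of file\n")

output_path = os.path.join(output_dir, "outfile.txt")
merge_first_lines(source_dir, output_path)
merge_first_lines(source_dir, output_path)  # re-running gives the same result

with open(output_path, "rt") as fd:
    merged = fd.read()
print(repr(merged))  # '01\n02\n03\n'
```

Keeping the output outside the input folder (or giving it a different extension) is what makes the second call harmless here.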
Answered by Jaiprasad
Let's assume you have files in the folder path
path = /home/username/foldername/
So all the files are in the path folder; to read all the files in that folder, you should use os or glob.
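import os
path = "/home/username/foldername/"
# "merged.txt" is a hypothetical file name; open() needs a file path, not a folder
savepath = "/home/username/newfolder/merged.txt"
with open(savepath, 'w') as outfile:
    # os.walk yields (folder, subfolders, filenames) for path and everything below it
    for dirpath, subdirs, files in os.walk(path):
        for name in files:
            if name.endswith('.txt'):
                with open(os.path.join(dirpath, name)) as infile:
                    outfile.write(infile.readline())  # first line only
print("done")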
Or, using glob, you can do it in fewer lines of code.
import glob
path = "/home/username/foldername/"
# "merged.txt" is a hypothetical file name; open() needs a file path, not a folder
savepath = "/home/username/newfolder/merged.txt"
with open(savepath, 'w') as outfile:
    for filename in sorted(glob.glob(path + "*.txt")):
        with open(filename) as infile:
            outfile.write(infile.readline())  # first line only
print("done")
Hope it works for you.
Answered by Sarcoma
Thanks to Eddo Hintoso for his detailed answer. I've slightly tweaked it to use yield rather than return, so it doesn't need to be mapped. I'm posting it here in case it is useful to anyone else who finds this post.
import glob

files = glob.glob("data/*.txt")

def map_first_lines(file_list):
    for file in file_list:
        with open(file, 'r') as fd:
            yield fd.readline()

for line in map_first_lines(files):
    print(line)
So here is another way to solve this particular problem:
import glob

def map_first_lines(file_list):
    for file in file_list:
        with open(file, 'rt') as fd:
            yield fd.readline()

def merge_first_lines(file_list, filename='first_lines.txt'):
    with open(filename, 'w') as f:
        for line in map_first_lines(file_list):
            # readline() keeps the trailing newline, so strip it before adding one
            f.write("%s\n" % line.rstrip("\n"))

files = glob.glob("data/*.txt")
merge_first_lines(files)
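A note on newlines when merging: readline() returns the line with its trailing "\n" if the file has one, and without it if the first line is also the last and the file does not end in a newline. A small sketch of normalizing both cases, using io.StringIO objects as stand-ins for real files (the helper name first_line_normalized is made up for this example):

```python
import io

def first_line_normalized(fd):
    """Return fd's first line with exactly one trailing newline."""
    return fd.readline().rstrip("\n") + "\n"

with_newline = io.StringIO("01\nmore text\n")  # first line ends with "\n"
without_newline = io.StringIO("02")            # single line, no trailing "\n"

merged = first_line_normalized(with_newline) + first_line_normalized(without_newline)
print(repr(merged))  # '01\n02\n'
```

With this approach an empty file contributes a blank line; whether that is wanted depends on the use case.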