
Warning: this page is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/35672809/

Time: 2020-08-19 16:47:27  Source: igfitidea

how to read a list of txt files in a folder in python

python

Question by hhs

I am new to Python. I wrote an algorithm to read 10 txt files in a folder and then write the first line of each of them into one txt output file, but it doesn't work: when I run it, I get no error, but no output file either.


def MergePerFolder(path):
    path1=listdir_fullpath(path)
    for i in path1:
        infile=open(i)
        outfile=open('F:// merge1.txt', 'w')
        a=infile.readline().split('.')
        for k in range (len(a)):
            print(a[0], file=outfile, end='')

    infile.close()
    outfile.close
    print("done")

Answer by hb X

In this example, maybe you should close the outfile inside the loop, because it is opened many times without the previous one being closed.
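One way to apply that idea is to open the output file once, before the loop, and let with close every file. This is a minimal sketch that assumes the goal is to write the first line of each input file into a single output file. The function name merge_first_lines_into and the sample files are made up for illustration and are not from the question.

```python
import os
import tempfile


def merge_first_lines_into(paths, out_path):
    """Hypothetical helper: write the first line of each file in paths to out_path."""
    # Open the output file ONCE; reopening it with 'w' inside the loop
    # would truncate it on every iteration.
    with open(out_path, "w") as outfile:
        for p in paths:
            with open(p) as infile:  # closed automatically after each file
                outfile.write(infile.readline())


# Demo on a throwaway folder with three small files.
demo_dir = tempfile.mkdtemp()
paths = []
for n in ("01", "02", "03"):
    p = os.path.join(demo_dir, n + ".txt")
    with open(p, "w") as f:
        f.write(n + "\nsecond line\n")
    paths.append(p)

out_path = os.path.join(demo_dir, "merged.out")
merge_first_lines_into(paths, out_path)
with open(out_path) as f:
    merged = f.read()
print(repr(merged))  # '01\n02\n03\n'
```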


Answer by Eddo Hintoso

NOTE: I do write the function(s) at the end of my answer, so feel free to jump to that - but I still wanted to run through the code part by part for the sake of better understanding.




Example scenario that will be used for explanation


Say you have 12 files in this folder called test, 10 of which are .txt files:


.../
    test/
        01.txt
        02.txt
        03.txt
        04.txt
        05.txt
        06.txt
        07.txt
        08.txt
        09.txt
        10.txt
        random_file.py
        this_shouldnt_be_here.sh

Each .txt file has its corresponding number as its first line, like this:


  • 01.txt contains the first line 01,
  • 02.txt contains the first line 02,
  • etc...


List all text files in the designated directory


You can do this in two ways:


Method 1: os module


You can import the os module and use its listdir function to list all the files in that directory. Note that the entries in the list are bare file names, relative to that directory, not full paths:


>>> import os
>>> all_files = os.listdir("test/")   # imagine you're one directory above test dir
>>> print(all_files)  # order is arbitrary, not necessarily sorted
['08.txt', '02.txt', '09.txt', '04.txt', '05.txt', '06.txt', '07.txt', '03.txt', '01.txt', 'this_shouldnt_be_here.sh', '10.txt', 'random_file.py']
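The question's code calls a helper named listdir_fullpath, but its definition isn't shown. The sketch below is a hypothetical definition that assumes the helper is meant to return the full path of every entry in a folder. It joins each bare name from os.listdir back onto the folder with os.path.join. The sample file names are made up.

```python
import os
import tempfile


def listdir_fullpath(folder):
    """Hypothetical helper: full paths of all entries in folder."""
    # os.listdir returns bare names, so prefix each one with the folder path.
    return [os.path.join(folder, name) for name in os.listdir(folder)]


demo_dir = tempfile.mkdtemp()
for name in ("01.txt", "02.txt", "notes.sh"):
    open(os.path.join(demo_dir, name), "w").close()  # create empty sample files

full_paths = sorted(listdir_fullpath(demo_dir))
print(full_paths)
```

The resulting list still contains non-text entries, so it needs the same .txt filtering as the bare names.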

Now, you only want the .txt files. With a bit of functional programming, using the filter function and an anonymous (lambda) function, you can filter them out without writing a standard for loop:


>>> txt_files = list(filter(lambda x: x[-4:] == '.txt', all_files))  # filter() is lazy in Python 3
>>> print(txt_files)  # only text files
['08.txt', '02.txt', '09.txt', '04.txt', '05.txt', '06.txt', '07.txt', '03.txt', '01.txt', '10.txt']
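In Python 3, filter returns a lazy iterator, so you need to wrap it in list() to get an actual list. Many people find a list comprehension with str.endswith easier to read, and it does the same job. The sketch below compares the two on a short hard-coded list of sample names.

```python
all_files = ["08.txt", "02.txt", "this_shouldnt_be_here.sh", "10.txt", "random_file.py"]

# Keep only names ending in ".txt"; endswith avoids slicing the last 4 characters by hand.
txt_files = [name for name in all_files if name.endswith(".txt")]
print(txt_files)  # ['08.txt', '02.txt', '10.txt']

# The filter/lambda version gives the same names once materialized with list().
assert list(filter(lambda x: x[-4:] == ".txt", all_files)) == txt_files
```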

Method 2: glob module


Similarly, you can use the glob module's glob.glob function to list all text files in the directory, without any of the functional programming above! The only difference is that glob returns paths prefixed the same way as the pattern you passed in, whether relative or absolute.


>>> import glob
>>> txt_files = glob.glob("test/*.txt")
>>> txt_files
['test/08.txt', 'test/02.txt', 'test/09.txt', 'test/04.txt', 'test/05.txt', 'test/06.txt', 'test/07.txt', 'test/03.txt', 'test/01.txt', 'test/10.txt']

Here is what I mean by glob returning paths in whatever form you wrote the pattern, relative or full. For example, if you were in the test directory and called glob.glob('./*.txt'), you would get a list like:


>>> glob.glob('./*.txt')
['./08.txt', './02.txt', './09.txt', ... ]

By the way, ./ means the current directory. Alternatively, you can leave out the ./, but the returned strings change accordingly:


>>> glob.glob("*.txt")  # already in directory containing the text files
['08.txt', '02.txt', '09.txt', ... ]
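To see the prefix behavior side by side, the sketch below builds a throwaway folder named test with made-up contents and moves into its parent. It then compares os.listdir with several glob.glob patterns. The commented results assume POSIX path separators; on Windows, glob joins with backslashes.

```python
import glob
import os
import tempfile

original_cwd = os.getcwd()
parent = tempfile.mkdtemp()                  # throwaway parent folder for the demo
os.makedirs(os.path.join(parent, "test"))
for name in ("01.txt", "02.txt", "random_file.py"):
    open(os.path.join(parent, "test", name), "w").close()

os.chdir(parent)                             # "test" is now a relative path
listed = sorted(os.listdir("test"))          # bare names, no folder prefix
globbed = sorted(glob.glob("test/*.txt"))    # results carry the "test/" prefix

os.chdir("test")                             # now inside the folder itself
globbed_dot = sorted(glob.glob("./*.txt"))   # pattern starts with ./, so results do too
globbed_bare = sorted(glob.glob("*.txt"))    # no prefix in the pattern, none in the results

os.chdir(original_cwd)                       # restore the working directory
for result in (listed, globbed, globbed_dot, globbed_bare):
    print(result)
```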


Doing something with a file using file context managers


Alright, now the problem with your code is that you are opening connections to all these files without closing them. Generally, the procedure for doing something with a file in Python is this:


fd = open(filename, mode)
data = fd.read()  # or fd.write(...), fd.readline(), etc.
fd.close()

Now, the problem with this is that if something goes wrong in the second line where you call a method on the file, the file will never close and you're in big trouble.


To prevent this, we use what Python calls a context manager, via the with keyword. This ensures the file is closed whether or not an error occurs.


with open(filename, mode) as fd:
    data = fd.read()  # or fd.write(...), fd.readline(), etc.
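Roughly speaking, a with block behaves like a try/finally that always calls close(). The sketch below raises an error inside each form and checks that the file still ends up closed. The file name example.txt in a temporary folder is made up for the demo.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as fd:
    fd.write("01\n")

# Manual form: finally guarantees close() runs even if processing fails.
fd1 = open(path, "rt")
try:
    fd1.readline()
    raise ValueError("simulated failure while processing")
except ValueError:
    pass
finally:
    fd1.close()

# Context-manager form: the same guarantee with less code.
try:
    with open(path, "rt") as fd2:
        fd2.readline()
        raise ValueError("simulated failure while processing")
except ValueError:
    pass

print(fd1.closed, fd2.closed)  # True True
```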


Reading the first line of a file with readline()


As you probably know already, to extract the first line of a file you simply open it and call the readline() method. We want to do this for every text file listed in txt_files, and again functional programming helps, this time with the map function. This time, though, we'll define a named function instead of an anonymous one (for readability):


>>> def read_first_line(file):
...     with open(file, 'rt') as fd:
...         first_line = fd.readline()
...     return first_line
...
>>> output_strings = list(map(read_first_line, txt_files))  # apply read_first_line to every text file
>>> print(output_strings)
['08\n', '02\n', '09\n', '04\n', '05\n', '06\n', '07\n', '03\n', '01\n', '10\n']
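Two details of readline() matter for the merging step. It keeps the trailing newline, and a file whose only line lacks a newline gives back just that text. An empty file gives back an empty string. The quick check below uses a temporary folder with made-up file names and redefines read_first_line the same way as in the answer.

```python
import os
import tempfile


def read_first_line(file):
    with open(file, "rt") as fd:
        return fd.readline()


folder = tempfile.mkdtemp()
samples = {"normal.txt": "01\nmore\n", "one_line.txt": "02", "empty.txt": ""}
for name, text in samples.items():
    with open(os.path.join(folder, name), "w") as f:
        f.write(text)

results = {name: read_first_line(os.path.join(folder, name)) for name in samples}
print(results)  # {'normal.txt': '01\n', 'one_line.txt': '02', 'empty.txt': ''}
```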

If you want output_strings to be sorted, either sort txt_files beforehand or sort output_strings itself. The first sorts by file name and the second by line content; in this example both give the same result:

  • output_strings = list(map(read_first_line, sorted(txt_files)))
  • output_strings = sorted(map(read_first_line, txt_files))
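Sorting file names is plain string comparison. It works here only because the names are zero-padded, 01.txt through 10.txt. With unpadded names, '10.txt' sorts before '2.txt'. The sketch below shows a "natural" sort key. The helper name natural_key is hypothetical, and it assumes you want runs of digits compared as numbers.

```python
import re


def natural_key(name):
    # Hypothetical helper: split "file10.txt" into ['file', 10, '.txt'] so digit runs compare as numbers.
    # re.split with a capturing group alternates text and digits, so positions always line up by type.
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]


names = ["10.txt", "2.txt", "1.txt"]
print(sorted(names))                   # ['1.txt', '10.txt', '2.txt']
print(sorted(names, key=natural_key))  # ['1.txt', '2.txt', '10.txt']
```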


Concatenate the output strings and write them to an output file


So now you have a list of output strings, and the last thing to do is combine them:


>>> output_content = "".join(sorted(output_strings))  # sort, then join the output strings without separators
>>> output_content  # as a string
'01\n02\n03\n04\n05\n06\n07\n08\n09\n10\n'
>>> print(output_content)  # print as formatted
01
02
03
04
05
06
07
08
09
10
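"".join adds nothing between the pieces, so it relies on every first line already ending in "\n". If a file's only line has no trailing newline, two entries run together. The defensive sketch below strips any existing newline and adds exactly one. The helper name join_lines is made up. An empty file would still contribute a blank line.

```python
def join_lines(lines):
    # Hypothetical helper: rstrip("\n") drops a trailing newline if present,
    # then each line gets exactly one.
    return "".join(line.rstrip("\n") + "\n" for line in lines)


raw = ["01\n", "02", "03\n"]     # "02" came from a file without a trailing newline
print(repr("".join(raw)))        # '01\n0203\n'
print(repr(join_lines(raw)))     # '01\n02\n03\n'
```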

Now it's just a matter of writing this giant string to an output file! Let's call it outfile.txt:


>>> with open('outfile.txt', 'wt') as fd:
...    fd.write(output_content)
...

Then you're done! You're all set! Let's confirm it:


>>> with open('outfile.txt', 'rt') as fd:
...    print(fd.readlines())
...
['01\n', '02\n', '03\n', '04\n', '05\n', '06\n', '07\n', '08\n', '09\n', '10\n']


All of the above in a function


I'll use the glob module here, since the paths it returns already include the folder prefix. That saves the hassle of joining folder and file names with the os module.


import glob

def read_first_line(file):
    """Gets the first line from a file.

    Returns
    -------
    str
        the first line text of the input file
    """
    with open(file, 'rt') as fd:
        first_line = fd.readline()
    return first_line

def merge_per_folder(folder_path, output_filename):
    """Merges first lines of text files in one folder, and
    writes combined lines into new output file

    Parameters
    ----------
    folder_path : str
        String representation of the folder path containing the text files.
    output_filename : str
        Name of the output file the merged lines will be written to.
    """
    # make sure the folder path ends with a slash (an empty path means the current directory)
    if folder_path and not folder_path.endswith("/"):
        folder_path += "/"
    # get all text files
    txt_files = glob.glob(folder_path + "*.txt")
    # get first lines; map to each text file (sorted)
    output_strings = map(read_first_line, sorted(txt_files))
    output_content = "".join(output_strings)
    # write to file
    with open(folder_path + output_filename, 'wt') as outfile:
        outfile.write(output_content)
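Below is a usage sketch on a temporary folder with made-up contents. It re-declares the two functions above so the block is self-contained. It also shows one thing worth knowing: the output file is written into the same folder and matches *.txt, so a second run picks it up as an input. Writing the output elsewhere, or giving it a different extension, avoids that.

```python
import glob
import os
import tempfile


def read_first_line(file):
    with open(file, "rt") as fd:
        return fd.readline()


def merge_per_folder(folder_path, output_filename):
    # Same logic as the answer's function.
    if folder_path and not folder_path.endswith("/"):
        folder_path += "/"
    txt_files = glob.glob(folder_path + "*.txt")
    output_content = "".join(map(read_first_line, sorted(txt_files)))
    with open(folder_path + output_filename, "wt") as outfile:
        outfile.write(output_content)


folder = tempfile.mkdtemp()
for n in ("01", "02", "03"):
    with open(os.path.join(folder, n + ".txt"), "w") as f:
        f.write(n + "\nignored\n")

merge_per_folder(folder, "outfile.txt")
with open(os.path.join(folder, "outfile.txt")) as f:
    first_run = f.read()

# Running again also reads outfile.txt itself, because it matches *.txt too.
merge_per_folder(folder, "outfile.txt")
with open(os.path.join(folder, "outfile.txt")) as f:
    second_run = f.read()

print(repr(first_run))   # '01\n02\n03\n'
print(repr(second_run))  # '01\n02\n03\n01\n'
```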

Answer by Jaiprasad

Let's assume you have files in the folder path path = /home/username/foldername/


So you have all the files in the path folder; to read all of them, you should use os or glob.


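import os
path = "/home/username/foldername/"
savepath = "/home/username/newfolder/merged.txt"  # an output *file* path; the file name is illustrative
with open(savepath, 'w') as outfile:
    # os.walk yields (dirpath, subdirs, files) for path and every subfolder
    for dirpath, subdirs, files in os.walk(path):
        for name in files:
            if not name.endswith('.txt'):
                continue
            with open(os.path.join(dirpath, name)) as infile:
                # keep the text before the first '.' of the first line
                a = infile.readline().rstrip('\n').split('.')
                print(a[0], file=outfile)
print("done")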
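One behavioral difference between the two approaches is worth keeping in mind. os.walk descends into every subfolder, whereas glob.glob(path + "*.txt") only matches files directly inside path. glob can also recurse when given a "**" pattern with recursive=True. The small comparison below runs on a temporary tree with made-up folder and file names.

```python
import glob
import os
import tempfile

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
for rel in ("a.txt", os.path.join("sub", "b.txt")):
    with open(os.path.join(root, rel), "w") as f:
        f.write("x\n")

# os.walk yields (dirpath, dirnames, filenames) for root and every subfolder.
walked = sorted(
    os.path.relpath(os.path.join(dirpath, name), root)
    for dirpath, _dirnames, filenames in os.walk(root)
    for name in filenames
    if name.endswith(".txt")
)

# A plain glob pattern matches only the top level.
globbed = sorted(os.path.relpath(p, root) for p in glob.glob(os.path.join(root, "*.txt")))

# With recursive=True, "**" matches zero or more directories.
recursive = sorted(
    os.path.relpath(p, root)
    for p in glob.glob(os.path.join(root, "**", "*.txt"), recursive=True)
)

print(walked, globbed, recursive)
```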

Or, using glob, you can do it in fewer lines of code:


import glob
path = "/home/username/foldername/"
savepath = "/home/username/newfolder/merged.txt"  # an output *file* path; the file name is illustrative
with open(savepath, 'w') as outfile:
    for filename in sorted(glob.glob(path + "*.txt")):
        with open(filename) as infile:
            # keep the text before the first '.' of the first line
            a = infile.readline().rstrip('\n').split('.')
            print(a[0], file=outfile)
print("done")

Hope it works for you.


Answer by Sarcoma

Thanks to Eddo Hintoso for his detailed answer. I've slightly tweaked it to use yield rather than return, so it doesn't need to be mapped. I'm posting it here in case it is useful to anyone else who finds this post.


import glob

files = glob.glob("data/*.txt")


def map_first_lines(file_list):
    for file in file_list:
        with open(file, 'r') as fd:
            yield fd.readline()


for f in map_first_lines(files):
    print(f, end='')  # each line already ends with '\n'
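Because map_first_lines is a generator, it does no work until it is iterated. Each file is opened, read, and closed only when the next line is requested. The sketch below demonstrates this with one real file and one nonexistent path, both made up for the demo. Creating the generator succeeds, and the error appears only when iteration reaches the missing file.

```python
import os
import tempfile


def map_first_lines(file_list):
    for file in file_list:
        with open(file, "r") as fd:
            yield fd.readline()


folder = tempfile.mkdtemp()
good = os.path.join(folder, "01.txt")
with open(good, "w") as f:
    f.write("01\n")
missing = os.path.join(folder, "does_not_exist.txt")

lines = map_first_lines([good, missing])  # nothing is opened yet
first = next(lines)                       # opens, reads and closes 01.txt only
print(first, end="")                      # 01

hit_missing = False
try:
    next(lines)                           # only now is the missing file opened
except FileNotFoundError:
    hit_missing = True
```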

So another way to solve this particular problem:


import glob


def map_first_lines(file_list):
    for file in file_list:
        with open(file, 'rt') as fd:
            yield fd.readline()


def merge_first_lines(file_list, filename='first_lines.txt'):
    with open(filename, 'w') as f:
        for line in map_first_lines(file_list):
            f.write(line.rstrip("\n") + "\n")  # readline() keeps '\n'; write exactly one per entry


files = glob.glob("data/*.txt")

merge_first_lines(files)
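Below is a quick end-to-end check in a temporary folder, with made-up file names and contents. readline() already keeps each line's "\n", so this sketch strips it before adding exactly one newline per entry. Appending "\n" to the raw line would leave a blank line between entries. The output goes to a separate folder so it is not matched by the *.txt pattern on a later run.

```python
import glob
import os
import tempfile


def map_first_lines(file_list):
    for file in file_list:
        with open(file, "rt") as fd:
            yield fd.readline()


def merge_first_lines(file_list, filename="first_lines.txt"):
    with open(filename, "w") as f:
        for line in map_first_lines(file_list):
            f.write(line.rstrip("\n") + "\n")  # exactly one newline per entry


data_dir = tempfile.mkdtemp()
for n in ("01", "02", "03"):
    with open(os.path.join(data_dir, n + ".txt"), "w") as f:
        f.write(n + "\nsecond line\n")

files = sorted(glob.glob(os.path.join(data_dir, "*.txt")))
out_file = os.path.join(tempfile.mkdtemp(), "first_lines.txt")  # kept outside data_dir
merge_first_lines(files, out_file)

with open(out_file) as f:
    merged = f.read()
print(repr(merged))  # '01\n02\n03\n'
```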