Python: How do I traverse the files in a directory?

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverFlow original: http://stackoverflow.com/questions/4918458/

Time: 2020-08-18 18:09:29  Source: igfitidea

How to traverse through the files in a directory?

python

Asked by Bruce

I have a directory logfiles. I want to process each file inside this directory using a Python script.

for file in directory:
      # do something

How do I do this?

Accepted answer by Ignacio Vazquez-Abrams

Use os.listdir() or os.walk(), depending on whether you want to do it recursively.
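
As a minimal sketch of the non-recursive case: os.listdir() returns bare entry names (files and subdirectories alike), so each name has to be joined with the directory and checked before it is processed. The helper name list_files is hypothetical and not part of the original answer.

```python
import os


def list_files(directory):
    """Return sorted full paths of the regular files directly inside directory."""
    paths = []
    for name in os.listdir(directory):
        # listdir gives bare names, so build the full path before testing it
        path = os.path.join(directory, name)
        if os.path.isfile(path):  # skip subdirectories
            paths.append(path)
    return sorted(paths)
```

Calling list_files('logfiles') would then give the paths to loop over; files in subdirectories are deliberately ignored here.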

Answer by luc

In Python 2, you can try something like:

import os.path

def print_it(x, dir_name, files):
    print dir_name
    print files

os.path.walk(your_dir, print_it, 0)

Note: the 3rd argument of os.path.walk is whatever you want. You'll get it as the 1st arg of the callback.

In Python 3, os.path.walk has been removed; use os.walk instead. Instead of taking a callback, you just pass it a directory and it yields (dirpath, dirnames, filenames) triples. So a rough equivalent of the above becomes:

import os

for dirpath, dirnames, filenames in os.walk(your_dir):
    # print is a function in Python 3
    print(dirpath)
    print(dirnames)
    print(filenames)
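
The filenames that os.walk yields are bare names, so to open or process a file it usually has to be joined with dirpath first. A small sketch of that step, assuming a hypothetical generator name walk_files:

```python
import os


def walk_files(top):
    """Yield the full path of every file below top, including subdirectories."""
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            # join the bare file name with the directory it was found in
            yield os.path.join(dirpath, name)
```

A loop such as `for path in walk_files(your_dir): ...` then visits every file exactly once.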

Answer by Blender

You could try glob:

import glob

for file in glob.glob('log-*-*.txt'):
    # Etc.
    pass

But glob doesn't work recursively by default (as far as I know), so if your logs are in folders inside of that directory, you'd be better off looking at what Ignacio Vazquez-Abrams posted.
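
For newer interpreters there is an opt-in alternative: since Python 3.5, glob.glob accepts recursive=True, and the "**" pattern then matches zero or more directory levels. A rough sketch, where the function name find_logs and the default pattern are illustrative assumptions:

```python
import glob
import os


def find_logs(directory, pattern="*.txt"):
    """Return sorted paths matching pattern in directory and all its subdirectories."""
    # "**" only spans directories when recursive=True is passed (Python 3.5+)
    return sorted(glob.glob(os.path.join(directory, "**", pattern), recursive=True))
```

Without recursive=True, "**" behaves like a plain "*", so the flag is required for this to descend into subfolders.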

Answer by pyth

import os

# location of the directory you want to scan
loc = '/home/sahil/Documents'
# module-level dictionary used to store all results
k1 = {}


# scan() recursively scans all the directories in loc and returns a dictionary
def scan(element, loc):
    for name in element:
        temp = os.path.join(loc, name)
        try:
            second_list = os.listdir(temp)
        except OSError:
            # not a directory (or not readable): skip it
            continue
        print(".....")
        print("Directory %s " % temp)
        print(" ")
        print(second_list)
        k1[temp] = second_list
        scan(second_list, temp)

    return k1  # return the dictionary


# initial steps
try:
    initial_list = os.listdir(loc)
    print(initial_list)
except OSError:
    print("error")
    initial_list = []  # nothing to scan if loc cannot be listed

k = scan(initial_list, loc)
print(" ...................................................................................")
print(k)

I wrote this code as a directory scanner for a playlist feature in my audio player; it recursively scans all the subdirectories of the given directory.

Answer by ATOzTOA

If you need to check for multiple file types, use

glob.glob("*.jpg") + glob.glob("*.png")

Glob doesn't care about the ordering of the files in the list. If you need files sorted by filename, use

sorted(glob.glob("*.jpg"))
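
The two snippets above can be combined into one helper. This is only a sketch; the name glob_many and its parameters are assumptions made for illustration:

```python
import glob
import os


def glob_many(directory, patterns):
    """Return a sorted, de-duplicated list of paths in directory matching any pattern."""
    matches = set()
    for pattern in patterns:
        # each pattern is matched against entries directly inside directory
        matches.update(glob.glob(os.path.join(directory, pattern)))
    return sorted(matches)
```

Using a set avoids listing a file twice when it happens to match more than one pattern.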

Answer by Matheus Araujo

You can list every file from a directory recursively like this.

from os import listdir
from os.path import isfile, join, isdir

def getAllFilesRecursive(root):
    files = [ join(root,f) for f in listdir(root) if isfile(join(root,f))]
    dirs = [ d for d in listdir(root) if isdir(join(root,d))]
    for d in dirs:
        # Paths returned by the recursive call already start with root,
        # so add them as-is instead of joining root a second time.
        files.extend(getAllFilesRecursive(join(root, d)))
    return files

Answer by aqua

import os
rootDir = '.'
for dirName, subdirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)
    # Remove the first entry in the list of sub-directories
    # if there are any sub-directories present
    if len(subdirList) > 0:
        del subdirList[0]
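
The snippet above relies on the fact that, in the default top-down mode, os.walk only descends into the directories still present in the sub-directory list, so editing that list in place prunes the traversal. A sketch of the more common use, skipping directories by name (walk_skipping and skip_names are hypothetical names):

```python
import os


def walk_skipping(top, skip_names):
    """Yield file paths below top, never descending into directories named in skip_names."""
    for dirpath, dirnames, filenames in os.walk(top):  # topdown=True is the default
        # Slice assignment edits the list os.walk holds; rebinding dirnames would not prune
        dirnames[:] = [d for d in dirnames if d not in skip_names]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

For example, `walk_skipping('.', {'.git'})` would list every file except those under any .git folder.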

Answer by Orsiris de Jong

Here's my version of the recursive file walker, based on Matheus Araujo's answer. It takes optional exclusion list arguments, which is very helpful when copying directory trees where some directories, files, or file extensions aren't wanted.

import os

def get_files_recursive(root, d_exclude_list=[], f_exclude_list=[], ext_exclude_list=[], primary_root=None):
    """
    Walk a path to recursively find files
    Modified version of https://stackoverflow.com/a/24771959/2635443 that includes exclusion lists
    :param root: path to explore
    :param d_exclude_list: list of root relative directories paths to exclude
    :param f_exclude_list: list of filenames without paths to exclude
    :param ext_exclude_list: list of file extensions to exclude, ex: ['.log', '.bak']
    :param primary_root: Only used for internal recursive exclusion lookup, don't pass an argument here
    :return: list of files found in path
    """

    # Make sure we use a valid os separator for exclusion lists, this is done recursively :(
    d_exclude_list = [os.path.normpath(d) for d in d_exclude_list]

    files = [os.path.join(root, f) for f in os.listdir(root) if os.path.isfile(os.path.join(root, f))
             and f not in f_exclude_list and os.path.splitext(f)[1] not in ext_exclude_list]
    dirs = [d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))]
    for d in dirs:
        p_root = os.path.join(primary_root, d) if primary_root is not None else d
        if p_root not in d_exclude_list:
            files_in_d = get_files_recursive(os.path.join(root, d), d_exclude_list, f_exclude_list, ext_exclude_list, primary_root=p_root)
            # Returned paths already start with root, so do not join root again
            files.extend(files_in_d)
    return files

Answer by Orsiris de Jong

This is an update of my previous version that accepts glob-style wildcards in the exclude lists. The function walks into every subdirectory of the given path and returns the list of all files found in those directories, as paths built from the given root. It works like Matheus' answer and accepts optional exclude lists.

For example:

files = get_files_recursive('/some/path')
files = get_files_recursive('/some/path', f_exclude_list=['.cache', '*.bak'])
files = get_files_recursive(r'C:\Users', d_exclude_list=['AppData', 'Temp'])
files = get_files_recursive('/some/path', ext_exclude_list=['.log', '.db'])

Hope this helps someone, just as the initial answer in this thread helped me :)

import os
from fnmatch import fnmatch

def glob_path_match(path, pattern_list):
    """
    Checks if path is in a list of glob style wildcard paths
    :param path: path of file / directory
    :param pattern_list: list of wildcard patterns to check for
    :return: Boolean
    """
    return any(fnmatch(path, pattern) for pattern in pattern_list)


def get_files_recursive(root, d_exclude_list=None, f_exclude_list=None, ext_exclude_list=None, primary_root=None):
    """
    Walk a path to recursively find files
    Modified version of https://stackoverflow.com/a/24771959/2635443 that includes exclusion lists
    and accepts glob style wildcards on files and directories
    :param root: path to explore
    :param d_exclude_list: list of root relative directories paths to exclude
    :param f_exclude_list: list of filenames without paths to exclude
    :param ext_exclude_list: list of file extensions to exclude, ex: ['.log', '.bak']
    :param primary_root: Only used for internal recursive exclusion lookup, don't pass an argument here
    :return: list of files found in path
    """

    if d_exclude_list is not None:
        # Make sure we use a valid os separator for exclusion lists, this is done recursively :(
        d_exclude_list = [os.path.normpath(d) for d in d_exclude_list]
    else:
        d_exclude_list = []
    if f_exclude_list is None:
        f_exclude_list = []
    if ext_exclude_list is None:
        ext_exclude_list = []

    files = [os.path.join(root, f) for f in os.listdir(root) if os.path.isfile(os.path.join(root, f))
             and not glob_path_match(f, f_exclude_list) and os.path.splitext(f)[1] not in ext_exclude_list]
    dirs = [d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))]
    for d in dirs:
        p_root = os.path.join(primary_root, d) if primary_root is not None else d
        if not glob_path_match(p_root, d_exclude_list):
            files_in_d = get_files_recursive(os.path.join(root, d), d_exclude_list, f_exclude_list, ext_exclude_list,
                                             primary_root=p_root)
            # Returned paths already start with root, so do not join root again
            files.extend(files_in_d)
    return files