How can I search sub-folders using the glob.glob module in Python?

Warning: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/14798220/

Date: 2020-08-18 12:28:36  Source: igfitidea

Tags: python, filesystems, glob, fnmatch

Asked by UserYmY

I want to open a series of subfolders in a folder and find some text files and print some lines of the text files. I am using this:

import glob
configfiles = glob.glob('C:/Users/sam/Desktop/file1/*.txt')

But this does not look inside the subfolders. Does anyone know how I can use the same command to search the subfolders as well?

Accepted answer by Martijn Pieters

In Python 3.5 and newer, use the new recursive **/ functionality:

import glob
configfiles = glob.glob('C:/Users/sam/Desktop/file1/**/*.txt', recursive=True)

When recursive is set, ** followed by a path separator matches zero or more subdirectories.
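
A minimal sketch of the zero-or-more behaviour described above, using a throwaway directory from tempfile instead of the C:/Users/sam/... path (the tree layout and the helper name demo_recursive_glob are hypothetical). The file at the top level is matched as well, because ** can stand for zero directories. glob.escape() is an extra safeguard so the temporary path itself is never read as a wildcard.

```python
import glob
import os
import tempfile

def demo_recursive_glob():
    """Return the .txt files found by a recursive glob, relative to a temp dir."""
    with tempfile.TemporaryDirectory() as base:
        # Hypothetical tree: top.txt, sub/a.txt, sub/deep/b.txt
        for rel in ("top.txt", os.path.join("sub", "a.txt"),
                    os.path.join("sub", "deep", "b.txt")):
            full = os.path.join(base, rel)
            os.makedirs(os.path.dirname(full), exist_ok=True)
            open(full, "w").close()

        # "**" followed by a separator: zero or more subdirectories
        pattern = os.path.join(glob.escape(base), "**", "*.txt")
        matches = glob.glob(pattern, recursive=True)
        # Relative, "/"-separated and sorted for a stable printout
        return sorted(os.path.relpath(p, base).replace(os.sep, "/")
                      for p in matches)

print(demo_recursive_glob())
# ['sub/a.txt', 'sub/deep/b.txt', 'top.txt']
```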

In earlier Python versions, glob.glob() cannot list files in subdirectories recursively.

In that case I'd use os.walk() combined with fnmatch.filter() instead:

import os
import fnmatch

path = 'C:/Users/sam/Desktop/file1'

configfiles = [os.path.join(dirpath, f)
    for dirpath, dirnames, files in os.walk(path)
    for f in fnmatch.filter(files, '*.txt')]

This will walk your directories recursively and return the full path of every matching .txt file (absolute here, because path is absolute). In this specific case fnmatch.filter() may be overkill; you could also use an .endswith() test:

import os

path = 'C:/Users/sam/Desktop/file1'

configfiles = [os.path.join(dirpath, f)
    for dirpath, dirnames, files in os.walk(path)
    for f in files if f.endswith('.txt')]
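
A quick way to check that the two os.walk() variants and the recursive glob.glob() call from the start of this answer find the same files is to run all three against a small temporary tree. This is a sketch: the helper names (walk_fnmatch, walk_endswith, recursive_glob) and the sample files are hypothetical, and tempfile replaces the C:/Users/sam/... path. The lists are sorted before comparing because the traversal order can differ.

```python
import fnmatch
import glob
import os
import tempfile

def walk_fnmatch(path):
    # Variant 1 from the answer: os.walk() + fnmatch.filter()
    return [os.path.join(dirpath, f)
            for dirpath, dirnames, files in os.walk(path)
            for f in fnmatch.filter(files, '*.txt')]

def walk_endswith(path):
    # Variant 2 from the answer: os.walk() + str.endswith()
    return [os.path.join(dirpath, f)
            for dirpath, dirnames, files in os.walk(path)
            for f in files if f.endswith('.txt')]

def recursive_glob(path):
    # Python 3.5+: glob.glob() with recursive=True
    return glob.glob(os.path.join(glob.escape(path), '**', '*.txt'),
                     recursive=True)

with tempfile.TemporaryDirectory() as base:
    # Hypothetical sample: two .txt files and one file that must not match
    for rel in ('a.txt', os.path.join('x', 'b.txt'), os.path.join('x', 'c.log')):
        full = os.path.join(base, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, 'w').close()
    results = [sorted(find(base))
               for find in (walk_fnmatch, walk_endswith, recursive_glob)]

print(results[0] == results[1] == results[2])  # True
```

One difference worth knowing: glob skips names that start with a dot unless the pattern itself starts with a dot, while os.walk() lists them, so on trees with hidden files the results can differ.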

Answer by jfs

To find files in immediate subdirectories:

import glob
configfiles = glob.glob(r'C:\Users\sam\Desktop\*\*.txt')

For a recursive version that traverses all subdirectories, you could use ** and pass recursive=True (available since Python 3.5):

configfiles = glob.glob(r'C:\Users\sam\Desktop\**\*.txt', recursive=True)

Both function calls return lists. You could use glob.iglob() to return paths one by one, or use pathlib:

from pathlib import Path

path = Path(r'C:\Users\sam\Desktop')
txt_files_only_subdirs = path.glob('*/*.txt')
txt_files_all_recursively = path.rglob('*.txt') # including the current dir

Both methods return iterators (you can get paths one by one).
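
To see the laziness described above without touching C:\Users\sam\Desktop, the sketch below runs glob.iglob(), Path.glob() and Path.rglob() against a temporary directory (the layout is made up for illustration). Each call hands back an iterator and nothing is collected until it is consumed, so the results are materialised before the temporary directory is removed.

```python
import glob
import os
import tempfile
from collections.abc import Iterator
from pathlib import Path

with tempfile.TemporaryDirectory() as base:
    # Hypothetical sample tree: one .txt at the top, one in a subdirectory
    Path(base, 'top.txt').touch()
    Path(base, 'sub').mkdir()
    Path(base, 'sub', 'a.txt').touch()

    lazy_glob = glob.iglob(os.path.join(glob.escape(base), '**', '*.txt'),
                           recursive=True)
    lazy_only_subdirs = Path(base).glob('*/*.txt')
    lazy_recursive = Path(base).rglob('*.txt')

    # No list has been built yet; each object yields paths on demand
    print(all(isinstance(it, Iterator)
              for it in (lazy_glob, lazy_only_subdirs, lazy_recursive)))  # True

    # Consume while the directory still exists
    only_subdirs = sorted(p.name for p in lazy_only_subdirs)
    recursive = sorted(p.name for p in lazy_recursive)

print(only_subdirs)  # ['a.txt']
print(recursive)     # ['a.txt', 'top.txt']
```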

Answer by Andrew Alcock

You can use Formic with Python 2.6:

import formic
fileset = formic.FileSet(include="**/*.txt", directory="C:/Users/sam/Desktop/")

Disclosure - I am the author of this package.

Answer by megawac

The glob2 package supports recursive wildcards and is reasonably fast:

import timeit

code = '''
import glob2
glob2.glob("files/*/**")
'''
timeit.timeit(code, number=1)

On my laptop it takes approximately 2 seconds to match >60,000 file paths.
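
For a rough comparison with the standard library on Python 3.5+, a similar measurement of glob.glob(..., recursive=True) can be taken on a synthetic tree. This is a minimal sketch: the helper name time_recursive_glob and the tree size are assumptions, and the printed timing depends entirely on the machine and file system, so no figure is implied here.

```python
import glob
import os
import tempfile
import timeit

def time_recursive_glob(n_dirs=20, files_per_dir=10, repeat=3):
    """Best-of-`repeat` time for one recursive glob.glob() call.

    The synthetic tree (n_dirs folders x files_per_dir .txt files) is an
    arbitrary assumption made for this sketch.
    """
    with tempfile.TemporaryDirectory() as base:
        for d in range(n_dirs):
            sub = os.path.join(base, f'dir{d}')
            os.makedirs(sub)
            for f in range(files_per_dir):
                open(os.path.join(sub, f'file{f}.txt'), 'w').close()
        pattern = os.path.join(glob.escape(base), '**', '*.txt')
        count = len(glob.glob(pattern, recursive=True))
        # One glob call per run; keep the fastest of several runs
        best = min(timeit.repeat(lambda: glob.glob(pattern, recursive=True),
                                 number=1, repeat=repeat))
    return count, best

count, seconds = time_recursive_glob()
print(f'{count} paths matched in {seconds:.4f} s')
```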

Answer by cevaris

Here is an adapted version that provides glob.glob-like functionality without using glob2:

import fnmatch
import os

def find_files(directory, pattern='*'):
    if not os.path.exists(directory):
        raise ValueError("Directory not found {}".format(directory))

    matches = []
    for root, dirnames, filenames in os.walk(directory):
        for filename in filenames:
            full_path = os.path.join(root, filename)
            # Match the pattern against the whole path, not just the filename
            if fnmatch.fnmatch(full_path, pattern):
                matches.append(full_path)
    return matches

So if you have the following directory structure:

tests/files
├── a0
│   ├── a0.txt
│   ├── a0.yaml
│   └── b0
│       ├── b0.yaml
│       └── b00.yaml
└── a1

You can do something like this

files = utils.find_files('tests/files','**/b0/b*.yaml')
> ['tests/files/a0/b0/b0.yaml', 'tests/files/a0/b0/b00.yaml']

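This is pretty much an fnmatch pattern match on the whole path, rather than on the filename only. Note that fnmatch's * also matches path separators, so ** has no special meaning here and behaves like a single *.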
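
A small illustration of that whole-path matching, using the paths from the directory tree above as plain strings (no files are touched). fnmatch.fnmatchcase() is used instead of fnmatch.fnmatch() only so the result does not depend on the operating system's case and separator rules.

```python
import fnmatch

paths = ['tests/files/a0/a0.txt', 'tests/files/a0/a0.yaml',
         'tests/files/a0/b0/b0.yaml', 'tests/files/a0/b0/b00.yaml']

# fnmatch treats "/" like any other character, so "*" can span directories
double_star = [p for p in paths if fnmatch.fnmatchcase(p, '**/b0/b*.yaml')]
single_star = [p for p in paths if fnmatch.fnmatchcase(p, '*/b0/b*.yaml')]
print(double_star == single_star)  # True: "**" adds nothing over "*"
print(double_star)  # ['tests/files/a0/b0/b0.yaml', 'tests/files/a0/b0/b00.yaml']

# Even a bare "*.yaml" reaches into subdirectories, unlike glob.glob('*.yaml')
print([p for p in paths if fnmatch.fnmatchcase(p, '*.yaml')])
```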

Answer by f0xdx

As pointed out by Martijn, glob can only do this through the ** operator introduced in Python 3.5. Since the OP explicitly asked for the glob module, the following returns a lazily evaluated iterator that behaves similarly:

import os, glob, itertools

configfiles = itertools.chain.from_iterable(glob.iglob(os.path.join(root,'*.txt'))
                         for root, dirs, files in os.walk('C:/Users/sam/Desktop/file1/'))

Note that with this approach you can only iterate over configfiles once. If you need a real list of config files that can be used in multiple operations, you have to create it explicitly with list(configfiles).
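
A minimal sketch of that single-pass behaviour, assuming a throwaway tree from tempfile in place of the desktop folder. The glob.escape() call around root is an addition not present in the answer above; it keeps directory names containing *, ? or [ from being read as wildcards.

```python
import glob
import itertools
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Hypothetical sample tree: two .txt files at different depths
    os.makedirs(os.path.join(base, 'sub'))
    for rel in ('a.txt', os.path.join('sub', 'b.txt')):
        open(os.path.join(base, rel), 'w').close()

    # Same construction as above: one lazy iglob() per directory from os.walk()
    configfiles = itertools.chain.from_iterable(
        glob.iglob(os.path.join(glob.escape(root), '*.txt'))
        for root, dirs, files in os.walk(base))

    first_pass = list(configfiles)   # consumes the iterator
    second_pass = list(configfiles)  # nothing left to yield

print(len(first_pass), len(second_pass))  # 2 0
```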

Answer by NILESH KUMAR

configfiles = glob.glob('C:/Users/sam/Desktop/**/*.txt')

This doesn't work in all cases: without recursive=True, the standard glob treats ** like a single *. Instead, use glob2:

configfiles = glob2.glob('C:/Users/sam/Desktop/**/*.txt')

Answer by dreab

If you can install the glob2 package:

import glob2
# Forward slashes avoid "\t" in "C:\top_directory" being read as a tab character
filenames = glob2.glob("C:/top_directory/**/*.ext")  # Where ext is a specific file extension
folders = glob2.glob("C:/top_directory/**/")

All filenames and folders:

all_ff = glob2.glob("C:/top_directory/**/**")
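
On Python 3.5+ the standard library's glob can express the same two queries without installing glob2. This is a sketch assuming a small temporary tree (the helper rel() and the layout are hypothetical): a pattern ending in a path separator only matches directories, and a bare ** with recursive=True matches files and folders at every depth.

```python
import glob
import os
import tempfile

def rel(paths, base):
    # Relative, "/"-separated, sorted names for easy reading
    return sorted(os.path.relpath(p, base).replace(os.sep, '/') for p in paths)

with tempfile.TemporaryDirectory() as base:
    # Hypothetical sample tree: two nested folders and one file
    os.makedirs(os.path.join(base, 'sub', 'deep'))
    open(os.path.join(base, 'sub', 'note.txt'), 'w').close()

    root = glob.escape(base)
    # Trailing separator after "**": directories only
    folder_names = rel(glob.glob(os.path.join(root, '**', ''), recursive=True), base)
    # Bare "**": files and folders at every depth
    everything_names = rel(glob.glob(os.path.join(root, '**'), recursive=True), base)

print(folder_names)      # the starting folder itself shows up as '.'
print(everything_names)
```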

Answer by Eugene Yarmash

If you're running Python 3.4+, you can use the pathlib module. The Path.glob() method supports the ** pattern, which means “this directory and all subdirectories, recursively”. It returns a generator yielding Path objects for all matching files.

from pathlib import Path
configfiles = Path("C:/Users/sam/Desktop/file1/").glob("**/*.txt")
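
A short sketch of the “this directory and all subdirectories” semantics, assuming a throwaway tree built with tempfile in place of the desktop folder: a file sitting directly in the starting folder is matched too, and Path.rglob('*.txt') yields the same paths as Path.glob('**/*.txt').

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Hypothetical sample tree
    (base / 'sub' / 'deep').mkdir(parents=True)
    for rel in ('top.txt', 'sub/a.txt', 'sub/deep/b.txt', 'sub/skip.log'):
        (base / rel).touch()

    # "**" = this directory and all subdirectories, recursively
    via_glob = sorted(p.relative_to(base).as_posix() for p in base.glob('**/*.txt'))
    # rglob(pattern) behaves like glob('**/' + pattern)
    via_rglob = sorted(p.relative_to(base).as_posix() for p in base.rglob('*.txt'))

print(via_glob)               # ['sub/a.txt', 'sub/deep/b.txt', 'top.txt']
print(via_glob == via_rglob)  # True
```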

Answer by germ

There's a lot of confusion on this topic. Let me see if I can clarify it (Python 3.7):

  1. glob.glob('*.txt'): matches all files ending in '.txt' in the current directory
  2. glob.glob('*/*.txt'): same as 3
  3. glob.glob('**/*.txt'): matches all files ending in '.txt' in the immediate subdirectories only, but not in the current directory
  4. glob.glob('*.txt', recursive=True): same as 1
  5. glob.glob('*/*.txt', recursive=True): same as 3
  6. glob.glob('**/*.txt', recursive=True): matches all files ending in '.txt' in the current directory and in all subdirectories

So it's best to always specify recursive=True.
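
The six cases listed above can be checked on a small throwaway tree with one .txt file per depth level. The layout and the helper run_cases() are hypothetical; each pattern is joined onto the escaped temporary path, so the current working directory is left alone.

```python
import glob
import os
import tempfile

# The six (pattern, recursive) combinations from the list above
CASES = [('*.txt', False), ('*/*.txt', False), ('**/*.txt', False),
         ('*.txt', True), ('*/*.txt', True), ('**/*.txt', True)]

def run_cases():
    """Return the sorted, base-relative matches for each case in CASES."""
    with tempfile.TemporaryDirectory() as base:
        # Hypothetical tree: top.txt, sub/a.txt, sub/deep/b.txt
        for rel in ('top.txt', 'sub/a.txt', 'sub/deep/b.txt'):
            full = os.path.join(base, *rel.split('/'))
            os.makedirs(os.path.dirname(full), exist_ok=True)
            open(full, 'w').close()
        root = glob.escape(base)
        results = []
        for pattern, recursive in CASES:
            hits = glob.glob(os.path.join(root, pattern), recursive=recursive)
            results.append(sorted(os.path.relpath(h, base).replace(os.sep, '/')
                                  for h in hits))
    return results

for number, hits in enumerate(run_cases(), start=1):
    print(number, hits)
# 1 ['top.txt']
# 2 ['sub/a.txt']
# 3 ['sub/a.txt']
# 4 ['top.txt']
# 5 ['sub/a.txt']
# 6 ['sub/a.txt', 'sub/deep/b.txt', 'top.txt']
```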
