How can I search sub-folders using the glob.glob module?
Source: http://stackoverflow.com/questions/14798220/
This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverFlow.
Asked by UserYmY
I want to open a series of subfolders in a folder and find some text files and print some lines of the text files. I am using this:
configfiles = glob.glob('C:/Users/sam/Desktop/file1/*.txt')
But this does not search the subfolders. Does anyone know how I can use the same command to search the subfolders as well?
Accepted answer by Martijn Pieters
In Python 3.5 and newer, use the new recursive **/ functionality:
configfiles = glob.glob('C:/Users/sam/Desktop/file1/**/*.txt', recursive=True)
When recursive is set, ** followed by a path separator matches 0 or more subdirectories.
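A minimal sketch of what "0 or more subdirectories" means in practice. The temporary directory and the file names are invented for illustration; only glob.glob() and its recursive parameter come from the answer.

```python
import glob
import os
import tempfile

# Build a throwaway tree (hypothetical names):
#   root/top.txt
#   root/sub/mid.txt
#   root/sub/deeper/low.txt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'sub', 'deeper'))
for parts in (('top.txt',), ('sub', 'mid.txt'), ('sub', 'deeper', 'low.txt')):
    with open(os.path.join(root, *parts), 'w') as f:
        f.write('example\n')

# With recursive=True, '**' followed by a separator matches zero or more
# directory levels, so top.txt (zero levels) is found with the nested files.
found = glob.glob(os.path.join(root, '**', '*.txt'), recursive=True)
names = sorted(os.path.relpath(p, root).replace(os.sep, '/') for p in found)
print(names)  # ['sub/deeper/low.txt', 'sub/mid.txt', 'top.txt']
```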
In earlier Python versions, glob.glob() cannot list files in subdirectories recursively.
In that case I'd use os.walk() combined with fnmatch.filter() instead:
import os
import fnmatch

path = 'C:/Users/sam/Desktop/file1'
configfiles = [os.path.join(dirpath, f)
    for dirpath, dirnames, files in os.walk(path)
    for f in fnmatch.filter(files, '*.txt')]
This'll walk your directories recursively and return all absolute pathnames to matching .txt files. In this specific case fnmatch.filter() may be overkill; you could also use a .endswith() test:
import os
path = 'C:/Users/sam/Desktop/file1'
configfiles = [os.path.join(dirpath, f)
for dirpath, dirnames, files in os.walk(path)
for f in files if f.endswith('.txt')]
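To tie this back to the question (printing some lines of each text file found), here is a rough sketch. The first_lines helper, the demo tree and the file contents are assumptions made for illustration; in the question, path would be 'C:/Users/sam/Desktop/file1'.

```python
import os
import tempfile
from itertools import islice

def first_lines(path, count=2):
    """Return up to `count` lines from the start of a text file (hypothetical helper)."""
    with open(path, encoding='utf-8') as f:
        return [line.rstrip('\n') for line in islice(f, count)]

# Hypothetical demo tree standing in for the real folder.
path = tempfile.mkdtemp()
os.makedirs(os.path.join(path, 'sub'))
with open(os.path.join(path, 'sub', 'notes.txt'), 'w', encoding='utf-8') as f:
    f.write('line 1\nline 2\nline 3\n')

# Same os.walk() approach as above.
configfiles = [os.path.join(dirpath, f)
               for dirpath, dirnames, files in os.walk(path)
               for f in files if f.endswith('.txt')]

for configfile in configfiles:
    print(configfile, first_lines(configfile))
```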
Answer by jfs
To find files in immediate subdirectories:
configfiles = glob.glob(r'C:\Users\sam\Desktop\*\*.txt')
For a recursive version that traverses all subdirectories, you can use ** and pass recursive=True (available since Python 3.5):
configfiles = glob.glob(r'C:\Users\sam\Desktop\**\*.txt', recursive=True)
Both function calls return lists. You could use glob.iglob() to return paths one by one. Or use pathlib:
from pathlib import Path
path = Path(r'C:\Users\sam\Desktop')
txt_files_only_subdirs = path.glob('*/*.txt')
txt_files_all_recursively = path.rglob('*.txt') # including the current dir
Both methods return iterators (you can get paths one by one).
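The difference between the list-returning and iterator-returning calls can be sketched as follows. The temporary tree and file names are made up for the demo; the calls themselves are the ones named in this answer.

```python
import glob
import os
import tempfile
from pathlib import Path

# Hypothetical tree: a.txt and sub/b.txt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'sub'))
Path(root, 'a.txt').write_text('a')
Path(root, 'sub', 'b.txt').write_text('b')

pattern = os.path.join(root, '**', '*.txt')

as_list = glob.glob(pattern, recursive=True)    # a list, built up front
as_iter = glob.iglob(pattern, recursive=True)   # an iterator, consumed lazily
from_pathlib = Path(root).rglob('*.txt')        # a generator of Path objects

print(isinstance(as_list, list))                # True
print(sorted(as_iter) == sorted(as_list))       # True: same paths, produced lazily
print(sorted(p.name for p in from_pathlib))     # ['a.txt', 'b.txt']
```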
Answer by Andrew Alcock
Answer by megawac
The glob2 package supports wildcards and is reasonably fast:
import timeit

code = '''
import glob2
glob2.glob("files/*/**")
'''
timeit.timeit(code, number=1)
On my laptop it takes approximately 2 seconds to match >60,000 file paths.
Answer by cevaris
Here is an adapted version that enables glob.glob-like functionality without using glob2.
import fnmatch
import os

def find_files(directory, pattern='*'):
    if not os.path.exists(directory):
        raise ValueError("Directory not found {}".format(directory))

    matches = []
    for root, dirnames, filenames in os.walk(directory):
        for filename in filenames:
            full_path = os.path.join(root, filename)
            # Match the pattern against the full path, not just the file name
            if fnmatch.filter([full_path], pattern):
                matches.append(full_path)
    return matches
So if you have the following directory structure:
tests/files
├── a0
│   ├── a0.txt
│   ├── a0.yaml
│   └── b0
│       ├── b0.yaml
│       └── b00.yaml
└── a1
You can do something like this:
files = utils.find_files('tests/files','**/b0/b*.yaml')
> ['tests/files/a0/b0/b0.yaml', 'tests/files/a0/b0/b00.yaml']
This is pretty much an fnmatch pattern match on the whole path, rather than on the file name only.
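A small sketch of why matching on the whole path works here: fnmatch gives the path separator no special treatment, so * can span several path components and ** is simply two consecutive wildcards. The sample paths are taken from the tree above; nothing touches the file system.

```python
import fnmatch

path = 'tests/files/a0/b0/b0.yaml'
other = 'tests/files/a0/a0.yaml'

# '*' is free to match '/' as well, so one '*' can cover 'tests/files/a0'.
print(fnmatch.fnmatch(path, '*/b0/b*.yaml'))    # True
# '**' is not a special token for fnmatch; it acts like a single '*'.
print(fnmatch.fnmatch(path, '**/b0/b*.yaml'))   # True
# A pattern with no directory part still matches the full path.
print(fnmatch.fnmatch(path, '*.yaml'))          # True
# A path without a '/b0/b...' segment is rejected.
print(fnmatch.fnmatch(other, '**/b0/b*.yaml'))  # False
```

This is a different rule from glob.glob(), where * never crosses a directory boundary.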
Answer by f0xdx
As pointed out by Martijn, glob can only do this through the ** operator introduced in Python 3.5. Since the OP explicitly asked for the glob module, the following will return a lazily evaluated iterator that behaves similarly:
import os, glob, itertools
configfiles = itertools.chain.from_iterable(glob.iglob(os.path.join(root,'*.txt'))
for root, dirs, files in os.walk('C:/Users/sam/Desktop/file1/'))
Note that you can only iterate once over configfiles in this approach though. If you require a real list of config files that can be used in multiple operations, you would have to create it explicitly by using list(configfiles).
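The one-pass behaviour can be seen in a short sketch. The temporary tree is an assumption made for the demo; the chain/iglob/os.walk construction is the one from this answer.

```python
import glob
import itertools
import os
import tempfile

# Hypothetical tree: a.txt and sub/b.txt
root_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(root_dir, 'sub'))
for name in ('a.txt', os.path.join('sub', 'b.txt')):
    with open(os.path.join(root_dir, name), 'w') as f:
        f.write('example\n')

configfiles = itertools.chain.from_iterable(
    glob.iglob(os.path.join(root, '*.txt'))
    for root, dirs, files in os.walk(root_dir))

first_pass = list(configfiles)    # consumes the iterator
second_pass = list(configfiles)   # nothing is left to yield

print(len(first_pass), len(second_pass))  # 2 0
```

Keeping first_pass around (a real list) is what allows the results to be reused.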
Answer by NILESH KUMAR
configfiles = glob.glob('C:/Users/sam/Desktop/**/*.txt')
This doesn't work in all cases; use glob2 instead:
configfiles = glob2.glob('C:/Users/sam/Desktop/**/*.txt')
Answer by dreab
If you can install the glob2 package...
import glob2
# Forward slashes avoid the "\t" escape and a backslash before the closing quote
filenames = glob2.glob("C:/top_directory/**/*.ext") # Where ext is a specific file extension
folders = glob2.glob("C:/top_directory/**/")
All filenames and folders:
all_ff = glob2.glob("C:/top_directory/**/**")
Answer by Eugene Yarmash
If you're running Python 3.4+, you can use the pathlib module. The Path.glob() method supports the ** pattern, which means "this directory and all subdirectories, recursively". It returns a generator yielding Path objects for all matching files.
from pathlib import Path
configfiles = Path("C:/Users/sam/Desktop/file1/").glob("**/*.txt")
Answer by germ
There's a lot of confusion on this topic. Let me see if I can clarify it (Python 3.7):
1. glob.glob('*.txt'): matches all files ending in '.txt' in the current directory
2. glob.glob('*/*.txt'): matches all files ending in '.txt' in the immediate subdirectories only, but not in the current directory
3. glob.glob('**/*.txt'): same as 2, because without recursive=True the ** behaves like a single *
4. glob.glob('*.txt', recursive=True): same as 1
5. glob.glob('*/*.txt', recursive=True): same as 2
6. glob.glob('**/*.txt', recursive=True): matches all files ending in '.txt' in the current directory and in all subdirectories
So it's best to always specify recursive=True when you use **.
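The six cases above can be checked with a short script. This is only a sketch: the temporary tree, the file names and the matches helper are invented for illustration, and patterns are joined to an absolute root so the current directory is not changed.

```python
import glob
import os
import tempfile

# Hypothetical tree: top.txt, sub/mid.txt, sub/deeper/low.txt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'sub', 'deeper'))
for parts in (('top.txt',), ('sub', 'mid.txt'), ('sub', 'deeper', 'low.txt')):
    with open(os.path.join(root, *parts), 'w') as f:
        f.write('example\n')

def matches(pattern, recursive=False):
    """Glob `pattern` under root and return sorted paths relative to root."""
    found = glob.glob(os.path.join(root, pattern), recursive=recursive)
    return sorted(os.path.relpath(p, root).replace(os.sep, '/') for p in found)

# Print the result of every pattern with and without recursive=True.
for pattern in ('*.txt', '*/*.txt', '**/*.txt'):
    for recursive in (False, True):
        print(pattern, recursive, matches(pattern, recursive))
```

Running it prints the matches for each combination, which makes it easy to compare the list against your own Python version.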

