Python: Excluding directories in os.walk

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/19859840/

Time: 2020-08-19 14:49:24  Source: igfitidea

Excluding directories in os.walk

python

Asked by antred

I'm writing a script that descends into a directory tree (using os.walk()) and then visits each file matching a certain file extension. However, since some of the directory trees that my tool will be used on also contain subdirectories that in turn contain a LOT of useless (for the purpose of this script) stuff, I figured I'd add an option for the user to specify a list of directories to exclude from the traversal.

This is easy enough with os.walk(). After all, it's up to me to decide whether I actually want to visit the respective files / dirs yielded by os.walk() or just skip them. The problem is that if I have, for example, a directory tree like this:

root--
     |
     --- dirA
     |
     --- dirB
     |
     --- uselessStuff --
                       |
                       --- moreJunk
                       |
                       --- yetMoreJunk

and I want to exclude uselessStuff and all its children, os.walk() will still descend into all the (potentially thousands of) subdirectories of uselessStuff, which, needless to say, slows things down a lot. In an ideal world, I could tell os.walk() to not even bother yielding any more children of uselessStuff, but to my knowledge there is no way of doing that (is there?).

Does anyone have an idea? Maybe there's a third-party library that provides something like that?

Accepted answer by unutbu

Modifying dirs in-place will prune the (subsequent) files and directories visited by os.walk:

import os

# exclude = set([...])
for root, dirs, files in os.walk(top, topdown=True):
    # Slice assignment mutates the same list object that os.walk recurses over.
    dirs[:] = [d for d in dirs if d not in exclude]
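
As a rough, self-contained sketch of how the pruning idiom above might be combined with the extension filter described in the question: the helper names `find_files` and `build_sample_tree`, the `.txt` extension, and the sample file names are all hypothetical and chosen only for illustration.

```python
import os
import tempfile


def find_files(top, extension, exclude):
    """Yield paths under `top` ending in `extension`, skipping excluded directory names."""
    for root, dirs, files in os.walk(top, topdown=True):
        # Prune in place so os.walk never descends into excluded directories.
        dirs[:] = [d for d in dirs if d not in exclude]
        for name in files:
            if name.endswith(extension):
                yield os.path.join(root, name)


def build_sample_tree(base):
    """Create a small tree resembling the one in the question (file names are made up)."""
    for rel in ("dirA", "dirB",
                os.path.join("uselessStuff", "moreJunk"),
                os.path.join("uselessStuff", "yetMoreJunk")):
        os.makedirs(os.path.join(base, rel))
    for rel in (os.path.join("dirA", "a.txt"),
                os.path.join("dirB", "b.txt"),
                os.path.join("uselessStuff", "moreJunk", "junk.txt")):
        with open(os.path.join(base, rel), "w") as fh:
            fh.write("x")


with tempfile.TemporaryDirectory() as base:
    build_sample_tree(base)
    found = sorted(os.path.basename(p)
                   for p in find_files(base, ".txt", {"uselessStuff"}))

print(found)  # ['a.txt', 'b.txt']
```

Note that `exclude` here matches directory names at any depth, not full paths; excluding one specific path would need a comparison against `os.path.join(root, d)` instead.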


From help(os.walk):

When topdown is true, the caller can modify the dirnames list in-place (e.g., via del or slice assignment), and walk will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search...
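
The "when topdown is true" condition matters: in bottom-up mode the subdirectories are walked before their parent is yielded, so editing the list comes too late. A minimal sketch of the difference, assuming a throwaway temporary tree and a hypothetical helper `visited_dirs`:

```python
import os
import tempfile


def visited_dirs(top, exclude, topdown):
    """Return the names of directories os.walk yields while pruning `exclude` in place."""
    seen = []
    for root, dirs, files in os.walk(top, topdown=topdown):
        dirs[:] = [d for d in dirs if d not in exclude]
        seen.append(os.path.basename(root))
    return sorted(seen)


with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "dirA"))
    os.makedirs(os.path.join(base, "uselessStuff", "moreJunk"))
    pruned = visited_dirs(base, {"uselessStuff"}, topdown=True)
    unpruned = visited_dirs(base, {"uselessStuff"}, topdown=False)

print("moreJunk" in pruned)    # False: the walk never entered uselessStuff
print("moreJunk" in unpruned)  # True: bottom-up walking ignores the edit
```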

Answer by Dmitri

... an alternative form of @unutbu's excellent answer that reads a little more directly, given that the intent is to exclude directories, at the cost of O(n**2) vs O(n) time.

(Making a copy of the dirs list with list(dirs) is required for correct execution)

import os
# exclude = set([...])
for root, dirs, files in os.walk(top, topdown=True):
    for d in list(dirs):  # iterate over a copy because dirs is mutated in the loop
        if d in exclude:
            dirs.remove(d)
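
To illustrate why the copy is needed, here is a small sketch using a plain list in place of the `dirs` list that os.walk yields (the directory names are borrowed from the question's example tree). Removing items from a list while iterating over that same list makes the iterator skip the element that slides into the vacated slot:

```python
exclude = {"moreJunk", "yetMoreJunk"}

# Removing while iterating over the same list: "yetMoreJunk" moves to index 0
# after the first removal, so the iterator never examines it.
dirs = ["moreJunk", "yetMoreJunk", "keep"]
for d in dirs:
    if d in exclude:
        dirs.remove(d)
without_copy = list(dirs)

# Iterating over a copy visits every original element.
dirs = ["moreJunk", "yetMoreJunk", "keep"]
for d in list(dirs):
    if d in exclude:
        dirs.remove(d)
with_copy = list(dirs)

print(without_copy)  # ['yetMoreJunk', 'keep']
print(with_copy)     # ['keep']
```

Each `dirs.remove(d)` call scans the list linearly, which is where the O(n**2) cost mentioned above comes from.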