Note: this page is a bilingual translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/11456200/

How can I exclude directories matching certain patterns from the output of the Linux 'find' command?

Tags: regex, linux, bash, grep

Asked by phonetagger

I want to use regexes with Linux's find command to dive recursively into a gargantuan directory tree, showing me all of the .c, .cpp, and .h files, but omitting matches containing certain substrings. Ultimately I want to send the output to an xargs command to do certain processing on all of the matching files. I can pipe the find output through grep to remove matches containing those substrings, but that solution doesn't work so well with filenames that contain spaces. So I tried using find's -print0 option, which terminates each filename with a nul char instead of a newline, and using xargs -0 to expect nul-delimited input instead of whitespace-delimited input, but I couldn't figure out how to pass the nul-delimited find output through the piped grep filters successfully; grep -Z didn't seem to help in that respect.

So I figured I'd just write a better regex for find and do away with the intermediary grep filters... perhaps sed would be an alternative?

In any case, for the following small sampling of directories...

./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h

...I want the output to include all of the .h, .c, and .cpp files but NOT those ones that appear in the 'generated' and 'deploy' directories.

BTW, you can create an entire test directory (named fredbarney) for testing solutions to this question by cutting & pasting this whole line into your bash shell:

mkdir fredbarney; cd fredbarney; mkdir fred; cd fred; mkdir inc; mkdir docs; mkdir generated; mkdir deploy; mkdir src; echo x > inc/dino.h; echo x > docs/info.docx; echo x > generated/dino.h; echo x > deploy/dino.h; echo x > src/dino.cpp; cd ..; mkdir barney; cd barney; mkdir inc; mkdir docs; mkdir generated; mkdir deploy; mkdir src; echo x > 'inc/bam bam.h'; echo x > 'docs/info info.docx'; echo x > 'generated/bam bam.h'; echo x > 'deploy/bam bam.h'; echo x > 'src/bam bam.cpp'; cd ..;

This command finds all of the .h, .c, and .cpp files...

find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$"

...but if I pipe its output through xargs, the 'bam bam' files each get treated as two separate (nonexistent) filenames (note that here I'm simply using ls as a stand-in for what I actually want to do with the output):

$ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" | xargs -n 1 ls
ls: ./barney/generated/bam: No such file or directory
ls: bam.h: No such file or directory
ls: ./barney/src/bam: No such file or directory
ls: bam.cpp: No such file or directory
ls: ./barney/deploy/bam: No such file or directory
ls: bam.h: No such file or directory
ls: ./barney/inc/bam: No such file or directory
ls: bam.h: No such file or directory
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h

So I can enhance that with the -print0 and -0 args to find and xargs:

$ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" -print0 | xargs -0 -n 1 ls
./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h

...which is great, except that I don't want the 'generated' and 'deploy' directories in the output. So I try this:

$ find . -regextype posix-egrep -regex ".+\.(c|cpp|h)$" -print0 | grep -v generated | grep -v deploy | xargs -0 -n 1 ls
barney  fred

...which clearly does not work. So I tried using the -Z option with grep (not knowing exactly what the -Z option really does) and that didn't work either. So I figured I'd write a better regex for find, and this is the best I could come up with:

find . -regextype posix-egrep -regex "(?!.*(generated|deploy).*$)(.+\.(c|cpp|h)$)" -print0 | xargs -0 -n 1 ls

...but bash didn't like that (!.*: event not found, whatever that means), and even if that weren't an issue, my regex doesn't seem to work on the regex tester web page I normally use.

Any ideas how I can make this work? This is the output I want:

$ find . [----options here----] | [----maybe grep or sed----] | xargs -0 -n 1 ls
./barney/src/bam bam.cpp
./barney/inc/bam bam.h
./fred/src/dino.cpp
./fred/inc/dino.h

...and I'd like to avoid scripts & temporary files, which I suppose might be my only option.

Thanks in advance! -Mark

Answered by sorpigal

This works for me:

find . -regextype posix-egrep -regex '.+\.(c|cpp|h)$' -not -path '*/generated/*' \
       -not -path '*/deploy/*' -print0 | xargs -0 ls -L1d

Changes from your version are minimal: I added exclusions of certain path patterns separately, because that's easier, and I single-quoted things to hide them from shell interpolation.
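For a very large tree, a related variant is to stop find from descending into the unwanted directories at all. The following is only a sketch, assuming GNU find and a hypothetical scratch tree created with mktemp; it swaps the -not -path tests for -prune and is not taken from the answer above. It prints with -print so the result can be captured in a variable; for xargs -0 you would use -print0 instead.

```shell
# Hypothetical scratch tree, used only for this demonstration.
demo=$(mktemp -d)
mkdir -p "$demo/fred/src" "$demo/fred/generated" "$demo/barney/inc" "$demo/barney/deploy"
touch "$demo/fred/src/dino.cpp" "$demo/fred/generated/dino.h" \
      "$demo/barney/inc/bam bam.h" "$demo/barney/deploy/bam bam.h"

# -prune stops find from descending into directories named generated or deploy;
# everything else falls through to the -regex test on the right of -o.
pruned=$(cd "$demo" && find . -regextype posix-egrep \
    -type d \( -name generated -o -name deploy \) -prune -o \
    -regex '.+\.(c|cpp|h)$' -print | sort)
printf '%s\n' "$pruned"
```

Unlike -not -path, which still walks every excluded subtree and merely filters the results, -prune skips those subtrees entirely.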

The "event not found" error occurs because bash interprets ! as a request for history expansion, which still happens inside double quotes. The fix is to use single quotes instead of double quotes.

Pop quiz: What characters are special inside of a single-quoted string in sh?

Answer: Only ' is special (it ends the string). That's the ultimate safety.
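A small sketch of the quoting rule, using the lookahead pattern from the question purely as sample text. Note that history expansion is only active in interactive shells, so the "event not found" error itself would not appear when this runs as a script; the variable names are illustrative.

```shell
# Single quotes pass every character through literally, including ! and $.
pattern='(?!.*(generated|deploy).*$)'
printf '%s\n' "$pattern"

# The only character that cannot appear inside single quotes is ' itself;
# a common workaround is to close the string, add an escaped quote, and reopen it.
quoted='it'\''s literal'
printf '%s\n' "$quoted"
```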

grep with -Z (also known as --null) only changes what follows a file name in grep's own output (for example with -l): a null character instead of a newline. What you wanted was -z (also known as --null-data), which makes grep treat its input, and write its output, as records terminated by a null character instead of a newline. This makes it work as expected with the output of find ... -print0, which adds a null character after each file name instead of a newline.
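To see the null-data behaviour in isolation, here is a minimal sketch assuming GNU grep. The three paths are made up, and tr is used only to make the NUL-terminated result readable.

```shell
# Three NUL-terminated records, one of which contains a space.
# grep -z reads and writes NUL-terminated records, so -v drops a whole path.
kept=$(printf '%s\0' './inc/bam bam.h' './generated/bam bam.h' './src/dino.cpp' \
    | grep -vz generated | tr '\0' '\n')
printf '%s\n' "$kept"
```

In a real pipeline the tr step would be dropped and the output fed straight to xargs -0.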

If you had done it this way:

find . -regextype posix-egrep -regex '.+\.(c|cpp|h)$' -print0 | \
    grep -vzZ generated | grep -vzZ deploy | xargs -0 ls -1Ld

Then the input and output of grep would have been null-delimited and it would have worked correctly... until one of your source files was named deployment.cpp and started getting "mysteriously" excluded by your script.
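One way to reduce that risk, if you do stay with grep, is to match the directory name as a whole path component. This is a sketch under the assumption that GNU grep is available, and the paths are hypothetical.

```shell
# Hypothetical paths; deployment.cpp lives outside any excluded directory.
# Requiring a slash on both sides matches only whole directory components.
paths=$(printf '%s\0' './src/deployment.cpp' './deploy/dino.h' './generated/dino.h' \
    | grep -vzE '/(generated|deploy)/' | tr '\0' '\n')
printf '%s\n' "$paths"
```

Here deployment.cpp survives because the pattern no longer matches a bare substring.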

Incidentally, here's a nicer way to generate your testcase file set.

while read -r file ; do
    mkdir -p "${file%/*}"
    touch "$file"
done <<'DATA'
./barney/generated/bam bam.h
./barney/src/bam bam.cpp
./barney/deploy/bam bam.h
./barney/inc/bam bam.h
./fred/generated/dino.h
./fred/src/dino.cpp
./fred/deploy/dino.h
./fred/inc/dino.h
DATA

Since I did this anyway to verify, I figured I'd share it and save you from repetition. Don't do anything twice! That's what computers are for.

Answered by alpha_989

Your command:

find . -regextype posix-egrep -regex "(?!.*(generated|deploy).*$)(.+\.(c|cpp|h)$)" -print0 | xargs -0 -n 1 ls

fails because you are trying to use POSIX extended regular expressions, which don't support lookahead, lookbehind, or other lookaround assertions. https://superuser.com/a/596499/658319

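Note that GNU find itself has no PCRE regex type (-regextype accepts flavors such as emacs, posix-basic, posix-egrep, and posix-extended), so the lookahead cannot simply be kept; the exclusion has to be expressed without lookaround.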
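Because POSIX extended regexes have no lookahead, one way to keep everything inside find is to pair a positive -regex with a negated one. A minimal sketch, assuming GNU find and a hypothetical scratch tree created with mktemp:

```shell
# Hypothetical scratch tree, used only for this demonstration.
demo=$(mktemp -d)
mkdir -p "$demo/fred/src" "$demo/fred/generated" "$demo/fred/deploy"
touch "$demo/fred/src/dino.cpp" "$demo/fred/generated/dino.h" "$demo/fred/deploy/dino.h"

# The first -regex selects source files; the negated one rejects any path
# that has generated or deploy as a directory component.
found=$(cd "$demo" && find . -regextype posix-extended \
    -regex '.+\.(c|cpp|h)' -not -regex '.*/(generated|deploy)/.*' -print)
printf '%s\n' "$found"
```

Since -regex must match the whole path, the negated pattern needs the leading and trailing .* to act as a "contains" test.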