Linux command: How to 'find' only text files?
Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/4767396/
Asked by datasn.io
After a few Google searches, what I came up with is:
find my_folder -type f -exec grep -l "needle text" {} \; -exec file {} \; | grep text
which is very unwieldy and outputs unneeded text such as MIME type information. Are there any better solutions? I have lots of images and other binary files in the same folder as the many text files that I need to search through.
Accepted answer by crudcore
I know this is an old thread, but I stumbled across it and thought I'd share my method, which I have found to be a very fast way to use find to find only non-binary files:
find . -type f -exec grep -Iq . {} \; -print
The -I option to grep tells it to immediately ignore binary files, and the . pattern together with -q makes it match text files on the first character and stop there, so it runs very fast. You can change the -print to a -print0 for piping into xargs -0 or similar if you are concerned about spaces in file names (thanks for the tip, @lucas.werkmeister!)
Also, the first dot is only necessary for certain BSD versions of find, such as on OS X, but it doesn't hurt to have it there all the time if you want to put this in an alias or something.
EDIT: As @ruslan correctly pointed out, the -and can be omitted since it is implied.
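The behaviour described in this answer can be checked on a throwaway directory. The sketch below is only an illustration under stated assumptions: the directory and file names are made up, and it assumes a grep that treats a file containing a NUL byte as binary (as GNU grep does).

```shell
# Throwaway demo directory; every file name here is hypothetical.
demo=$(mktemp -d)
printf 'needle text here\n' > "$demo/notes.txt"
printf 'more plain text\n'  > "$demo/with space.txt"
printf 'bin\000ary\n'       > "$demo/image.bin"   # NUL byte: grep -I skips it
: > "$demo/empty.txt"                              # empty: "." has nothing to match

# List only the files that grep considers non-empty text.
text_files=$(find "$demo" -type f -exec grep -Iq . {} \; -print | sort)
printf '%s\n' "$text_files"

# NUL-separated variant, safe for names containing spaces,
# feeding a second grep that does the actual search.
matches=$(find "$demo" -type f -exec grep -Iq . {} \; -print0 \
            | xargs -0 grep -l "needle text")
printf '%s\n' "$matches"

rm -rf "$demo"
```

Note that an empty file is not reported, because the . pattern needs at least one character to match.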
Answer by Navi
How about this:
find . -type f|xargs grep "needle text"
Answer by peoro
Why is it unhandy? If you need to use it often and don't want to type it every time, just define a bash function for it:
function findTextInAsciiFiles {
# usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text
}
Put it in your .bashrc and then just run:
findTextInAsciiFiles your_folder "needle text"
whenever you want.
EDIT to reflect OP's edit:
If you want to cut out the MIME information, you can add a further stage to the pipeline that filters it out. This should do the trick, by keeping only what comes before the first colon: cut -d':' -f1
function findTextInAsciiFiles {
# usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text | cut -d ':' -f1
}
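One caveat with splitting on the first colon is a file name that itself contains a colon. The sketch below illustrates this with two made-up lines in the "name: description" shape that file prints (the names are hypothetical, and file itself is not called), comparing cut with a sed expression that strips from the last colon instead.

```shell
# Two hypothetical lines shaped like the output of file.
sample=$(printf '%s\n' 'notes.txt: ASCII text' 'a:b.txt: ASCII text')

# cut keeps what precedes the FIRST colon, so "a:b.txt" is truncated to "a".
with_cut=$(printf '%s\n' "$sample" | cut -d: -f1)

# sed removes from the LAST colon to the end of the line, keeping "a:b.txt".
with_sed=$(printf '%s\n' "$sample" | sed 's/:[^:]*$//')

printf 'cut:\n%s\n' "$with_cut"
printf 'sed:\n%s\n' "$with_sed"
```

Neither variant is fully robust: the sed form would in turn be tripped up by a description that contains a colon.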
Answer by thkala
How about this:
$ grep -rl "needle text" my_folder | tr '\n' '\0' | xargs -r -0 file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable'
If you want the filenames without the file types, just add a final sed filter:
$ grep -rl "needle text" my_folder | tr '\n' '\0' | xargs -r -0 file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' | sed 's|:[^:]*$||'
You can filter out unneeded file types by adding more -e 'type' options to the last grep command.
EDIT:
If your xargs version supports the -d option, the commands above become simpler:
$ grep -rl "needle text" my_folder | xargs -d '\n' -r file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' | sed 's|:[^:]*$||'
Answer by Antti Rytsölä
find . -type f -print0 | xargs -0 file | grep -P text | cut -d: -f1 | xargs grep -Pil "search"
This is unfortunately not space safe. Putting this into a bash script makes it a bit easier.
This is space safe:
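#!/bin/bash
# Print usage and exit when no search pattern is given.
if [ ! "$1" ] ; then
echo "Usage: $0 <search>";
exit
fi
find . -type f -print0 \
| xargs -0 file \
| grep -P text \
| cut -d: -f1 \
| xargs -I% grep -Pil "$1" "%"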
Answer by Robert
Here's how I've done it ...
1 . Make a small script, istext, to test whether a file is plain text:
#!/bin/bash
# Succeeds when file reports a text MIME type, e.g. "text/plain; charset=us-ascii".
[[ "$(file -bi "$1")" == text/* ]]
2 . Use find as before:
find . -type f -exec istext {} \; -exec grep -nHi mystring {} \;
Answer by crayzeewulf
Answer by Frank Fang
I do it this way: 1) Since there are too many files (~30k) to search through, I generate the text file list daily via crontab, using the command below:
find /to/src/folder -type f -exec file {} \; | grep text | cut -d: -f1 > ~/.src_list &
2) Create a function in .bashrc:
findex() {
cat ~/.src_list | xargs grep "$*" 2>/dev/null
}
Then I can use the command below to do the search:
findex "needle text"
HTH :)
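A cached list read back through plain xargs splits on whitespace, so entries containing spaces would be lost. The sketch below shows one possible space-safe variant. It is an assumption-laden illustration, not the author's setup: the temporary paths stand in for /to/src/folder and ~/.src_list, the list is built with grep -I rather than file to keep the example dependency-free, and findex_safe is a hypothetical name.

```shell
src=$(mktemp -d)    # stand-in for /to/src/folder
list=$(mktemp)      # stand-in for ~/.src_list
printf 'needle text\n' > "$src/a file.txt"
printf 'other\n'       > "$src/b.txt"

# Build the cached list: one text file name per line.
find "$src" -type f -exec grep -Il . {} + > "$list"

# Hypothetical space-safe lookup: turn newlines into NULs so that
# xargs -0 passes each line to grep as exactly one argument.
findex_safe() {
    tr '\n' '\0' < "$list" | xargs -0 grep -H -- "$*" 2>/dev/null
}

result=$(findex_safe "needle text")
printf '%s\n' "$result"

rm -rf "$src" "$list"
```

File names that contain a newline would still break a line-based list like this one.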
Answer by dalore
I prefer xargs:
find . -type f | xargs grep -I "needle text"
If your filenames are weird, use the -0 options:
find . -type f -print0 | xargs -0 grep -I "needle text"
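To see why the -0 options matter, the following sketch runs both forms against a single file whose name contains a space. The directory and the file name a b.demo are made up for illustration.

```shell
demo=$(mktemp -d)
printf 'needle text\n' > "$demo/a b.demo"

# Whitespace splitting: xargs hands grep "./a" and "b.demo" as two
# separate arguments, neither of which exists, so nothing is found.
plain=$(cd "$demo" && find . -type f | xargs grep -Il "needle text" 2>/dev/null) || true

# NUL-separated: the name reaches grep as one argument.
safe=$(cd "$demo" && find . -type f -print0 | xargs -0 grep -Il "needle text")

printf 'plain: [%s]\n' "$plain"
printf 'safe:  [%s]\n' "$safe"

rm -rf "$demo"
```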
Answer by fuujuhi
I have two issues with histumness' answer:
It only lists text files. It does not actually search them as requested. To actually search, use
find . -type f -exec grep -Iq . {} \; -and -print0 | xargs -0 grep "needle text"
It spawns a grep process for every file, which is very slow. A better solution is then
find . -type f -print0 | xargs -0 grep -IZl . | xargs -0 grep "needle text"
or simply
find . -type f -print0 | xargs -0 grep -I "needle text"
This only takes 0.2s compared to 4s for the solution above (2.5GB data / 7700 files), i.e. 20x faster.
Also, nobody cited ag (the Silver Searcher) or ack-grep as alternatives. If one of these is available, they are much better alternatives:
ag -t "needle text"    # Much faster than ack
ack -t "needle text"   # or ack-grep
As a last note, beware of false positives (binary files taken as text files). I already had false positives using grep/ag/ack, so it is better to list the matched files first before editing them.
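Listing the matched files first, as suggested in the last note, can be done with grep -l. The sketch below is one possible way to do it, not part of the answer itself: it swaps xargs for find's -exec ... {} + form, which also batches many file names into each grep invocation, and it uses made-up file names in a throwaway directory. It assumes a grep that treats a file containing a NUL byte as binary.

```shell
demo=$(mktemp -d)
printf 'needle text\n'     > "$demo/a.txt"
printf 'needle text\000\n' > "$demo/b.bin"   # contains a NUL byte

# -I skips binary files, -l prints only the names of matching files,
# and {} + passes file names to grep in batches instead of one per process.
listed=$(find "$demo" -type f -exec grep -Il "needle text" {} +)

# Review this list before running any edit over the files.
printf '%s\n' "$listed"

rm -rf "$demo"
```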