Linux command: How to "find" only text files?

Question

Asked by datasn.io

After a few Google searches, this is what I came up with:

find my_folder -type f -exec grep -l "needle text" {} \; -exec file {} \; | grep text

This is unwieldy and prints unneeded text such as MIME type information. Are there any better solutions? The folder holds many images and other binary files alongside the many text files I need to search through.

Answer 1

Accepted answer by crudcore

I know this is an old thread, but I stumbled across it and thought I'd share my method, which I have found to be a very fast way to use find to find only non-binary files:

find . -type f -exec grep -Iq . {} \; -print

The -I option tells grep to ignore binary files immediately, and the . pattern (match any character) together with -q makes it stop at the first match in a text file, so it runs very fast. If you are concerned about spaces in filenames, you can change -print to -print0 and pipe the output into xargs -0 or something similar (thanks for the tip, @lucas.werkmeister!).

Also, the first dot (the starting directory) is only necessary for certain BSD versions of find, such as the one on OS X, but it does no harm to always include it if you want to put this in an alias or something similar.

EDIT: As @ruslan correctly pointed out, the -and can be omitted since it is implied.
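
As a minimal sketch of why grep -Iq . works as a find predicate: only the exit status matters, and find prints the name when that status is zero. The helper name is_text_file is hypothetical and not part of the answer above, and the sketch assumes GNU grep, where -I makes a file containing NUL bytes count as "no match".

```shell
# is_text_file FILE
# Hypothetical helper: succeeds (exit 0) when grep considers FILE non-binary
# and FILE has at least one non-empty line.
is_text_file() {
    # -I: treat binary files as if they contained no match
    # -q: print nothing and stop at the first match
    # . : matches any single character, so any non-empty line matches
    grep -Iq . -- "$1"
}
```

One consequence of this test is that an empty file, or a file made up only of empty lines, has no line matching . and is therefore skipped as well.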

Answer 2

Answered by Navi

How about this:

find . -type f | xargs grep "needle text"

Answer 3

Answered by peoro

Why is it unwieldy? If you need it often and don't want to type it every time, just define a bash function for it:

function findTextInAsciiFiles {
    # usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
    # $1 is the directory to search, $2 is the text to look for
    find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text
}

Put it in your .bashrc and then just run:

findTextInAsciiFiles your_folder "needle text"

whenever you want.

EDIT to reflect OP's edit:

If you want to cut out the MIME information, you can add a further stage to the pipeline that filters it out. Taking only what comes before the colon with cut -d':' -f1 should do the trick:

function findTextInAsciiFiles {
    # usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
    # $1 is the directory to search, $2 is the text to look for
    find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text | cut -d ':' -f1
}
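
To make the cut stage concrete, here is a small sketch that runs on hand-written lines in the "NAME: DESCRIPTION" layout that file prints; no real file is inspected, and the function names strip_type_cut and strip_type_sed are made up for illustration. It also shows one limitation: cut keeps only the text before the first colon, so a filename that itself contains a colon is truncated, whereas a sed expression that removes everything from the last colon onward keeps such a name intact.

```shell
# Both helpers read "NAME: DESCRIPTION" lines on standard input.

# Keep everything before the FIRST colon.
strip_type_cut() {
    cut -d ':' -f1
}

# Remove the LAST colon and everything after it.
strip_type_sed() {
    sed 's|:[^:]*$||'
}
```

For example, printf 'a:b.txt: ASCII text\n' | strip_type_cut prints a, while the same input piped through strip_type_sed prints a:b.txt.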

Answer 4

Answered by thkala

How about this:

$ grep -rl "needle text" my_folder | tr '\n' '\0' | xargs -r -0 file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable'

If you want the filenames without the file types, just add a final sed filter:

$ grep -rl "needle text" my_folder | tr '\n' '\0' | xargs -r -0 file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' | sed 's|:[^:]*$||'

You can filter out unneeded file types by adding more -e 'type' options to the last grep command.

EDIT:

If your xargs version supports the -d option, the commands above become simpler:

$ grep -rl "needle text" my_folder | xargs -d '\n' -r file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' | sed 's|:[^:]*$||'

Answer 5

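Answered by Antti Rytsölä

find . -type f -print0 | xargs -0 file | grep -P text | cut -d: -f1 | xargs grep -Pil "search"

Unfortunately, this is not space safe: filenames containing spaces break the last xargs stage. Putting it into a bash script makes it a bit easier.

This is space safe:

#!/bin/bash
# Print usage and stop when no search term is given.
if [ -z "$1" ] ; then
    echo "Usage: $0 <search>";
    exit 1
fi

# Keep only files that file describes as text, then search each one.
# -I% passes every input line to grep as a single argument.
find . -type f -print0 \
  | xargs -0 file \
  | grep -P text \
  | cut -d: -f1 \
  | xargs -I% grep -Pil "$1" "%"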

Answer 6

Answered by Robert

Here's how I've done it:

1. Make a small script, istext, that tests whether a file is plain text:

#!/bin/bash
# Succeeds when the MIME type reported by file (e.g. text/plain) contains "text".
[[ "$(file -bi "$1")" == *"text"* ]]

2. Use find as before:

find . -type f -exec istext {} \; -exec grep -nHi mystring {} \;

Answer 7

Answered by crayzeewulf

Based on this SO question:

grep -rIl "needle text" my_folder

Answer 8

Answered by Frank Fang

I do it this way:

1) Since there are too many files (~30k) to search through, I generate the list of text files daily via crontab, using the command below:

find /to/src/folder -type f -exec file {} \; | grep text | cut -d: -f1 > ~/.src_list &

2) Create a function in .bashrc:

findex() {
    cat ~/.src_list | xargs grep "$*" 2>/dev/null
}

Then I can use the command below to do the search:

findex "needle text"

HTH :)
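
The cached-list idea can also be sketched with NUL-terminated names, so that paths containing spaces survive the round trip through the list file. This is only an illustration under stated assumptions: it swaps in grep -I for file as the text test, relies on the GNU grep -Z and GNU xargs -r options, and the function names build_text_list and search_text_list are hypothetical.

```shell
# build_text_list DIR LISTFILE
# Write the names of all non-binary files under DIR to LISTFILE.
build_text_list() {
    # -I: skip binary files; -l: print names only; -Z: end each name with NUL
    find "$1" -type f -exec grep -IlZ . {} + > "$2"
}

# search_text_list LISTFILE NEEDLE
# Search only the files recorded in LISTFILE.
search_text_list() {
    # -0: names are NUL-terminated; -r: do nothing if the list is empty
    # -H: always prefix matches with the filename
    # Errors for files deleted since the list was built are discarded.
    xargs -0 -r grep -H -- "$2" < "$1" 2>/dev/null
}
```

A daily cron job would call build_text_list once, and interactive searches would then call search_text_list with the same list file.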

Answer 9

Answered by dalore

I prefer xargs:

find . -type f | xargs grep -I "needle text"

If your filenames are weird, use the -0 options:

find . -type f -print0 | xargs -0 grep -I "needle text"
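
A short sketch of what the -print0 / -0 pairing buys: names are passed NUL-terminated, so a file called "my notes.txt" reaches grep as one argument instead of two. The wrapper name search_text_files and its argument order are assumptions made for illustration, and -r is a GNU xargs option.

```shell
# search_text_files DIR NEEDLE
# Hypothetical wrapper: list non-binary files under DIR that contain NEEDLE.
search_text_files() {
    # -print0 / -0: NUL-terminated names, safe for spaces and newlines
    # -r: skip running grep when find produces no files
    # -I: ignore binary files; -l: print only the names of matching files
    find "$1" -type f -print0 | xargs -0 -r grep -Il -- "$2"
}
```

Called as search_text_files my_folder "needle text", it prints one matching filename per line.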

Answer 10

Answered by fuujuhi

I have two issues with histumness' answer:

It only lists text files. It does not actually search them as requested. To actually search, use
```
find . -type f -exec grep -Iq . {} \; -and -print0 | xargs -0 grep "needle text"
```
It spawns a grep process for every file, which is very slow. A better solution is then
```
find . -type f -print0 | xargs -0 grep -IZl . | xargs -0 grep "needle text"
```
or simply
```
find . -type f -print0 | xargs -0 grep -I "needle text"
```
This only takes 0.2s compared to 4s for solution above (2.5GB data / 7700 files), i.e. 20x faster.
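
The speed difference comes down to how many grep processes are started. As an alternative sketch that is not taken from any answer here, find can batch filenames itself with -exec ... {} +, which, like xargs, hands many files to each grep invocation. Both function names below are hypothetical, and no timing is claimed for them.

```shell
# One grep process per file: simple, but slow on large trees.
list_text_files_per_file() {
    find "$1" -type f -exec grep -Iq . {} \; -print
}

# Many files per grep process; -l prints the name of each file with a match.
list_text_files_batched() {
    find "$1" -type f -exec grep -Il . {} +
}
```

Both functions should list the same files; only the number of processes differs.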

Also, nobody cited ag (the Silver Searcher) or ack-grep as alternatives. If one of these is available, it is a much better alternative:

```
ag -t "needle text"    # Much faster than ack
ack -t "needle text"   # or ack-grep
```

As a last note, beware of false positives (binary files taken as text files). I have already had false positives with grep as well as with ag and ack, so it is better to list the matched files first before editing them.
