Linux How to use grep efficiently?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/5200591/

Date: 2020-08-05 03:07:39  Source: igfitidea

Tags: linux, unix, search, text, grep

Asked by Legend

I have a large number of small files to search. I have been looking for a good de facto multi-threaded version of grep but could not find one. How can I improve my usage of grep? At the moment I am doing this:

grep -R "string" >> Strings

Accepted answer by Legend

If you have xargs installed on a multi-core machine, you can use it to run several grep processes in parallel. Here is what I measured, in case anyone is interested.

Environment:

Processor: Dual Quad-core 2.4GHz
Memory: 32 GB
Number of files: 584450
Total Size: ~ 35 GB

Tests:

1. Find the necessary files, pipe them to xargs, and tell it to run 8 grep instances in parallel.

time find ./ -name "*.ext" -print0 | xargs -0 -n1 -P8 grep -H "string" >> Strings_find8

real    3m24.358s
user    1m27.654s
sys     9m40.316s
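
As a rough illustration of the same pipeline on a tiny scale, here is a minimal sketch that builds a scratch directory and runs the find/xargs/grep combination against it. The directory layout, file names, and file contents are made up for the example; only the command shape comes from the test above. Note that -P is an extension offered by GNU and BSD xargs rather than a POSIX option.

```shell
#!/bin/sh
# Hypothetical scratch tree; names and contents are invented for illustration.
dir=$(mktemp -d)
mkdir -p "$dir/a" "$dir/b c"
printf 'has string here\n'      > "$dir/a/one.ext"
printf 'nothing\n'              > "$dir/a/two.ext"
printf 'another string\n'       > "$dir/b c/three.ext"
printf 'string in other type\n' > "$dir/a/skip.txt"

# -print0 / -0 keep file names containing spaces intact.
# -n1 passes one file per grep; -P4 runs up to 4 greps at the same time.
# The output is sorted because parallel greps finish in no fixed order.
matches=$(find "$dir" -name '*.ext' -print0 \
  | xargs -0 -n1 -P4 grep -H 'string' | sort)

count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
echo "$count matching lines"

rm -rf "$dir"
```

Because several grep processes write to the same stream at once, the order of the result lines should not be relied on; sorting afterwards, as in the sketch, is one way to get stable output.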

2. Find the necessary files, pipe them to xargs, and tell it to run 4 grep instances in parallel.

time find ./ -name "*.ext" -print0 | xargs -0 -n1 -P4 grep -H "string" >> Strings

real    16m3.051s
user    0m56.012s
sys     8m42.540s

3. Suggested by @Stephen: Find the necessary files and use find's -exec ... + form instead of xargs.

time find ./ -name "*.ext" -exec grep -H "string" {} \+ >> Strings

real    53m45.438s
user    0m5.829s
sys     0m40.778s
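
To see what the + terminator changes, the following sketch counts how many times find starts the command with + versus with \;. It is only an illustration on a made-up scratch directory, and a small sh -c helper stands in for grep so that each invocation prints exactly one line.

```shell
#!/bin/sh
# Hypothetical scratch files, invented for illustration.
dir=$(mktemp -d)
for i in 1 2 3 4 5; do
  printf 'string %s\n' "$i" > "$dir/f$i.ext"
done

# With '+', find appends as many file names as fit onto one command line.
# The helper prints its argument count, one line per invocation.
batched=$(find "$dir" -name '*.ext' -exec sh -c 'echo "$#"' sh {} + \
  | wc -l | tr -d ' ')

# With '\;', find starts one process per file.
per_file=$(find "$dir" -name '*.ext' -exec sh -c 'echo "$#"' sh {} \; \
  | wc -l | tr -d ' ')

echo "invocations with +: $batched, with ;: $per_file"

rm -rf "$dir"
```

The + form keeps the number of grep processes low, but find runs them one after another, so it does not use more than one core by itself.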

4. Regular recursive grep.

grep -R "string" >> Strings

real    235m12.823s
user    38m57.763s
sys     38m8.301s

For my purposes, the first command worked just fine.

Answer by Karthik Gurusamy

I wonder why -n1 is used in the command below. Wouldn't it be faster to use a higher value (say -n8), or to leave it out so that xargs does the right thing?

xargs -0 -n1 -P8 grep -H "string"

It seems more efficient to give each forked grep more than one file to process (I assume -n1 puts only one file name in grep's argv). As I see it, we should be able to pass the highest n the system allows (bounded by the argc/argv maximum length limit), so that the setup cost of starting a new grep process is incurred less often.
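
The effect of -n on the number of processes can be checked without timing anything. The sketch below is an illustration only: it uses 20 made-up files in a scratch directory and lets echo stand in for grep, so that each line of output corresponds to one command xargs would have started.

```shell
#!/bin/sh
# Hypothetical scratch files, invented for illustration.
dir=$(mktemp -d)
i=1
while [ "$i" -le 20 ]; do
  printf 'string %s\n' "$i" > "$dir/f$i.ext"
  i=$((i + 1))
done

# echo stands in for grep: one output line per command started by xargs.
runs_n1=$(find "$dir" -name '*.ext' -print0 | xargs -0 -n1 echo | wc -l | tr -d ' ')
runs_n8=$(find "$dir" -name '*.ext' -print0 | xargs -0 -n8 echo | wc -l | tr -d ' ')
runs_default=$(find "$dir" -name '*.ext' -print0 | xargs -0 echo | wc -l | tr -d ' ')
echo "-n1: $runs_n1  -n8: $runs_n8  default: $runs_default"

# Batching combined with parallelism: up to 4 greps, 8 files each.
hits=$(find "$dir" -name '*.ext' -print0 \
  | xargs -0 -n8 -P4 grep -H 'string' | wc -l | tr -d ' ')
echo "matching lines: $hits"

rm -rf "$dir"
```

One caveat with leaving -n out entirely: xargs then fills each command line up to the system limit, so a small set of files may all land in a single grep and -P has nothing left to run in parallel. A moderate -n value is one possible compromise between process startup cost and keeping all cores busy; which value works best would have to be measured on the actual data.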