bash "while read LINE do" and grep problems

Warning: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/5626374/

Date: 2020-09-09 20:24:02, Source: igfitidea

Tags: bash, grep, while-loop, cat

Asked by Kevin

I have two files.

file1.txt:  
Afghans  
Africans  
Alaskans  
...  

file2.txt contains the output of a wget on a webpage, so it's a big sloppy mess, but it does contain many of the words from the first list.

Bash script:

cat file1.txt | while read LINE; do grep $LINE file2.txt; done

This did not work as expected. I wondered why, so I echoed out the $LINE variable inside the loop and added a sleep 1, so I could see what was happening:

cat file1.txt | while read LINE; do echo $LINE; sleep 1; grep $LINE file2.txt; done

The output in the terminal looks something like this:

Afghans
Africans
Alaskans
Albanians
Americans
grep: Chinese: No such file or directory
: No such file or directory
Arabians
Arabs
Arabs/East Indians
: No such file or directory
Argentinans
Armenians
Asian
Asian Indians
: No such file or directory
file2.txt: Asian Naruto
...

So you can see it did finally find the word "Asian". But why does it say "No such file or directory"?

Is there something weird going on or am I missing something here?

Accepted answer by kurumi

@OP, first, use dos2unix as advised. Then use awk:

awk 'FNR==NR{a[$1];next}{ for(i=1;i<=NF;i++){ if($i in a) {print $i} } } ' file1 file2_wget

Note: using a while loop with grep inside it is not efficient, since every iteration invokes grep and rescans file2 from the start.

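@OP, a crude explanation: for the meaning of FNR and NR, please refer to the gawk manual. In short, NR counts records across all input files while FNR restarts at 1 for each file, so FNR==NR is true only while the first file is being read. FNR==NR{a[$1];next} therefore stores the first word of each line of file1 as a key in array a. Once FNR no longer equals NR (meaning the second file is being read), it checks whether each word on the line is in array a and, if it is, prints it. (The for loop iterates over each word.)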
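
For readers less familiar with awk, here is the same FNR==NR idiom spread over several lines with comments. This is a sketch, not kurumi's exact command: the sample contents of file1.txt and file2.txt are made up for illustration, and both files are assumed to have Unix line endings already (i.e. after dos2unix).

```shell
# Work in a throwaway directory with hypothetical sample files.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# file1.txt: one word per line (already converted with dos2unix)
printf 'Afghans\nAsian\nArabs\n' > file1.txt
# file2.txt: stand-in for the messy wget output
printf 'Some Asian text here\nnothing to see\nArabs and Afghans!\n' > file2.txt

matches=$(awk '
  # FNR (per-file record number) equals NR (total record number) only
  # while the first file is read (assuming file1.txt is not empty):
  # store its first field as a key, then skip to the next record.
  FNR == NR { words[$1]; next }
  # For every line of the second file, test each whitespace-separated field.
  {
    for (i = 1; i <= NF; i++)
      if ($i in words) print $i
  }
' file1.txt file2.txt)
printf '%s\n' "$matches"
```

With this sample data it prints Asian and Arabs but not Afghans, because that field is "Afghans!" with the punctuation attached; messy wget output may need some cleanup (or a grep-based approach) if that matters.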

Answer by glenn jackman

What about

grep -f file1.txt file2.txt
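
Because the question's file1.txt has CRLF line endings, each pattern that grep -f reads from it would keep a trailing carriage return. A minimal sketch, assuming hypothetical sample files, that strips the \r with tr (an alternative to dos2unix) into a separate pattern file and adds -F so each line is matched as a fixed string rather than a regular expression:

```shell
# Work in a throwaway directory with hypothetical sample files.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# file1.txt saved with Windows (CRLF) line endings, as in the question.
printf 'Afghans\r\nAsian Indians\r\n' > file1.txt
printf 'we met Afghans\nunrelated line\nAsian Indians here\n' > file2.txt

# Remove every carriage return to get a clean pattern file.
tr -d '\r' < file1.txt > patterns.txt

# One grep call for all patterns:
#   -F  treat each pattern line as a fixed string, not a regex
#   -f  read the patterns from a file
matches=$(grep -F -f patterns.txt file2.txt)
printf '%s\n' "$matches"
```

It prints the two matching lines of file2.txt, "we met Afghans" and "Asian Indians here", in the order they appear in that file.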

Answer by SiegeX

Use more quotes and use less cat

while IFS= read -r LINE; do 
  grep "$LINE" file2.txt
done < file1.txt
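
To see why the quotes matter, this small sketch (with a hypothetical value for LINE) counts how many arguments the shell passes in each case, using set -- to load the words into the positional parameters:

```shell
# Hypothetical line from file1.txt that contains a space.
LINE='Asian Indians'

# Unquoted, the shell splits the value on whitespace into two words, so
# grep $LINE file2.txt becomes: grep Asian Indians file2.txt
# (and grep treats Indians as a file name).
set -- $LINE
unquoted_args=$#

# Quoted, the value stays one argument: grep "Asian Indians" file2.txt
set -- "$LINE"
quoted_args=$#

printf 'unquoted: %s args, quoted: %s args\n' "$unquoted_args" "$quoted_args"
```

It prints "unquoted: 2 args, quoted: 1 args", which is how grep came to treat Indians as a file name in the question's output.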

Answer by Ignacio Vazquez-Abrams

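Besides the quoting issue, the file you've downloaded contains CRLF line endings, which are throwing read off: read strips only the trailing \n, so each $LINE keeps a trailing carriage return (\r). That \r also explains the odd error lines: when grep prints the bad file name, the \r moves the cursor back to the start of the line, and ": No such file or directory" overwrites the "grep: ..." prefix. Use dos2unix to convert file1.txt before iterating over it.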
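
If dos2unix is not available, the carriage return can also be stripped inside the loop itself. A sketch in portable POSIX sh syntax (so it also runs under bash), using hypothetical sample files, that removes one trailing \r with parameter expansion before calling grep:

```shell
# Work in a throwaway directory with hypothetical sample files.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# file1.txt saved with Windows (CRLF) line endings.
printf 'Afghans\r\nAsian Indians\r\n' > file1.txt
printf 'Afghans live here\nsome Asian Indians\n' > file2.txt

cr=$(printf '\r')   # a literal carriage-return character
found=0
while IFS= read -r LINE; do
  # read strips only the \n, so LINE still ends in \r at this point.
  LINE=${LINE%"$cr"}   # remove one trailing \r, if present
  # -F: fixed-string match; --: protect patterns that start with "-"
  grep -F -- "$LINE" file2.txt && found=$((found + 1))
done < file1.txt   # redirect instead of cat | while, so found survives the loop
```

Both lines of file2.txt are printed and found ends up as 2; without the ${LINE%"$cr"} step neither pattern would match. Running od -c file1.txt on the original file shows the \r \n pair at the end of each line, which is a quick way to confirm this problem.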

Answer by Sabin

Although using awk is faster, grep produces a lot more detail with less effort. So, after running dos2unix, use:

grep -F -i -n -f <file_containing_pattern> <file_containing_data_blob>

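You will get all the matches plus their line numbers (case-insensitive): -F treats each pattern as a fixed string, -i ignores case, -n prefixes each match with its line number, and -f reads the patterns from a file.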

At minimum this will suffice to find all the words from file_containing_pattern:

grep -F -f <file_containing_pattern> <file_containing_data_blob>
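
A small sketch of the flags Sabin describes, with hypothetical file names patterns.txt and blob.txt standing in for <file_containing_pattern> and <file_containing_data_blob>:

```shell
# Work in a throwaway directory with hypothetical sample files.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Pattern file and data blob, both with Unix line endings (after dos2unix).
printf 'afghans\nArabs\n' > patterns.txt
printf 'header\nAFGHANS in caps\nno match\nArabs/East Indians\n' > blob.txt

# -F fixed strings, -i ignore case, -n line numbers, -f patterns from a file
matches=$(grep -F -i -n -f patterns.txt blob.txt)
printf '%s\n' "$matches"
```

It prints 2:AFGHANS in caps and 4:Arabs/East Indians; the number before the colon is the line number in blob.txt, and -i is what lets the lowercase pattern afghans match AFGHANS.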