Linux: How do I grep for all words that are less than 4 characters?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address and author information, and attribute it to the original authors (not me). Original StackOverflow address: http://stackoverflow.com/questions/4982052/

Time: 2020-08-05 02:49:52  Source: igfitidea

How do I grep for all words that are less than 4 characters?

Tags: linux, unix, grep

Asked by TIMEX

I have a dictionary with words separated by line breaks.

Accepted answer by Michael Goldshteyn

You can just do:

egrep -x '.{1,3}' myfile

This also skips blank lines, which are technically not words. Unfortunately, the regex above counts apostrophes in contractions, as well as hyphens in hyphenated compound words, as characters. Hyphenated compounds are not a problem at such a low character count, but contractions short enough to match do exist (e.g., I'm), and I am not sure whether you want to count their apostrophes. You can try a regex such as:

egrep -x '\w{1,3}' myfile

..., but this matches only word characters (letters, digits, and underscores), so it will not match contractions or hyphenated compound words at all.
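
To make the difference between the two patterns concrete, here is a minimal sketch. The sample words and the temporary file are made up for illustration and are not from the question. It uses `grep -E`, which behaves like `egrep`, and it assumes a grep that supports `\w` (GNU grep does, as an extension).

```shell
# Hypothetical sample dictionary, one word per line, including one blank line.
dict=$(mktemp)
printf '%s\n' "a" "I'm" "x-y" "it's" "cat" "" "word" "ab1" > "$dict"

# Any 1 to 3 characters; -x requires the pattern to match the whole line.
any_chars=$(grep -E -x '.{1,3}' "$dict")

# 1 to 3 word characters only (letters, digits, underscore).
word_chars=$(grep -E -x '\w{1,3}' "$dict")

echo "Matched by .{1,3}:"
printf '%s\n' "$any_chars"     # a, I'm, x-y, cat, ab1
echo "Matched by \\w{1,3}:"
printf '%s\n' "$word_chars"    # a, cat, ab1

rm -f "$dict"
```

In this sample, "I'm" and "x-y" are kept by the first pattern and dropped by the second, while the blank line and the four-character entries are dropped by both.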

Answer by Paul Tomblin

Like this: grep -v "^...." my_file
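
One thing to be aware of with the inverted match: a blank line does not start with four characters either, so `grep -v` keeps it. The sketch below, using a made-up sample file, shows this and one possible way to drop blank lines as well by adding `^$` as a second pattern. The second command is an illustration, not part of the original answer.

```shell
# Hypothetical sample dictionary with a blank line in the middle.
dict=$(mktemp)
printf '%s\n' "a" "" "cat" "word" > "$dict"

# Count the lines that survive the inverted match: "a", the blank line, "cat".
kept_count=$(grep -c -v '^....' "$dict")
echo "Lines kept by grep -v '^....': $kept_count"

# Also exclude empty lines by giving grep a second pattern.
without_blank=$(grep -v -e '^....' -e '^$' "$dict")
printf '%s\n' "$without_blank"   # a, cat

rm -f "$dict"
```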

Answer by Mark Byers

Try this regular expression:

grep -E '^.{1,3}$' your_dictionary
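
The explicit `^` and `$` anchors here do the same job as the `-x` option in the accepted answer. A quick sketch to check that on a made-up sample file (the file contents are an assumption for illustration):

```shell
# Hypothetical sample dictionary, one word per line.
dict=$(mktemp)
printf '%s\n' "ox" "" "tree" "elm" "a" > "$dict"

# Anchored pattern: the line must consist of exactly 1 to 3 characters.
anchored=$(grep -E '^.{1,3}$' "$dict")

# Same pattern without anchors, using -x to require a whole-line match.
whole_line=$(grep -E -x '.{1,3}' "$dict")

if [ "$anchored" = "$whole_line" ]; then
    echo "Both forms select the same lines:"
    printf '%s\n' "$anchored"   # ox, elm, a
fi

rm -f "$dict"
```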