Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/23740463/

Date: 2020-09-18 10:29:15  Source: igfitidea

How to print lines that only contain characters from a list in BASH?

Tags: regex, bash, grep

Asked by Village

I have a file called "dictionary.txt" containing a list of all possible words, e.g.:

a
aardvark
act
anvil
ate
...

How can I search this, only printing lines containing letters from a limited list, e.g., if the list contains the letters "c", "a", and "t", a search will reveal these words:

a
act
cat

If the letters "e", "a", and "t" are searched, only these words are found from "dictionary.txt":

a
ate
eat
tea

The only solution I have managed is this:

  • Create a list of all possible letters.
  • Delete the searched letters from this list, leaving a list of letters that I do not want to search for.
  • With a for loop cycling through each of those letters, delete all lines from the dictionary that contain that letter.
  • Print the remaining words found in the dictionary.

This solution is very slow. Also, I need to use this code with other languages, which have thousands of possible characters, so this search method is especially slow.
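
For reference, the elimination approach described in the question can be sketched roughly as follows. This is only an illustration: the allowed letters, the sample words, and the temporary files are assumptions, not the asker's actual script. It makes one full pass over the remaining words for every unwanted letter, which is why it scales badly with large alphabets.

```shell
# Sketch of the slow elimination approach. The allowed letters and the
# sample words are illustrative assumptions.
allowed=cat
dict=$(mktemp)
remaining=$(mktemp)
printf '%s\n' a aardvark act anvil ate cat > "$dict"
cp "$dict" "$remaining"

# One full pass over the remaining words per unwanted letter.
for letter in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
    case $allowed in
        *"$letter"*) continue ;;   # searched-for letter: keep it
    esac
    # grep -v drops every line that contains the unwanted letter.
    grep -v "$letter" "$remaining" > "$remaining.tmp"
    mv "$remaining.tmp" "$remaining"
done

slow_result=$(cat "$remaining")
printf '%s\n' "$slow_result"
rm -f "$dict" "$remaining"
```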

How can I print only those lines from "dictionary.txt" that contain only the searched-for letters, and nothing else?

Answered by amphetamachine

grep '^[eat]*$' dictionary.txt

Explanation:

^ = marker meaning beginning of line

$ = marker meaning end of line

[abc] = character class ("match any one of these characters")

* = multiplier for character class (zero or more repetitions)
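
As a small illustration of the same idea, the character class can be built from a variable. The helper name `words_from_letters` and the sample dictionary are assumptions made for this sketch, not part of the answer above. It uses `-x` (match whole lines) in place of the explicit `^` and `$`, and `+` with `-E` instead of `*` so that empty lines are not printed.

```shell
# Hypothetical helper (not from the original answer): print the lines of a
# file that consist solely of the given letters.
words_from_letters() {
    letters=$1
    file=$2
    # -x matches whole lines, so no explicit ^ and $ are needed.
    # + (with -E) requires at least one character, which skips empty lines.
    grep -E -x "[${letters}]+" "$file"
}

# Small sample dictionary in a temporary file.
dict=$(mktemp)
printf '%s\n' a aardvark act anvil ate cat eat tea > "$dict"

cat_words=$(words_from_letters cat "$dict")
eat_words=$(words_from_letters eat "$dict")
printf '%s\n' "$cat_words"
rm -f "$dict"
```

Note that letters with a special meaning inside a bracket expression (`]`, `^`, `-`) would need extra care; this sketch assumes plain letters.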

Answered by galaxy

Unfortunately, I cannot comment, otherwise I'd add to amphetamachine's answer. Anyway, with the updated condition of thousands of search characters you may want to do the following:

grep -f patterns.txt dictionary.txt

where patterns.txt contains your regexp:

^[eat]\+$

Below is a sample session:

$ cat << EOF > dictionary.txt
> one
> two
> cat
> eat
> four
> tea
> five
> cheat
> EOF
$ cat << EOF > patterns.txt
> ^[eat]\+$
> EOF
$ grep -f patterns.txt dictionary.txt
eat
tea
$

This way you are not limited by the shell (Argument list too long). Also, you can specify multiple patterns in the file:

$ cat patterns.txt
^[eat]\+$
^five$
$ grep -f patterns.txt dictionary.txt
eat
tea
five
$
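
With thousands of characters, the pattern file would probably be generated rather than typed. The following is a minimal sketch under some assumptions: the allowed characters are stored one per line in a hypothetical `letters.txt`, none of them is special inside a bracket expression (`]`, `^`, `-`, `\`), and `grep -E` is used so that a plain `+` can replace the `\+` form shown above.

```shell
workdir=$(mktemp -d)
printf '%s\n' one two cat eat four tea five cheat > "$workdir/dictionary.txt"

# Hypothetical input file: one allowed character per line.
printf '%s\n' e a t > "$workdir/letters.txt"

# Join the characters into a single bracket expression and anchor it.
pattern=$(printf '^[%s]+$' "$(tr -d '\n' < "$workdir/letters.txt")")
printf '%s\n' "$pattern" > "$workdir/patterns.txt"

# -E because the generated pattern uses the extended + quantifier.
matches=$(grep -E -f "$workdir/patterns.txt" "$workdir/dictionary.txt")
printf '%s\n' "$matches"
rm -rf "$workdir"
```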

Answered by savanto

Try it using awk:

awk '/^[eat]*$/ { print }' dictionary.txt

I found this to be at least an order of magnitude faster than grep for more than about 7 letters. However, I don't know if you will run into the same problem with thousands of letters, as I didn't test that many.

You can even search for multiple patterns at once (this is faster than searching each pattern one at a time, since the dictionary file will be read only once). Every pattern acts as an if statement:

awk '/^[eat]*$/ { print "[eat]: " $0 } /^[cat]*$/ { print "[cat]: " $0 }' dictionary.txt
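
To see how several patterns behave in one pass, here is a small sketch with an assumed sample dictionary. It uses `+` instead of `*` so that empty lines are not reported. Every pattern is tested against every line, so a word such as "a" is printed once per matching pattern.

```shell
dict=$(mktemp)
printf '%s\n' a act ate cat eat tea > "$dict"

# Each pattern/action pair is checked independently for every input line,
# so one line can be reported under more than one label.
labelled=$(awk '
    /^[eat]+$/ { print "[eat]: " $0 }
    /^[cat]+$/ { print "[cat]: " $0 }
' "$dict")

printf '%s\n' "$labelled"
rm -f "$dict"
```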

Answered by petrus4

sed -n '/a/'p words.txt

Use this for whichever letter you need to find. If you want to find more than one letter together, simply repeat the command.
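
One point worth illustrating: repeating the command as a pipeline keeps lines that contain each of the letters, regardless of what else they contain. The anchored character class from the earlier answers also works in sed if only the listed letters are allowed. The sketch below uses an assumed sample word list; the second command is an adaptation, not part of this answer.

```shell
dict=$(mktemp)
printf '%s\n' a act anvil ate cat eat tea > "$dict"

# Repeated command: lines that contain both "a" and "t", plus anything else.
contains_a_and_t=$(sed -n '/a/p' "$dict" | sed -n '/t/p')

# Anchored class: lines made up only of the letters e, a and t.
only_eat=$(sed -n '/^[eat][eat]*$/p' "$dict")

printf '%s\n' "$contains_a_and_t"
printf '%s\n' "$only_eat"
rm -f "$dict"
```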

Grep also should not be used for anything beyond the simplest searches, IMHO. Although I normally hesitate to call any of the POSIX utilities obsolete, I do try to avoid grep. Its syntax is extremely inconsistent.

Studying this text file is also recommended. http://sed.sourceforge.net/sed1line.txt

Answered by tak

If you want to include e.g. umlauts in the pattern but not other accented characters, set LC_ALL="C" before running grep.

For example, this gives you only the candidate German words from a dictionary.txt file:

LC_ALL="C" grep '^[a-zA-ZäÄöÖüÜß]*$' dictionary.txt