bash: How to find all words appearing between parentheses?

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/10661646/

Time: 2020-09-18 02:19:11  Source: igfitidea

How to find all words appearing between parentheses?

bash grep

Asked by Village

I have a file containing some words in parentheses. I'd like to compile a list of all of the unique words appearing there, e.g.:

This is some (text).
This (text) has some (words) in parenthesis.
Sometimes, there are numbers, such as (123) in parenthesis too.

This would be the resulting list:

text
words
123

How can I list all of the items appearing between parentheses?

Answered by Steve

You can use awk like this:

awk -F "[()]" '{ for (i=2; i<NF; i+=2) print $i }' file.txt

prints:

text
text
words
123

You can use an array to print the unique values:

awk -F "[()]" '{ for (i=2; i<NF; i+=2) if (!array[$i]++) print $i }' file.txt

prints:

text
words
123

HTH
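
A variant of the array idea, as a minimal sketch: store every parenthesized item as an array key and print the keys in an END block. The temporary file and the names `sample`, `items`, and `result` are placeholders chosen for illustration. Because awk does not guarantee the order of `for (key in array)`, the output is piped through `sort`.

```shell
# Recreate the sample input from the question in a temporary file (illustrative only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
This is some (text).
This (text) has some (words) in parenthesis.
Sometimes, there are numbers, such as (123) in parenthesis too.
EOF

# Split on "(" or ")": the even-numbered fields are the parenthesized items.
# Using each item as an array key collapses duplicates automatically.
result=$(awk -F '[()]' '
    { for (i = 2; i < NF; i += 2) items[$i] = 1 }
    END { for (item in items) print item }
' "$sample" | LC_ALL=C sort)

printf '%s\n' "$result"
rm -f "$sample"
```

With the sample input this should list `123`, `text`, and `words`, one per line.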

Answered by glenn jackman

With GNU grep, you can use a perl-compatible regex with look-around assertions to exclude the parens:

grep -Po '(?<=\().*?(?=\))' file.txt | sort -u
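
A possible alternative spelling of the same idea, assuming a GNU grep built with PCRE support: `\K` discards the already-matched opening paren from the output, and `[^()]*` replaces the lazy `.*?`. The temporary file and the variable names `sample` and `result` are placeholders for illustration.

```shell
# Recreate the sample input from the question in a temporary file (illustrative only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
This is some (text).
This (text) has some (words) in parenthesis.
Sometimes, there are numbers, such as (123) in parenthesis too.
EOF

# \(       match a literal opening paren
# \K       drop everything matched so far from the reported match
# [^()]*   the item itself: any run of characters that are not parens
# (?=\))   require a closing paren next, without including it
result=$(grep -Po '\(\K[^()]*(?=\))' "$sample" | LC_ALL=C sort -u)

printf '%s\n' "$result"
rm -f "$sample"
```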

Answered by mkb

grep -oE '\([[:alnum:]]*\)' file.txt | sed 's/[()]//g' | sort | uniq

  • -o only prints the matching text
  • -E means use extended regular expressions
  • \( means match a literal paren
  • [[:alnum:]] is the POSIX character class for letters and numbers.

That sed script should strip out the parens. This was tested against GNU grep but BSD sed, so be wary.
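
Because `[[:alnum:]]` only covers letters and digits, items containing spaces or punctuation are skipped. If those should be listed too, one sketch is to match anything except parens and strip the parens with `tr`. The extra "(two words)" line, the temporary file, and the names `sample` and `result` are assumptions added for illustration.

```shell
# Sample input from the question plus one hypothetical line with a two-word item.
sample=$(mktemp)
cat > "$sample" <<'EOF'
This is some (text).
This (text) has some (words) in parenthesis.
Sometimes, there are numbers, such as (123) in parenthesis too.
An item can be (two words) long.
EOF

# In a basic regular expression an unescaped "(" is a literal paren.
# grep -o prints each "(...)" match on its own line; tr -d removes the parens.
result=$(grep -o '([^()]*)' "$sample" | tr -d '()' | LC_ALL=C sort -u)

printf '%s\n' "$result"
rm -f "$sample"
```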

Answered by Mark O'Connor

To reproduce your list:

cat file.txt | sed  's/.*(\(.*\)).*/\1/'

To compile a list of unique words, you need to process the list further:

cat file.txt | sed  's/.*(\(.*\)).*/\1/' | sort | uniq
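
Two caveats with a substitution of this shape: sed prints lines that contain no parentheses unchanged, and because `.*` is greedy only the last parenthesized item on each line is captured. The sketch below addresses the first point with `-n` and the `p` flag; it still returns one item per line. The extra line without parentheses, the temporary file, and the names `sample` and `result` are assumptions added for illustration.

```shell
# Sample input from the question plus one hypothetical line without parentheses.
sample=$(mktemp)
cat > "$sample" <<'EOF'
This is some (text).
This (text) has some (words) in parenthesis.
Sometimes, there are numbers, such as (123) in parenthesis too.
This line has no parentheses at all.
EOF

# -n suppresses automatic printing; the trailing p prints only lines
# where the substitution actually happened.
result=$(sed -n 's/.*(\(.*\)).*/\1/p' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Note that the second sample line contributes only `words`, since the greedy `.*` skips past `(text)`.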

Answered by Venkat Madhav

You can try this:

 sed -e 's/(/\n(/g' -e 's/)/\n/g' filename | awk -F'(' 'NF > 1 {print $2}' | sort -u

Explanation:

The first sed expression inserts a newline before every '(' so that each parenthesized word starts a new line, and the second replaces every ')' with a newline. So after running the statement below

sed -e 's/(/\n(/g' -e 's/)/\n/g' filename

the output would look like this:

This is some 
(text
.
This 
(text
 has some 
(words
 in parenthesis.
Sometimes, there are numbers, such as 
(123
 in parenthesis too.

Now pipe this output to the awk statement below, which uses '(' as the field separator and prints the second field of every line that contains a '(':

awk -F'(' 'NF > 1 {print $2}'

The output will now be:

text
text
words
123

This output is then piped to sort -u, which keeps only the unique words. Hope this explanation helps.
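
Since `\n` in a sed replacement is not understood by every sed implementation, a similar pipeline can be sketched with `tr` doing the line splitting instead. This is only an illustration under that assumption; the temporary file and the names `sample` and `result` are placeholders.

```shell
# Recreate the sample input from the question in a temporary file (illustrative only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
This is some (text).
This (text) has some (words) in parenthesis.
Sometimes, there are numbers, such as (123) in parenthesis too.
EOF

# tr turns every ")" into a newline, so each item ends its own line.
# awk then splits on "(" and prints the text after it, skipping lines with no "(".
result=$(tr ')' '\n' < "$sample" | awk -F'(' 'NF > 1 { print $2 }' | LC_ALL=C sort -u)

printf '%s\n' "$result"
rm -f "$sample"
```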