bash: How to find all words appearing between parentheses?
Original source: http://stackoverflow.com/questions/10661646/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to find all words appearing between parentheses?
Asked by Village
I have a file containing some words in parentheses. I'd like to compile a list of all of the unique words appearing there, e.g.:
This is some (text).
This (text) has some (words) in parenthesis.
Sometimes, there are numbers, such as (123) in parenthesis too.
This would be the resulting list:
text
words
123
How can I list all of the items appearing between parentheses?
Answered by Steve
You can use awk like this:
awk -F "[()]" '{ for (i=2; i<NF; i+=2) print $i }' file.txt
prints:
text
text
words
123
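To see why the loop starts at field 2 and steps by 2, it can help to print every field with its index. This is only an illustrative sketch: with "[()]" as the field separator, the even-numbered fields are the ones that sat between an opening and a closing paren (assuming the parens are balanced and not nested), and the last field is always the text after the final paren, which is why the loop stops before NF.

```shell
# Show how awk splits one sample line when both parens are field separators.
line='This (text) has some (words) in parenthesis.'
fields=$(printf '%s\n' "$line" |
    awk -F '[()]' '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }')
printf '%s\n' "$fields"
# 1:[This ]
# 2:[text]
# 3:[ has some ]
# 4:[words]
# 5:[ in parenthesis.]
```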
You can use an array to print only the unique values:
awk -F "[()]" '{ for (i=2; i<NF; i+=2) if (!seen[$i]++) print $i }' file.txt
prints:
text
words
123
HTH
Answered by glenn jackman
With GNU grep, you can use a Perl-compatible regex with look-around assertions to exclude the parens:
grep -Po '(?<=\().*?(?=\))' file.txt | sort -u
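Where grep has no -P (BSD grep, or a GNU grep built without PCRE support), a negated bracket expression can give a similar result. A minimal sketch, assuming the -o option is available (it is not in POSIX, but GNU and BSD grep both provide it); the function name paren_words is made up for illustration, and the input is the sample from the question:

```shell
# Recreate the sample input from the question.
cat > file.txt <<'EOF'
This is some (text).
This (text) has some (words) in parenthesis.
Sometimes, there are numbers, such as (123) in parenthesis too.
EOF

# paren_words is a hypothetical helper name, not part of any tool.
# '([^()]*)' matches an opening paren, a run of non-paren characters,
# and a closing paren; tr then deletes the parens themselves.
paren_words() {
    grep -o '([^()]*)' "$1" | tr -d '()' | sort -u
}

paren_words file.txt
# 123
# text
# words
```

Unlike the look-around version, this one matches the parens and strips them afterwards, so it needs the extra tr step.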
Answered by mkb
grep -oE '\([[:alnum:]]*\)' file.txt | sed 's/[()]//g' | sort | uniq
-o prints only the matching text
-E means use extended regular expressions
\( means match a literal paren
[[:alnum:]] is the POSIX character class for letters and numbers
The sed script strips out the parens. This was tested against GNU grep but BSD sed, so be wary.
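One consequence of [[:alnum:]] worth seeing in action: because it matches only letters and digits, parenthesized text that contains spaces or punctuation is skipped entirely. A small sketch on a made-up sample sentence (not from the question):

```shell
# "(two words)" contains a space, so the pattern does not match it.
matches=$(printf '%s\n' 'a (word), a (two words) and (123)' |
    grep -oE '\([[:alnum:]]*\)' | sed 's/[()]//g')
printf '%s\n' "$matches"
# word
# 123
```

If multi-word items should be kept too, a wider class such as [^()] in place of [[:alnum:]] is one option.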
Answered by Mark O'Connor
To reproduce your list:
cat file.txt | sed 's/.*(\(.*\)).*/\1/'
To compile a list of unique words, you need to process the list further:
cat file.txt | sed 's/.*(\(.*\)).*/\1/' | sort | uniq
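A caveat worth checking before relying on this: the leading .* is greedy, so on a line with several parenthesized items only the last one is kept, and a line with no parens at all is printed unchanged. The sketch below illustrates both points; the 'no parens here' line is made up, and sed -n with the p flag is one way to drop non-matching lines.

```shell
line='This (text) has some (words) in parenthesis.'

# Greedy matching keeps only the last parenthesized item on the line.
last=$(printf '%s\n' "$line" | sed 's/.*(\(.*\)).*/\1/')
printf '%s\n' "$last"    # words

# -n plus the p flag prints only lines where the substitution matched,
# so the line without parens produces no output.
only=$(printf '%s\n' 'no parens here' "$line" | sed -n 's/.*(\(.*\)).*/\1/p')
printf '%s\n' "$only"    # words
```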
Answered by Venkat Madhav
You can try this:
sed -e 's/(/\n(/g' -e 's/)/\n/g' filename | awk -F'(' '{print $2}' | sort -u
Explanation:
The first sed expression puts each parenthesized word at the start of a new line, and the second replaces each ')' with a newline. So after running the statement below
sed -e 's/(/\n(/g' -e 's/)/\n/g' filename
the output would look like this:
This is some
(text
.
This
(text
has some
(words
in parenthesis.
Sometimes, there are numbers, such as
(123
in parenthesis too.
Now pipe this output to the awk statement below, which splits each line on '(' and prints the second field:
awk -F'(' '{print $2}'
The output (leaving out the empty lines that awk prints for lines containing no '(') will be:
text
text
words
123
The above output is piped to sort -u to keep only the unique words. Hope this explanation helps.
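The \n in the sed replacement is understood by GNU sed but is not guaranteed to work in other implementations. A possible variation on the same idea, offered only as a sketch: let tr turn each ')' into a newline, then have awk print the second field only on lines that actually contain a '(' (the NF > 1 test), which also avoids the empty lines. The variable name unique is just for illustration.

```shell
# Recreate the sample input from the question.
cat > file.txt <<'EOF'
This is some (text).
This (text) has some (words) in parenthesis.
Sometimes, there are numbers, such as (123) in parenthesis too.
EOF

# Break lines at every ')', keep what follows '(' on lines that have one,
# then sort and drop duplicates.
unique=$(tr ')' '\n' < file.txt | awk -F'(' 'NF > 1 { print $2 }' | sort -u)
printf '%s\n' "$unique"
# 123
# text
# words
```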

