bash: How to find unique words from a file in Linux
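Disclaimer: This page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow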
Original URL: http://stackoverflow.com/questions/29182502/
Question by jan345
I have a big file whose lines look like this: Text numbers etc. [Man-(some numbers)]. Many of these Man-somenumbers words are repeated across several lines, and I want to count only the unique Man- words. I can't just run uniq on the file, because the text before the Man- word is always different on each line. How can I count only the unique Man-somenumbers words in the file?
Answer by Wintermute
If I understand what you want to do correctly, then
grep -oE 'Man-[0-9]+' filename | sort | uniq -c
should do the trick. It works as follows: First
grep -oE 'Man-[0-9]+' filename
isolates all words from the file that match the Man-[0-9]+ regular expression. That list is then piped through sort to get the sorted list that uniq requires, and then that sorted list is piped through uniq -c to count how often each unique Man- word appears.
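A minimal, self-contained sketch of the pipeline above, run on a made-up sample file. The file name sample.txt and its contents are assumptions for illustration only, not data from the question. It also shows a sort -u | wc -l variant, in case what is wanted is just the number of distinct Man- words rather than a count per word.

```shell
#!/usr/bin/env bash
# Hypothetical sample input: the text before each Man- word differs on
# every line, so running plain uniq on whole lines would merge nothing.
cat > sample.txt <<'EOF'
alpha 12 foo [Man-101]
beta 7 bar [Man-205]
gamma 3 baz [Man-101]
delta 99 qux [Man-300]
epsilon 1 quux [Man-205]
EOF

# Count per word:
#   grep -o prints only the matched text, one match per line;
#   -E enables extended regular expressions, so "+" means "one or more".
#   sort groups identical words, because uniq only merges adjacent lines.
#   uniq -c prefixes each distinct word with its number of occurrences.
grep -oE 'Man-[0-9]+' sample.txt | sort | uniq -c

# Number of distinct Man- words: keep one copy of each, then count lines.
grep -oE 'Man-[0-9]+' sample.txt | sort -u | wc -l
```

With this sample, the first command reports Man-101 and Man-205 twice each and Man-300 once, and the second prints 3 (the column padding of uniq -c and wc -l differs between GNU and BSD tools). If you instead want only the words that occur exactly once, replacing uniq -c with uniq -u prints just those.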