BASH - count number of similar lines in a file
Source: http://stackoverflow.com/questions/8627014/
This page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors on Stack Overflow.
Asked by user219882
I have a topic in a forum where people can write their Top 10 list of songs. I want to count how many times each song is listed. The comparison has to be case-insensitive.
Example of the file structure:
Join Date: Apr 2005
Location: bama via new orleans
Age: 48
Posts: 2,369
Re: Top 10 Songs Jethro Tull
oh dearrrr. the only way for all kaths to keep their last shred of sanity: fly through this list as quickly as possible, without stopping to think for a microsecond...
velvet green
dun ringill
skating away on the thin ice of a new day
sossity yer a woman
fat man
life's a long song
Hyman-a-lynn
teacher
mother goose
elegy
03-10-2010, 02:29 AM #5 (permalink)
Sox
Avoiding The Swan Song
Join Date: Jan 2010
Location: Derbyshire, England
Age: 43
Posts: 5,991
Re: Top 10 Songs Jethro Tull
Wow !!!! Where do I start ?
Dun Ringill
Aqualung
With You There To Help Me
Hyman Frost And The Hooded Crow
We Used To Know
Witch's Promise
Pussy Willow
Heavy Horses
My Sunday Feeling
Locomotive Breath
Join Date: Nov 2009
Posts: 1,418
Re: Top 10 Songs Jethro Tull
Too bad they all can't make the list, but here's ten I never get tired of listening to:
Christmas Song
Witches Promise
Life's A Long Song
Living In The Past
Rainbow Blues
Sweet Dream
Minstral In The Gallery
Cup of Wonder
Rover
Something's On the Move
Example output:
life's a long song 3
aqualung 1
...
Answered by Mat
Your file's "structure" is a bit lacking in the structure department, so you'll have to deal with some errors in the process.
Assuming you have all that in a file called input, try:
tr '[:upper:]' '[:lower:]' < input |
grep -Ev "^ *(join date|age|posts|location|re):" |
sort |
uniq -c
The first line lowercases everything, the second strips out the lines that look like email headers in your sample, and the last two sort the remaining lines and count each distinct one.
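The pipeline above prints each count before its line and in alphabetical order, while the example output in the question puts the count after the title. A minimal sketch of one way to get closer to that shape, assuming blank lines should also be dropped and that the most-listed titles should come first (the function name count_songs is hypothetical, not part of the answer):

```shell
#!/bin/sh
# Sketch only: count_songs is a hypothetical helper name.
# Reads forum text on stdin, prints "title count", most-listed titles first.
count_songs() {
  tr '[:upper:]' '[:lower:]' |
    grep -Ev '^ *(join date|age|posts|location|re):' |  # drop header-like lines
    grep -v '^[[:space:]]*$' |                          # drop blank lines (assumption)
    sort | uniq -c |                                    # gives "  count title"
    sort -k1,1nr |                                      # highest count first
    awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print $0, n }' # move the count to the end
}

# Small demo on inline data instead of the full forum dump.
printf '%s\n' 'Posts: 2,369' 'Dun Ringill' 'dun ringill' 'Aqualung' | count_songs
```

Free-text lines such as greetings still slip through and get counted, so the result needs the same manual cleanup the answer warns about.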
Answered by Jhonathan
This command lists each distinct line and the number of times it occurs:
sort nameFile | uniq -c
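As written, that command treats "Dun Ringill" and "dun ringill" as different lines, whereas the question asks for a case-insensitive comparison. A possible variant, sketched under the assumption that your uniq supports -i (it is available in GNU and BSD uniq but is not required by POSIX); the function name count_lines_ci is hypothetical:

```shell
#!/bin/sh
# Sketch only: count_lines_ci is a hypothetical helper name.
# sort -f folds case while sorting, so lines that differ only in case end up adjacent;
# uniq -i then merges them and -c prefixes each group with its count.
count_lines_ci() {
  sort -f | uniq -ci
}

# Small demo on inline data; the two spellings of "Dun Ringill" are counted together.
printf '%s\n' 'Dun Ringill' 'Aqualung' 'dun ringill' | count_lines_ci
```

Which spelling is printed for a merged group depends on how sort orders the tied lines, so lowercase the input first if the output has to be uniform.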
Answered by jaypal singh
How about using awk for this:
awk '
/:/||/^$/{next}{a[toupper($0)]++}
END{for(i in a) print i,a[i]}' INPUT_FILE
Test on the sample data (file1):
awk '
/:/||/^$/{next}{a[toupper($0)]++}
END{for(i in a) print i,a[i]}' file1
SOX 1
CHRISTMAS SONG 1
CUP OF WONDER 1
SOSSITY YER A WOMAN 1
FAT MAN 1
PUSSY WILLOW 1
VELVET GREEN 1
WITH YOU THERE TO HELP ME 1
ELEGY 1
WE USED TO KNOW 1
TEACHER 1
MY SUNDAY FEELING 1
SWEET DREAM 1
HYMAN-A-LYNN 1
SOMETHING'S ON THE MOVE 1
ROVER 1
DUN RINGILL 2
AVOIDING THE SWAN SONG 1
HYMAN FROST AND THE HOODED CROW 1
WITCHES PROMISE 1
LIFE'S A LONG SONG 2
LIVING IN THE PAST 1
WITCH'S PROMISE 1
WOW !!!! WHERE DO I START ? 1
SKATING AWAY ON THE THIN ICE OF A NEW DAY 1
MINSTRAL IN THE GALLERY 1
RAINBOW BLUES 1
MOTHER GOOSE 1
HEAVY HORSES 1
AQUALUNG 1
LOCOMOTIVE BREATH 1
Explanation:
First we identify lines that have a : in them or are blank and skip them. Every other line is converted to upper case and used as a key in an array whose value counts how often that line occurs. In the END block we print each key in the array together with the number of times it was found.
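awk does not guarantee any particular order when looping over an array with for (i in a), which is why the test output above is unordered. A minimal sketch of the same counting approach with the result piped through sort so the most-listed titles come first; the function name count_titles, the tab separator, and the count-first layout are assumptions made for illustration:

```shell
#!/bin/sh
# Sketch only: count_titles is a hypothetical helper name.
# Reads forum text on stdin, prints "count<TAB>TITLE", highest count first.
count_titles() {
  awk '
    /:/ || /^$/ { next }        # skip lines containing ":" and empty lines
    { seen[toupper($0)]++ }     # the upper-cased line is the array key
    END { for (t in seen) printf "%d\t%s\n", seen[t], t }
  ' | sort -rn                  # numeric sort on the leading count, descending
}

# Small demo on inline data instead of the full forum dump.
printf '%s\n' 'Age: 48' '' 'Dun Ringill' 'dun ringill' 'Aqualung' | count_titles
```

Putting the count first keeps the sort step simple, because titles contain spaces and an arbitrary number of words.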

