Linux 在数据文件中查找唯一值
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/6951223/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me):
StackOverFlow
finding unique values in a data file
提问by Illusionist
I can do this in python but I was wondering if I could do this in Linux
我可以在 python 中做到这一点,但我想知道我是否可以在 Linux 中做到这一点
I have a file like this
我有一个这样的文件
name1 text text 123432re text
name2 text text 12344qp text
name3 text text 134234ts text
I want to find all the different values in the 3rd column for a particular username, let's say name1.
我想找出特定用户名(比如 name1)在第 3 列中出现的所有不同的值。
grep name1 filename gives me all the lines, but there must be some way to list just the distinct values? (I don't want to display duplicate values for the same username.)
grep name1 filename 给了我所有的行,但应该有某种方法只列出不同的值吧?(我不想为同一个用户名显示重复的值。)
采纳答案by Mike Mertsock
grep name1 filename | cut -d ' ' -f 4 | sort -u
This will find all lines that have name1, then get just the fourth column of data and show only unique values.
这将找到所有具有 name1 的行,然后仅获取第四列数据并仅显示唯一值。
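A runnable sketch of the accepted answer; the sample file below is invented for illustration, with a duplicate line and an extra value added so the deduplication is visible:

```shell
# Build a hypothetical sample file (names and values invented).
cat > /tmp/demo_filename <<'EOF'
name1 text text 123432re text
name2 text text 12344qp text
name1 text text 123432re text
name1 text text 999999zz text
EOF

# Keep name1's lines, take the 4th space-separated field, deduplicate.
grep name1 /tmp/demo_filename | cut -d ' ' -f 4 | sort -u
# 123432re
# 999999zz
```

Note that cut counts space-separated fields, so the "3rd column" of values after the name is field 4.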
回答by Michał Šrajer
You can let sort look only at the 4th field as the key, and then ask only for records with unique keys:
您可以让 sort 仅以第 4 个字段作为键排序,然后只要求输出键唯一的记录:
grep name1 | sort -k4 -u
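A sketch of this sort-only variant with a hypothetical sample file. With -u, sort compares only the key (here fields 4 through end of line), so records whose key repeats collapse to a single line:

```shell
# Hypothetical sample file; the first two lines share the same 4th field.
cat > /tmp/demo_sortkey <<'EOF'
name1 a b 123432re text
name1 c d 123432re text
name1 a b 777777qq text
EOF

# Sort on the key starting at field 4 and keep one record per key.
grep name1 /tmp/demo_sortkey | sort -k4 -u
```

Two lines come out, one per distinct value; which of the two 123432re lines survives is implementation-dependent. Also note that grep needs the filename here, as a later answer on this page points out.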
回答by glenn jackman
As an all-in-one awk solution:
作为多合一的 awk 解决方案:
awk '$1 == "name1" && ! seen[$1 " " $4]++ {print $4}' filename
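A runnable sketch of the awk approach (the exact field references are an assumption reconstructed from the sample data): the seen[] array remembers which (name, value) pairs have already been printed, so duplicates are skipped in a single pass:

```shell
# Invented sample file with one duplicate value for name1.
cat > /tmp/demo_awk <<'EOF'
name1 text text 123432re text
name2 text text 12344qp text
name1 text text 123432re text
name1 text text 888888aa text
EOF

# Print the 4th field of name1's lines, first occurrence only.
awk '$1 == "name1" && ! seen[$1 " " $4]++ {print $4}' /tmp/demo_awk
# 123432re
# 888888aa
```

Unlike the sort-based pipelines, this preserves the order in which values first appear.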
回答by Rohan Khude
I tried using cat
我尝试使用 cat
The file contains (here the file is foo.sh; you can use any file name):
文件包含(这里的文件是 foo.sh,你可以使用任何文件名):
$ cat foo.sh
tar
world
class
zip
zip
zip
python
jin
jin
doo
doo
uniq prints each word only once
uniq 每个单词只打印一次
$ cat foo.sh | sort | uniq
class
doo
jin
python
tar
world
zip
uniq -u prints only the words that appear exactly once in the file
uniq -u 只打印在文件中恰好出现一次的单词
$ cat foo.sh | sort | uniq -u
class
python
tar
world
uniq -d prints only the duplicated words, each printed once
uniq -d 只打印重复出现的单词,且每个只打印一次
$ cat foo.sh | sort | uniq -d
doo
jin
zip
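The three uniq variants above can be run side by side on a hypothetical word list; uniq only collapses adjacent lines, hence the sort first:

```shell
# Invented word list with one unique word and two duplicated ones.
cat > /tmp/demo_words <<'EOF'
zip
zip
class
doo
doo
EOF

sort /tmp/demo_words | uniq      # class doo zip  (each word once)
sort /tmp/demo_words | uniq -u   # class          (appears exactly once)
sort /tmp/demo_words | uniq -d   # doo zip        (duplicated words)
```

Note that uniq, uniq -u, and uniq -d partition the input differently: plain uniq outputs every distinct word, while -u and -d split those words into non-repeated and repeated sets.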
回答by Mansur Ali
In my opinion, you need to select the field from which you need the unique values. I was trying to retrieve unique source IPs from IPTables log.
在我看来,您需要选择需要唯一值的字段。我试图从 IPTables 日志中检索唯一的源 IP。
cat /var/log/iptables.log | grep "May 5" | awk '{print }' | sort -u
Here is the output of the above command:
以下是上述命令的输出:
SRC=192.168.10.225
SRC=192.168.10.29
SRC=192.168.20.125
SRC=192.168.20.147
SRC=192.168.20.155
SRC=192.168.20.183
SRC=192.168.20.194
So, the best idea is to select the field first and then filter out the unique data.
所以,最好的办法是先选择字段,然后过滤掉唯一的数据。
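The same idea can be sketched on invented log lines. Because the position of the SRC= field differs between iptables log formats, this variant scans each line for the token starting with SRC= instead of relying on a fixed field number:

```shell
# Hypothetical iptables-style log lines (addresses invented).
cat > /tmp/demo_ipt <<'EOF'
May 5 10:00:01 fw kernel: IN=eth0 OUT= SRC=192.168.10.29 DST=10.0.0.1
May 5 10:00:02 fw kernel: IN=eth0 OUT= SRC=192.168.10.225 DST=10.0.0.1
May 5 10:00:03 fw kernel: IN=eth0 OUT= SRC=192.168.10.29 DST=10.0.0.2
EOF

# Filter by date, pull out the SRC= token wherever it sits, deduplicate.
grep "May 5" /tmp/demo_ipt |
  awk '{for (i = 1; i <= NF; i++) if ($i ~ /^SRC=/) print $i}' |
  sort -u
# SRC=192.168.10.225
# SRC=192.168.10.29
```

Scanning for the token makes the pipeline robust to extra fields (e.g. a varying kernel timestamp) shifting the column positions.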
回答by Sobhit Sharma
The following command worked for me.
以下命令对我有用。
sudo cat AirtelFeb.txt | awk '{print $3}' | sort -u
Here it prints the 3rd column with unique values.
在这里它打印具有唯一值的第三列。
回答by Ivan
IMHO Michał Šrajer got the best answer, but a filename is needed after grep name1. And I've got this fancy solution using an associative array:
恕我直言,Michał Šrajer 的答案最好,但 grep name1 之后还需要一个文件名。另外,我用关联数组写出了这个花哨的解决方案:
user=name1
# Split grep's output on newlines only: one matching line per array element.
IFSOLD=$IFS; IFS=$'\n'; lines=( $(grep "$user" test) ); IFS=$IFSOLD
declare -A index
for item in "${lines[@]}"; {
    sub=( $item )         # re-split the line into whitespace-separated fields
    name=${sub[3]}        # the 4th field holds the value of interest
    index[$name]=$item    # keying the associative array by value drops duplicates
}
for item in "${index[@]}"; { echo "$item"; }